McDonald-Kreitman test


Consider the evolution of a protein-coding gene in two closely related species. Suppose a sample of sequences is taken from each species. When the sequences from the two samples are aligned, polymorphic (variable) nucleotide sites can be identified. Each polymorphic site can be classified by two criteria. The first is whether the site is a difference between the samples or a difference among sequences within a sample. The second is whether the change is synonymous: a change is synonymous if it leads to a codon for the same amino acid, and non-synonymous otherwise. The result is conveniently presented by the following four values:

                 Within sample   Between samples
Synonymous             a                b
Non-synonymous         c                d

where a, for example, is the number of polymorphic sites that are both within-sample variation and synonymous changes. When mutations are selectively neutral, the ratio of synonymous to non-synonymous changes is expected to remain constant over time. Therefore, whether a mutation is synonymous should not depend on whether it is a within-sample polymorphism (one that occurred recently) or a between-sample difference (one that occurred long ago). In statistical terms, the two classifications of polymorphic sites are independent under the null hypothesis that mutations are selectively neutral. A simple test of this null hypothesis is a chi-square test, with the statistic

$$X^2 = \frac{n\,(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}$$

where n = a + b + c + d is the total number of polymorphic sites. When n is not small, X^2 approximately follows a chi-square distribution with one degree of freedom. So if the value of X^2 is larger than 3.841, the null hypothesis can be rejected at the 5% significance level (McDonald and Kreitman 1991).
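The statistic is simple enough to compute by hand, but a short sketch may help check the arithmetic. The function name `mk_chi_square` and the counts passed to it are hypothetical, chosen only for illustration; they are not taken from NeutralityTest or from any data set.

```python
def mk_chi_square(a, b, c, d):
    """Chi-square statistic for the 2x2 McDonald-Kreitman table.

    a: synonymous, within sample      b: synonymous, between samples
    c: non-synonymous, within sample  d: non-synonymous, between samples
    Counts may be fractional, as in the per-codon estimates.
    """
    n = a + b + c + d
    denominator = (a + b) * (a + c) * (b + d) * (c + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero; the statistic is undefined")
    return n * (a * d - b * c) ** 2 / denominator


CRITICAL_5_PERCENT = 3.841  # chi-square critical value, 1 degree of freedom

# Hypothetical counts, used only to illustrate the calculation.
x2 = mk_chi_square(a=20, b=10, c=5, d=15)
print(round(x2, 3), x2 > CRITICAL_5_PERCENT)  # 8.333 True
```

With these illustrative counts the statistic exceeds 3.841, so neutrality would be rejected at the 5% level.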

How does NeutralityTest estimate synonymous and non-synonymous changes for the McDonald-Kreitman test?

Species#1

      1   4   7   10  13  16  19  22  25
Seq1  AGT TCT ATT CCC AAT ATA AGT TAT TAT
Seq2  AGC TCT ATT CCC AGG TTA AGT TAT TAT
Seq3  AGA TCT CTG CAG ACT TTG AGA CTG CTG
Seq4  AGG TCT CTG CAG ACT ATG AGA CTG CTG

Species#2

      1   4   7   10  13  16  19  22  25
Seq5  AGG CCT ATT CCC GGA TTT GGA CTG CTG
Seq6  AGG CCT ATT CCC GGA TTT GGA CTG CTG
Seq7  AGG CCT ATT CAC GGA TTT GGT CTG CTT
Seq8  AGG CCT ATT CAC GGA TTT GGT CTG CTT

Codon (1,2,3):

species#1: There are 3 mutations at site#3, and four haplotypes. For each pair of haplotypes, we have

species#2: Monomorphic.

between species: syn = 0, and non-syn = 0.

Codon (4,5,6):

species#1: Monomorphic.

species#2: Monomorphic.

between species: syn = 0, and non-syn = 1.

Codon (7,8,9):

species#1: There are two mutations, and two haplotypes. For the pair of haplotypes, ATT and CTG, there are two possible pathways,

species#2: Monomorphic.

between species: syn = 0, and non-syn = 0.

Codon (10,11,12):

species#1: Similar to the previous codon (7,8,9) example, we have

species#2: There is one mutation, and two haplotypes. For the pair of haplotypes, CCC and CAC, there is only one possible pathway,

between species: syn = 0, and non-syn = 0.

Codon (13,14,15):

species#1: There are 3 mutations, and three haplotypes. For each pair of haplotypes, we have

species#2: Monomorphic.

between species: There are two fixed mutations. For each pair of haplotypes, one from each species, the proportions of synonymous and non-synonymous mutations are obtained by the same procedure as within species. The numbers of synonymous and non-synonymous changes between species are then the number of fixed mutations multiplied by these proportions. Result: syn = 0.542, and non-syn = 1.458.
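The between-species calculation can be sketched as follows. This is an illustrative reconstruction, not NeutralityTest's actual code, and it rests on three assumptions: a site counts as one fixed mutation when the two species share no nucleotide there; each pair of codons is scored by averaging over all orderings of its single-nucleotide changes, with intermediate stop codons allowed; and the synonymous proportion is pooled over all between-species haplotype pairs. All function names are hypothetical. Under these assumptions the sketch reproduces the values stated for codons (13,14,15), (16,17,18) and (19,20,21).

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks a stop codon.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def pair_counts(codon_a, codon_b):
    """Return (synonymous changes averaged over pathways, total changes)."""
    diff = [i for i in range(3) if codon_a[i] != codon_b[i]]
    if not diff:
        return 0.0, 0
    orders = list(permutations(diff))
    syn_steps = 0
    for order in orders:
        current = codon_a
        for pos in order:
            nxt = current[:pos] + codon_b[pos] + current[pos + 1:]
            syn_steps += CODE[nxt] == CODE[current]  # True counts as 1
            current = nxt
    return syn_steps / len(orders), len(diff)


def fixed_mutations(haps1, haps2):
    """Assumed rule: a site is fixed when the species share no nucleotide."""
    return sum(
        1 for i in range(3)
        if not {h[i] for h in haps1} & {h[i] for h in haps2}
    )


def between_species(haps1, haps2):
    """Return (synonymous, non-synonymous) fixed changes for one codon."""
    haps1, haps2 = set(haps1), set(haps2)  # distinct haplotypes only
    n_fixed = fixed_mutations(haps1, haps2)
    syn = total = 0.0
    for h1 in haps1:
        for h2 in haps2:
            s, t = pair_counts(h1, h2)
            syn += s
            total += t
    if n_fixed == 0 or total == 0:
        return 0.0, 0.0
    syn_fraction = syn / total  # pooled over all haplotype pairs
    return n_fixed * syn_fraction, n_fixed * (1 - syn_fraction)


# Codon (13,14,15) from the example alignment.
syn, non_syn = between_species(["AAT", "AGG", "ACT", "ACT"], ["GGA"] * 4)
print(round(syn, 3), round(non_syn, 3))  # 0.542 1.458
```

Pooling the counts over haplotype pairs, rather than averaging per-pair ratios, is the choice that matches the stated results; other weightings give slightly different numbers.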

Codon (16,17,18):

species#1: There are two mutations, and 4 haplotypes. For each pair of haplotypes, we have

species#2: Monomorphic.

between species: There is one fixed mutation. For each pair of haplotypes, one from each species, the proportions of synonymous and non-synonymous mutations are obtained by the same procedure as within species. The numbers of synonymous and non-synonymous changes between species are then the number of fixed mutations multiplied by these proportions. Result: syn = 0.083, and non-syn = 0.917.

Codon (19,20,21):

species#1: the number of synonymous = 0, the number of non-synonymous = 1

species#2: the number of synonymous = 1, the number of non-synonymous = 0

between species: There is one fixed mutation. For each pair of haplotypes, one from each species, the proportions of synonymous and non-synonymous mutations are obtained by the same procedure as within species. The numbers of synonymous and non-synonymous changes between species are then the number of fixed mutations multiplied by these proportions. Result: syn = 0.167, and non-syn = 0.833.

Codon (22,23,24):

species#1: There are 3 mutations, and two haplotypes. For the pair of haplotypes, we have six possible pathways,

species#2: Monomorphic.

between species: syn = 0, and non-syn = 0.
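The six pathways between the species#1 haplotypes TAT and CTG in codon (22,23,24) can be enumerated with a short sketch. It assumes that every ordering of the three single-nucleotide changes is weighted equally and that pathways through a stop codon (here TAG) are counted like any other; the function name `pathways` is hypothetical. The same pair of haplotypes occurs in codon (25,26,27), and the averages computed here agree with the counts of 0.667 and 2.333 stated there.

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks a stop codon.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def pathways(codon_a, codon_b):
    """List every ordering of single-nucleotide steps from codon_a to codon_b.

    Each pathway is a list of (codon reached, label) steps, where the label
    says whether that step kept ('syn') or changed ('non-syn') the amino acid.
    """
    diff = [i for i in range(3) if codon_a[i] != codon_b[i]]
    result = []
    for order in permutations(diff):
        current, steps = codon_a, []
        for pos in order:
            nxt = current[:pos] + codon_b[pos] + current[pos + 1:]
            label = "syn" if CODE[nxt] == CODE[current] else "non-syn"
            steps.append((nxt, label))
            current = nxt
        result.append(steps)
    return result


paths = pathways("TAT", "CTG")
for steps in paths:
    print("TAT", *(f"-> {codon} ({label})" for codon, label in steps))

# Average number of synonymous steps per pathway; the rest are non-synonymous.
syn = sum(label == "syn" for steps in paths for _, label in steps) / len(paths)
print(round(syn, 3), round(3 - syn, 3))  # 0.667 2.333
```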

Codon (25,26,27):

species#1: the number of synonymous = 0.667, and the number of non-synonymous = 2.333.

species#2: the number of synonymous = 1, and the number of non-synonymous = 0.

between species: syn = 0, and non-syn = 0.
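The within-species counts quoted for codons (19,20,21) and (25,26,27) can be reproduced with the sketch below. It is a reconstruction under stated assumptions, not the program's own code: the number of mutations at a site is taken to be the number of distinct nucleotides minus one, each haplotype pair is averaged over all pathways (stop-codon intermediates included), and the synonymous proportion is pooled over all haplotype pairs. The pooling step cannot be checked against this page for codons with more than two haplotypes, so treat it as a guess there. Function names are hypothetical.

```python
from itertools import combinations, permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks a stop codon.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def pair_counts(codon_a, codon_b):
    """Return (synonymous changes averaged over pathways, total changes)."""
    diff = [i for i in range(3) if codon_a[i] != codon_b[i]]
    if not diff:
        return 0.0, 0
    orders = list(permutations(diff))
    syn_steps = 0
    for order in orders:
        current = codon_a
        for pos in order:
            nxt = current[:pos] + codon_b[pos] + current[pos + 1:]
            syn_steps += CODE[nxt] == CODE[current]  # True counts as 1
            current = nxt
    return syn_steps / len(orders), len(diff)


def within_species(codons):
    """Return (synonymous, non-synonymous) polymorphic changes for one codon."""
    haps = sorted(set(codons))  # distinct haplotypes only
    # Assumed rule: k distinct nucleotides at a site imply k - 1 mutations.
    n_mut = sum(len({h[i] for h in haps}) - 1 for i in range(3))
    if n_mut == 0:
        return 0.0, 0.0  # monomorphic codon
    syn = total = 0.0
    for h1, h2 in combinations(haps, 2):
        s, t = pair_counts(h1, h2)
        syn += s
        total += t
    syn_fraction = syn / total  # pooled over all haplotype pairs
    return n_mut * syn_fraction, n_mut * (1 - syn_fraction)


# Codon (25,26,27): species#1, then species#2.
print([round(x, 3) for x in within_species(["TAT", "TAT", "CTG", "CTG"])])  # [0.667, 2.333]
print([round(x, 3) for x in within_species(["CTG", "CTG", "CTT", "CTT"])])  # [1.0, 0.0]
```

Summing the synonymous and non-synonymous values over all codons, within and between species, would give the four entries a, b, c and d of the table used by the chi-square test.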

