McDonald-Kreitman test
Consider the evolution of a protein-coding gene in two closely related species, and suppose a sample of sequences is taken from each species. When the sequences from these two samples are aligned together, polymorphic (variable) nucleotide sites can be identified. Each polymorphic site can be classified by two criteria. The first is whether the site differs between the samples or between sequences within a sample. The second is whether the change is synonymous: a change is synonymous if it leads to a synonymous codon (one coding for the same amino acid) and non-synonymous otherwise. The result is conveniently presented as four counts:
                  Within sample   Between sample
    Synonymous          a               b
    Non-syn.            c               d
Here a, for example, is the number of polymorphic sites that are both within-sample variation and synonymous changes. When mutations are selectively neutral, the ratio of synonymous to non-synonymous changes is expected to remain constant over time. Therefore, whether a change is synonymous should not depend on whether it is a within-sample polymorphism (which occurred recently) or a between-sample difference (which occurred long ago). In statistical terms, the two classifications of polymorphic sites are independent under the null hypothesis that mutations are selectively neutral. A simple test of this null hypothesis is the chi-square test
$$X^2 = \frac{n(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}$$
where $n = a + b + c + d$ is the total number of polymorphic sites. When n is not small, $X^2$ approximately follows a chi-square distribution with one degree of freedom, so if $X^2$ is larger than 3.841 the null hypothesis can be rejected at the 5% significance level (McDonald and Kreitman 1991).
How does NeutralityTest estimate synonymous and non-synonymous changes for the McDonald-Kreitman test?
Species#1
Position  1   4   7   10  13  16  19  22  25
Seq1      AGT TCT ATT CCC AAT ATA AGT TAT TAT
Seq2      AGC TCT ATT CCC AGG TTA AGT TAT TAT
Seq3      AGA TCT CTG CAG ACT TTG AGA CTG CTG
Seq4      AGG TCT CTG CAG ACT ATG AGA CTG CTG
Species#2
Position  1   4   7   10  13  16  19  22  25
Seq5      AGG CCT ATT CCC GGA TTT GGA CTG CTG
Seq6      AGG CCT ATT CCC GGA TTT GGA CTG CTG
Seq7      AGG CCT ATT CAC GGA TTT GGT CTG CTT
Seq8      AGG CCT ATT CAC GGA TTT GGT CTG CTT
Codon (1,2,3):
species#1: There are 3 mutations, all at site 3, and four haplotypes. For each pair of haplotypes, we have
- AGT - AGC syn = 1.0 non-syn = 0.0
- AGT - AGA syn = 0.0 non-syn = 1.0
- AGT - AGG syn = 0.0 non-syn = 1.0
- AGC - AGA syn = 0.0 non-syn = 1.0
- AGC - AGG syn = 0.0 non-syn = 1.0
- AGA - AGG syn = 1.0 non-syn = 0.0
- total: (syn) 2 + (non-syn) 4 = 6
- Result:
- the number of synonymous = total_syn (2.0) * mut (3) / total (6) = 1;
- the number of non-synonymous = total_non-syn (4.0) * mut (3) / total (6) = 2;
species#2: Monomorphic.
between species: syn = 0, and non-syn = 0.
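The pairwise calculation above can be sketched in Python. This is a minimal illustration, not NeutralityTest's code: it assumes the standard genetic code, counts each distinct haplotype pair once, takes the number of mutations at a codon to be the number of distinct bases minus one summed over its three positions, and handles only pairs that differ at a single position (the situation at codon (1,2,3)). The function names are hypothetical.

```python
from itertools import combinations

# Standard genetic code; codons are enumerated in TCAG order and '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES)
        for j, y in enumerate(BASES)
        for k, z in enumerate(BASES)}


def segregating_mutations(haplotypes):
    """Mutation count: distinct bases minus one, summed over the three positions."""
    return sum(len({h[pos] for h in haplotypes}) - 1 for pos in range(3))


def within_sample_single_step(codons):
    """Within-sample (syn, non-syn) when haplotypes differ at one position only."""
    haplotypes = sorted(set(codons))
    mut = segregating_mutations(haplotypes)
    total_syn = total_non = 0.0
    for h1, h2 in combinations(haplotypes, 2):
        if sum(x != y for x, y in zip(h1, h2)) != 1:
            raise ValueError(f"{h1} and {h2} differ at more than one position")
        if CODE[h1] == CODE[h2]:
            total_syn += 1
        else:
            total_non += 1
    total = total_syn + total_non
    if total == 0:  # monomorphic sample
        return 0.0, 0.0
    # Rescale the pairwise totals so that they add up to the mutation count.
    return total_syn * mut / total, total_non * mut / total


print(within_sample_single_step(["AGT", "AGC", "AGA", "AGG"]))  # (1.0, 2.0)
print(within_sample_single_step(["AGG"] * 4))                   # (0.0, 0.0)
```

For codons whose haplotypes differ at two or three positions, such as codon (7,8,9), this single-step version raises an error; those pairs need the pathway averaging described for that codon.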
Codon (4,5,6):
species#1: Monomorphic.
species#2: Monomorphic.
between species: syn = 0, and non-syn = 1.
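Deciding which between-species differences count as fixed can also be sketched. The rule below is an assumption inferred from the worked examples, not a documented NeutralityTest rule: a codon position counts as one fixed mutation when the two samples share no nucleotide there. It reproduces the fixed counts used throughout this section, e.g. one fixed change at codon (4,5,6), where TCT (serine) versus CCT (proline) is non-synonymous, and none at codon (1,2,3). The function name is hypothetical.

```python
def fixed_differences(species1, species2):
    """Count codon positions at which the two samples share no nucleotide."""
    fixed = 0
    for pos in range(3):
        bases1 = {codon[pos] for codon in species1}
        bases2 = {codon[pos] for codon in species2}
        if bases1.isdisjoint(bases2):  # no shared base: treated as one fixed change
            fixed += 1
    return fixed


print(fixed_differences(["TCT"] * 4, ["CCT"] * 4))                   # 1, codon (4,5,6)
print(fixed_differences(["AGT", "AGC", "AGA", "AGG"], ["AGG"] * 4))  # 0, codon (1,2,3)
```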
Codon (7,8,9):
species#1: There are two mutations and two haplotypes. For the pair of haplotypes ATT and CTG, there are two possible pathways,
- CTG - ATG - ATT syn = 0, non-syn = 2
- CTG - CTT - ATT syn = 1, non-syn = 1
- Each pathway is given the same weight, so we have syn = 0.5 and non-syn = 1.5 for the pair of haplotypes.
- total: (syn) 0.5 + (non-syn) 1.5 = 2
- Result:
- the number of synonymous = total_syn (0.5) * mut (2) / total (2) = 0.5;
- the number of non-synonymous = total_non-syn (1.5) * mut (2) / total (2) = 1.5;
species#2: Monomorphic.
between species: syn = 0, and non-syn = 0.
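The pathway averaging can be sketched as follows, assuming the standard genetic code. Each ordering of the differing positions defines one shortest pathway, and every pathway receives the same weight, as in the example above. The function names are hypothetical, not part of NeutralityTest.

```python
from itertools import permutations

# Standard genetic code; codons are enumerated in TCAG order and '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES)
        for j, y in enumerate(BASES)
        for k, z in enumerate(BASES)}


def pathways(start, end):
    """All shortest pathways from start to end: one per order of the differing positions."""
    diffs = [pos for pos in range(3) if start[pos] != end[pos]]
    result = []
    for order in permutations(diffs):
        path, current = [start], start
        for pos in order:
            current = current[:pos] + end[pos] + current[pos + 1:]
            path.append(current)
        result.append(path)
    return result


def pathway_counts(start, end):
    """(syn, non-syn) averaged over all pathways, each pathway weighted equally."""
    paths = pathways(start, end)
    syn = non = 0.0
    for path in paths:
        for a, b in zip(path, path[1:]):
            if CODE[a] == CODE[b]:
                syn += 1
            else:
                non += 1
    return syn / len(paths), non / len(paths)


for path in pathways("ATT", "CTG"):
    print(" - ".join(path))
print(pathway_counts("ATT", "CTG"))  # (0.5, 1.5)
```

The pathways are listed here from ATT to CTG, the reverse of the text; reversing a pathway does not change which steps are synonymous.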
Codon (10,11,12):
species#1: As in the previous codon (7,8,9) example, the two haplotypes CCC and CAG differ at two positions, and the two possible pathways, CCC - CCG - CAG (syn = 1, non-syn = 1) and CCC - CAC - CAG (syn = 0, non-syn = 2), are given the same weight, so we have
- the number of synonymous = 0.5;
- the number of non-synonymous = 1.5;
species#2: There is one mutation and two haplotypes. For the pair of haplotypes CCC and CAC, there is only one possible pathway,
- CCC - CAC, syn = 0, non-syn = 1.0
- Therefore, total: (syn) 0 + (non-syn) 1 = 1
- Result:
- the number of synonymous = total_syn (0) * mut (1) / total (1) = 0;
- the number of non-synonymous = total_non-syn (1) * mut (1) / total (1) = 1;
between species: syn = 0, and non-syn = 0.
Codon (13,14,15):
species#1: There are 3 mutations and three haplotypes (AAT, AGG and ACT). For each pair of haplotypes, we have
- ACT - AGG, syn = 0.5 and non-syn = 1.5
- Note: There are two possible pathways for these two haplotypes, each given the same weight: ACT - AGT - AGG and ACT - ACG - AGG.
- ACT - AAT, syn = 0.0 and non-syn = 1.0
- AGG - AAT, syn = 0.0 and non-syn = 2.0
- Note: There are two possible pathways for these two haplotypes, each given the same weight: AGG - AAG - AAT and AGG - AGT - AAT.
- So we have
- the number of synonymous = total_syn (0.5) * mut (3) / total (5) = 0.3;
- the number of non-synonymous = total_non-syn (4.5) * mut (3) / total (5) = 2.7;
species#2: Monomorphic.
between species: Two mutations are fixed (at sites 13 and 15). For each pair of haplotypes taken one from each species, the synonymous and non-synonymous counts are obtained by the same pathway procedure as within species. Summed over the three pairs (AAT - GGA, AGG - GGA and ACT - GGA), this gives syn = 2.167 and non-syn = 5.833 out of 8 differences. The between-species numbers are the number of fixed mutations times the resulting ratios. Result: syn = 2 * 2.167 / 8 = 0.542, and non-syn = 2 * 5.833 / 8 = 1.458.
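The between-species step just described can be sketched as a function. It is an illustration under assumptions inferred from the worked numbers, not NeutralityTest's implementation: the standard genetic code, equal pathway weights, each distinct cross-species haplotype pair counted once, and a position with no shared nucleotide counted as one fixed mutation. All names are hypothetical.

```python
from itertools import permutations

# Standard genetic code; codons are enumerated in TCAG order and '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES)
        for j, y in enumerate(BASES)
        for k, z in enumerate(BASES)}


def pathway_counts(start, end):
    """(syn, non-syn) averaged over all orders in which the differing positions change."""
    diffs = [pos for pos in range(3) if start[pos] != end[pos]]
    orders = list(permutations(diffs))
    syn = 0.0
    for order in orders:
        current = start
        for pos in order:
            nxt = current[:pos] + end[pos] + current[pos + 1:]
            if CODE[current] == CODE[nxt]:
                syn += 1
            current = nxt
    syn /= len(orders)
    return syn, len(diffs) - syn


def fixed_differences(species1, species2):
    """Positions where the samples share no nucleotide (assumed to be fixed changes)."""
    return sum({c[pos] for c in species1}.isdisjoint({c[pos] for c in species2})
               for pos in range(3))


def between_samples(species1, species2):
    """Between-sample (syn, non-syn): fixed count times ratios pooled over cross pairs."""
    fixed = fixed_differences(species1, species2)
    if fixed == 0:
        return 0.0, 0.0
    total_syn = total_non = 0.0
    for h1 in sorted(set(species1)):
        for h2 in sorted(set(species2)):
            syn, non = pathway_counts(h1, h2)
            total_syn += syn
            total_non += non
    total = total_syn + total_non
    return fixed * total_syn / total, fixed * total_non / total


syn, non = between_samples(["AAT", "AGG", "ACT", "ACT"], ["GGA"] * 4)  # codon (13,14,15)
print(f"syn = {syn:.3f}, non-syn = {non:.3f}")  # syn = 0.542, non-syn = 1.458
```

Applied to codons (16,17,18) and (19,20,21), the same function gives about (0.083, 0.917) and (0.167, 0.833), the values reported below.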
Codon (16,17,18):
species#1: There are two mutations and four haplotypes. For each pair of haplotypes, we have
- TTG - ATA, syn = 0.5 and non-syn = 1.5
- Note: There are two possible pathways for these two haplotypes, each given the same weight: TTG - ATG - ATA and TTG - TTA - ATA.
- TTG - ATG, syn = 0.0 and non-syn = 1.0
- TTG - TTA, syn = 1.0 and non-syn = 0.0
- ATA - ATG, syn = 0.0 and non-syn = 1.0
- ATA - TTA, syn = 0.0 and non-syn = 1.0
- ATG - TTA, syn = 0.5 and non-syn = 1.5
- Note: There are two possible pathways for these two haplotypes, each given the same weight: ATG - TTG - TTA and ATG - ATA - TTA.
- So we have
- the number of synonymous = total_syn (2) * mut (2) / total (8) = 0.5;
- the number of non-synonymous = total_non-syn (6) * mut (2) / total (8) = 1.5;
species#2: Monomorphic.
between species: One mutation (at site 18) is fixed. Applying the same procedure as for codon (13,14,15) to the four pairs of haplotypes between species (ATA, TTA, TTG and ATG, each paired with TTT) gives syn = 0.5 and non-syn = 5.5 out of 6 differences. Result: syn = 1 * 0.5 / 6 = 0.083, and non-syn = 1 * 5.5 / 6 = 0.917.
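To make the between-species step for codon (16,17,18) concrete, the sketch below lists each cross-species pair with its pathway-averaged counts and then pools them. It assumes the standard genetic code, equal pathway weights, and that a position with no shared nucleotide is one fixed mutation; these are inferred from the worked numbers rather than taken from NeutralityTest, and `pathway_counts` is a hypothetical helper.

```python
from itertools import permutations

# Standard genetic code; codons are enumerated in TCAG order and '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES)
        for j, y in enumerate(BASES)
        for k, z in enumerate(BASES)}


def pathway_counts(start, end):
    """(syn, non-syn) averaged over all orders in which the differing positions change."""
    diffs = [pos for pos in range(3) if start[pos] != end[pos]]
    orders = list(permutations(diffs))
    syn = 0.0
    for order in orders:
        current = start
        for pos in order:
            nxt = current[:pos] + end[pos] + current[pos + 1:]
            if CODE[current] == CODE[nxt]:
                syn += 1
            current = nxt
    syn /= len(orders)
    return syn, len(diffs) - syn


species1 = ["ATA", "TTA", "TTG", "ATG"]  # codon (16,17,18), Seq1-Seq4
species2 = ["TTT"] * 4                   # codon (16,17,18), Seq5-Seq8
# Positions where the samples share no nucleotide count as fixed (here only site 18).
fixed = sum({c[pos] for c in species1}.isdisjoint({c[pos] for c in species2})
            for pos in range(3))

total_syn = total_non = 0.0
for h1 in sorted(set(species1)):
    for h2 in sorted(set(species2)):
        syn, non = pathway_counts(h1, h2)
        print(f"{h1} - {h2}: syn = {syn:.1f}, non-syn = {non:.1f}")
        total_syn += syn
        total_non += non

total = total_syn + total_non
print(f"fixed = {fixed}, total: (syn) {total_syn} + (non-syn) {total_non} = {total}")
print(f"syn = {fixed * total_syn / total:.3f}, non-syn = {fixed * total_non / total:.3f}")
```

Only the ATA - TTT pair contributes a synonymous step (through ATT), which is why the between-species synonymous value is so small.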
Codon (19,20,21):
species#1: the number of synonymous = 0, the number of non-synonymous = 1
species#2: the number of synonymous = 1, the number of non-synonymous = 0
between species: One mutation (at site 19) is fixed. Applying the same procedure as for codon (13,14,15) to the four pairs of haplotypes between species (AGT and AGA, each paired with GGA and GGT) gives syn = 1 and non-syn = 5 out of 6 differences. Result: syn = 1 * 1 / 6 = 0.167, and non-syn = 1 * 5 / 6 = 0.833.
Codon (22,23,24):
species#1: There are 3 mutations and two haplotypes, TAT and CTG. For this pair of haplotypes, there are six possible pathways,
- TAT - TAG - CAG - CTG
- TAT - TAG - TTG - CTG
- TAT - CAT - CAG - CTG
- TAT - CAT - CTT - CTG
- TAT - TTT - TTG - CTG
- TAT - TTT - CTT - CTG
- So we have
- the number of synonymous = 0.667;
- the number of non-synonymous = 2.333;
species#2: Monomorphic.
between species: syn = 0, and non-syn = 0.
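Two of these pathways pass through the stop codon TAG. The sketch below, assuming the standard genetic code and treating a stop codon as just another symbol (so any step into or out of TAG is non-synonymous), reproduces the values in the text (0.667 and 2.333) only when all six pathways keep equal weight; dropping the two pathways through TAG would instead give 0.75 and 2.25. This is an inference from the reported numbers, not a statement about NeutralityTest's source, and the helper names are hypothetical.

```python
from itertools import permutations

# Standard genetic code; codons are enumerated in TCAG order and '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES)
        for j, y in enumerate(BASES)
        for k, z in enumerate(BASES)}


def pathways(start, end):
    """All shortest pathways from start to end: one per order of the differing positions."""
    diffs = [pos for pos in range(3) if start[pos] != end[pos]]
    result = []
    for order in permutations(diffs):
        path, current = [start], start
        for pos in order:
            current = current[:pos] + end[pos] + current[pos + 1:]
            path.append(current)
        result.append(path)
    return result


def step_counts(path):
    """(syn, non-syn) along one pathway; a stop codon is just another symbol here."""
    syn = sum(CODE[a] == CODE[b] for a, b in zip(path, path[1:]))
    return syn, len(path) - 1 - syn


def through_stop(path):
    """True if any codon on the pathway is a stop codon."""
    return any(CODE[codon] == "*" for codon in path)


paths = pathways("TAT", "CTG")
for path in paths:
    syn, non = step_counts(path)
    note = " (passes through a stop codon)" if through_stop(path) else ""
    print(f"{' - '.join(path)}: syn = {syn}, non-syn = {non}{note}")

# Equal weight for all six pathways, as in the text.
all_syn = sum(step_counts(p)[0] for p in paths) / len(paths)
# Alternative shown only for comparison: drop pathways through a stop codon.
kept = [p for p in paths if not through_stop(p)]
kept_syn = sum(step_counts(p)[0] for p in kept) / len(kept)
print(f"all pathways:   syn = {all_syn:.3f}, non-syn = {3 - all_syn:.3f}")
print(f"stop-free only: syn = {kept_syn:.3f}, non-syn = {3 - kept_syn:.3f}")
```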
Codon (25,26,27):
species#1: the number of synonymous = 0.667, and the number of non-synonymous = 2.333.
species#2: the number of synonymous = 1, and the number of non-synonymous = 0.
between species: syn = 0, and non-syn = 0.
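Once every codon has been processed, the per-codon values are summed into the four cells a, b, c and d, and the chi-square statistic from the beginning of this section is applied. The sketch below recomputes all nine codons of the example alignment under conventions inferred from the worked numbers (standard genetic code, one count per distinct haplotype pair, equal pathway weights with stop codons treated as ordinary non-synonymous steps, a position with no shared base counted as one fixed mutation) and pools the within-sample changes of both species into a and c. These conventions and all function names are assumptions for illustration, not NeutralityTest's implementation.

```python
from itertools import combinations, permutations

# Standard genetic code; codons are enumerated in TCAG order and '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES)
        for j, y in enumerate(BASES)
        for k, z in enumerate(BASES)}


def pathway_counts(start, end):
    """(syn, non-syn) averaged over all orders in which the differing positions change."""
    diffs = [pos for pos in range(3) if start[pos] != end[pos]]
    orders = list(permutations(diffs))
    syn = 0.0
    for order in orders:
        current = start
        for pos in order:
            nxt = current[:pos] + end[pos] + current[pos + 1:]
            if CODE[current] == CODE[nxt]:
                syn += 1
            current = nxt
    syn /= len(orders)
    return syn, len(diffs) - syn


def pooled(pairs, scale):
    """Scale the syn/non-syn ratios pooled over haplotype pairs to `scale` mutations."""
    total_syn = total_non = 0.0
    for h1, h2 in pairs:
        syn, non = pathway_counts(h1, h2)
        total_syn += syn
        total_non += non
    total = total_syn + total_non
    if scale == 0 or total == 0:
        return 0.0, 0.0
    return scale * total_syn / total, scale * total_non / total


def within_sample(codons):
    haps = sorted(set(codons))
    mut = sum(len({h[pos] for h in haps}) - 1 for pos in range(3))
    return pooled(combinations(haps, 2), mut)


def between_samples(species1, species2):
    fixed = sum({c[pos] for c in species1}.isdisjoint({c[pos] for c in species2})
                for pos in range(3))
    pairs = [(h1, h2) for h1 in sorted(set(species1)) for h2 in sorted(set(species2))]
    return pooled(pairs, fixed)


SPECIES1 = [s.split() for s in ("AGT TCT ATT CCC AAT ATA AGT TAT TAT",
                                 "AGC TCT ATT CCC AGG TTA AGT TAT TAT",
                                 "AGA TCT CTG CAG ACT TTG AGA CTG CTG",
                                 "AGG TCT CTG CAG ACT ATG AGA CTG CTG")]
SPECIES2 = [s.split() for s in ("AGG CCT ATT CCC GGA TTT GGA CTG CTG",
                                 "AGG CCT ATT CCC GGA TTT GGA CTG CTG",
                                 "AGG CCT ATT CAC GGA TTT GGT CTG CTT",
                                 "AGG CCT ATT CAC GGA TTT GGT CTG CTT")]

a = b = c = d = 0.0
for col1, col2 in zip(zip(*SPECIES1), zip(*SPECIES2)):  # one codon column at a time
    for sample in (col1, col2):  # within-sample changes, both species pooled
        syn, non = within_sample(sample)
        a += syn
        c += non
    syn, non = between_samples(col1, col2)
    b += syn
    d += non

n = a + b + c + d
x2 = n * (a * d - b * c) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}, d = {d:.3f}, n = {n:.0f}")
print(f"X^2 = {x2:.3f}:", "reject" if x2 > 3.841 else "do not reject", "neutrality at the 5% level")
```

For this toy alignment the statistic is about 0.31, well below 3.841. With only n = 27 changes, 5 of them between samples, the chi-square approximation is rough here, so the result only illustrates the arithmetic.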