Sequence Format of PHYLIP (Phylogeny Inference Package)
All DNA sequence formats in PHYLIP are supported. PHYLIP is distributed by Joe Felsenstein; its website is http://evolution.genetics.washington.edu/phylip.html
The simplest version of the input file looks something like this:
- 6 13
- Archaeopt CGATGCTTAC CGC
- HesperorniCGTTACTCGT TGT
- BaluchitheTAATGTTAAT TGT
- B. virginiTAATGTTCGT TGT
- BrontosaurCAAAACCCAT CAT
- B.subtilisGGCAGCCAAT CAC
The first line of the input file contains the number of species and the number of characters (here 6 and 13), in free format, separated by blanks (not by commas). The information for each species follows, starting with a ten-character species name (which can include punctuation marks and blanks; shorter names such as "Archaeopt" are padded with blanks to ten characters), and continuing with the characters for that species. In the discrete-character and DNA sequence programs, the characters are each a single letter or digit, sometimes separated by blanks. The conventions for continuing the data beyond one line per species differ between the molecular sequence programs and the others. The molecular sequence programs can take the data in "aligned" or "interleaved" format, in which some lines give the first part of each of the sequences, the following lines give the next part of each, and so on.
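The one-line-per-species layout described above can be read in a few lines of Python. This is a minimal sketch under stated assumptions, not PHYLIP's own reader: `parse_phylip_simple` and `SAMPLE` are hypothetical names, the sketch handles only files in which each species' data fits on a single line, and it treats the first ten characters of each species line as the name, stripping the trailing padding blanks.

```python
# Minimal sketch: read a PHYLIP file with one line per species.
# parse_phylip_simple and SAMPLE are hypothetical names used for illustration.

SAMPLE = """\
6 13
Archaeopt CGATGCTTAC CGC
HesperorniCGTTACTCGT TGT
BaluchitheTAATGTTAAT TGT
B. virginiTAATGTTCGT TGT
BrontosaurCAAAACCCAT CAT
B.subtilisGGCAGCCAAT CAC
"""


def parse_phylip_simple(text: str) -> list[tuple[str, str]]:
    lines = [line for line in text.splitlines() if line.strip()]
    # First line: number of species and number of characters, blank-separated.
    fields = lines[0].split()
    n_species, n_sites = int(fields[0]), int(fields[1])
    records = []
    for line in lines[1:1 + n_species]:
        name = line[:10].rstrip()          # fixed ten-character name field
        seq = "".join(line[10:].split())   # blanks inside the data are ignored
        if len(seq) != n_sites:
            raise ValueError(f"{name!r}: expected {n_sites} sites, got {len(seq)}")
        records.append((name, seq))
    if len(records) != n_species:
        raise ValueError(f"expected {n_species} species, found {len(records)}")
    return records


if __name__ == "__main__":
    print(parse_phylip_simple(SAMPLE)[0])  # ('Archaeopt', 'CGATGCTTACCGC')
```

Slicing a fixed ten-character field, rather than splitting the line on whitespace, is what allows names such as "B. virgini" to contain blanks.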
In interleaved format, the sequences might look like this:
- 6 39
- Archaeopt CGATGCTTAC CGCCGATGCT
- Hesperorni CGTTACTCGT TGTCGTTACT
- Baluchithe TAATGTTAAT TGTTAATGTT
- B. virgini TAATGTTCGT TGTTAATGTT
- Brontosaur CAAAACCCAT CATCAAAACC
- B.subtilis GGCAGCCAAT CACGGCAGCC

- TACCGCCGAT GCTTACCGC
- CGTTGTCGTT ACTCGTTGT
- AATTGTTAAT GTTAATTGT
- CGTTGTTAAT GTTCGTTGT
- CATCATCAAA ACCCATCAT
- AATCACGGCA GCCAATCAC
Note that in these sequences there is a blank every ten sites to make them easier to read; any such blanks are allowed. The species names appear only in the first group of lines; later groups contain sequence data only. The blank line that separates the two groups of lines (the lines containing sites 1-20 and those containing sites 21-39) may or may not be present, but if it is, it should be a line of zero length, without any blank characters. It is important that the number of sites in each group be the same for all species: the programs will not run successfully if, for example, the first line for the first species contains 20 bases but the first line for the second species contains 21 bases.
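The interleaved rules above (names only in the first group, an optional separator line, and equal site counts within each group) can be checked with a short reader. This is a minimal Python sketch under stated assumptions, not PHYLIP's implementation: `parse_phylip_interleaved` and `SAMPLE_INTERLEAVED` are hypothetical names, and the sketch assumes every group holds exactly one line per species. It skips empty lines, so it accepts the file with or without the separator; it is also more lenient than PHYLIP in skipping lines that contain only blanks.

```python
# Minimal sketch of an interleaved PHYLIP reader (not PHYLIP's own code).
# parse_phylip_interleaved and SAMPLE_INTERLEAVED are hypothetical names.

SAMPLE_INTERLEAVED = """\
6 39
Archaeopt CGATGCTTAC CGCCGATGCT
Hesperorni CGTTACTCGT TGTCGTTACT
Baluchithe TAATGTTAAT TGTTAATGTT
B. virgini TAATGTTCGT TGTTAATGTT
Brontosaur CAAAACCCAT CATCAAAACC
B.subtilis GGCAGCCAAT CACGGCAGCC

TACCGCCGAT GCTTACCGC
CGTTGTCGTT ACTCGTTGT
AATTGTTAAT GTTAATTGT
CGTTGTTAAT GTTCGTTGT
CATCATCAAA ACCCATCAT
AATCACGGCA GCCAATCAC
"""


def parse_phylip_interleaved(text: str) -> list[tuple[str, str]]:
    lines = text.splitlines()
    fields = lines[0].split()
    n_species, n_sites = int(fields[0]), int(fields[1])
    # The separating blank line is optional, so empty lines are skipped.
    data = [line for line in lines[1:] if line.strip()]
    if not data or len(data) % n_species:
        raise ValueError("data lines do not form complete groups of species")
    names = [line[:10].rstrip() for line in data[:n_species]]
    seqs = [""] * n_species
    for start in range(0, len(data), n_species):
        group = data[start:start + n_species]
        if start == 0:
            group = [line[10:] for line in group]  # drop the ten-character names
        chunks = ["".join(line.split()) for line in group]  # remove blanks
        # Every species must contribute the same number of sites to a group.
        if len({len(chunk) for chunk in chunks}) != 1:
            raise ValueError(f"group {start // n_species + 1} has unequal site counts")
        for i, chunk in enumerate(chunks):
            seqs[i] += chunk
    for name, seq in zip(names, seqs):
        if len(seq) != n_sites:
            raise ValueError(f"{name!r}: expected {n_sites} sites, got {len(seq)}")
    return list(zip(names, seqs))


if __name__ == "__main__":
    for name, seq in parse_phylip_interleaved(SAMPLE_INTERLEAVED):
        print(f"{name:<10} {seq}")  # each sequence has 39 sites
```

Checking each group separately, rather than only the final lengths, reports the 20-versus-21-bases mistake at the group where it occurs.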