bioperl-live
Change Nexus syntax in output from Bio::AlignIO::nexus
Hi,
the nexus syntax output from Bio::AlignIO::nexus is at odds with current nexus standards. For example, the BioPerl module writes
format interleave datatype=dna gap=- symbols="CTANG";
but the software paup*, written by one of the inventors of the nexus format, complains with the following message:
Error(#329): User-defined symbol 'A' conflicts with predefined DNA state symbol.
If you are using a predefined format ('DNA', 'RNA', 'nucleotide', or 'protein'), you
may not specify predefined states for this format as symbols in the Format command.
I suggest we comply by having the Bio::AlignIO::nexus module not output the string symbols="CTANG" if datatype=dna is already written. I assume this also applies to the other predefined alphabets ('RNA', 'nucleotide', or 'protein').
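To make the proposal concrete, here is a minimal sketch of the intended behaviour. It is written in Python purely for illustration and is not the actual Bio::AlignIO::nexus code; the function name `nexus_format_line` and its parameters are hypothetical.

```python
# Datatypes that PAUP* treats as predefined, per the error message above.
PREDEFINED_DATATYPES = {"dna", "rna", "nucleotide", "protein"}


def nexus_format_line(datatype, gap="-", symbols=None, interleave=True):
    """Build a NEXUS 'format' command (hypothetical helper).

    The symbols subcommand is written only when the datatype is not one
    of the predefined ones, which is the change suggested in this issue.
    """
    parts = ["format"]
    if interleave:
        parts.append("interleave")
    parts.append(f"datatype={datatype}")
    parts.append(f"gap={gap}")
    if symbols and datatype.lower() not in PREDEFINED_DATATYPES:
        parts.append(f'symbols="{symbols}"')
    return " ".join(parts) + ";"


print(nexus_format_line("dna", symbols="CTANG"))
# format interleave datatype=dna gap=-;
print(nexus_format_line("standard", symbols="01"))
# format interleave datatype=standard gap=- symbols="01";
```

With this rule the dna case no longer emits symbols="CTANG", while a standard datatype still gets its symbols list.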
Cheers Johan
For a formal reference, see: Maddison, Swofford, Maddison. 1997. Nexus: An Extensible File Format for Systematic Information https://doi.org/10.1093/sysbio/46.4.590, in which we can read (p.599):
"For STANDARD DATATYPEs, a SYMBOLS subcommand will replace the default symbols list of "0 1". For DNA, RNA, NUCLEOTIDE, and PROTEIN DATATYPEs, a SYMBOLS subcommand will not replace the default symbols list, but will add character-state symbols to the SYMBOLS list."
Hence, adding, e.g., the symbols C, T, A, G when they are already defined doesn't make sense (and causes some software to throw an error).
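The replace-versus-add rule in the quoted passage can be sketched roughly as follows. This is an illustration in Python, not code from BioPerl or PAUP*; `effective_symbols` and `DEFAULT_SYMBOLS` are hypothetical names, the DNA default of "ACGT" is assumed from the discussion above, and the sketch models only the symbols list (it ignores ambiguity codes and the other datatypes).

```python
# Default symbols lists: "0 1" for STANDARD is taken from the quote;
# "ACGT" for DNA is an assumption based on the thread above.
DEFAULT_SYMBOLS = {"standard": "01", "dna": "ACGT"}


def effective_symbols(datatype, symbols=""):
    """Return the symbols list in effect after a SYMBOLS subcommand."""
    key = datatype.lower()
    defaults = DEFAULT_SYMBOLS[key]
    requested = [s for s in symbols if not s.isspace()]
    if key == "standard":
        # STANDARD: the subcommand replaces the default list.
        return "".join(requested) if requested else defaults
    # Predefined datatype: the subcommand only adds new symbols.
    extras = []
    for s in requested:
        if s.upper() not in defaults and s not in extras:
            extras.append(s)
    return defaults + "".join(extras)


print(effective_symbols("dna", "CTANG"))     # ACGTN
print(effective_symbols("standard", "012"))  # 012
```

Under these assumptions, the only symbol in "CTANG" that adds anything to the DNA list is N; C, T, A and G are redundant, which is what PAUP* objects to.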