
Change Nexus syntax in output from Bio::AlignIO::nexus

Open · nylander opened this issue 5 years ago · 1 comment

Hi,

the Nexus syntax written by Bio::AlignIO::nexus is at odds with the current Nexus standard. For example, the BioPerl module writes

format interleave datatype=dna   gap=- symbols="CTANG";

but PAUP*, which was written by one of the inventors of the Nexus format, rejects this line with the following message:

Error(#329): User-defined symbol 'A' conflicts with predefined DNA state symbol.

                 If you are using a predefined format ('DNA', 'RNA', 'nucleotide', or 'protein'), you
                 may not specify predefined states for this format as symbols in the Format command.

I suggest we comply by having Bio::AlignIO::nexus omit the string symbols="CTANG" whenever datatype=dna is written. I assume the same applies to the other predefined datatypes ('RNA', 'nucleotide', and 'protein').

Cheers Johan

nylander, Oct 31 '18 12:10

For a formal reference, see: Maddison, Swofford, Maddison. 1997. Nexus: An Extensible File Format for Systematic Information. https://doi.org/10.1093/sysbio/46.4.590, in which we can read (p. 599):

"For STANDARD DATATYPEs, a SYMBOLS subcommand will replace the default symbols list of "0 1". For DNA, RNA, NUCLEOTIDE, and PROTEIN DATATYPEs, a SYMBOLS subcommand will not replace the default symbols list, but will add character-state symbols to the SYMBOLS list."

Hence, adding symbols such as C, T, A, and G when they are already defined for the datatype doesn't make sense, and it causes some software to throw an error.
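To make the proposed behaviour concrete, here is a minimal sketch in Python (not BioPerl code, and not how Bio::AlignIO::nexus is actually implemented). The function name format_command and the RESERVED table are hypothetical, and the reserved symbol sets (state symbols plus common ambiguity codes) are my assumption and should be checked against the Nexus specification. The idea is to drop any symbol that a predefined datatype already defines, and to omit the symbols subcommand entirely when nothing is left:

```python
# Symbols ASSUMED to be reserved for each predefined Nexus datatype
# (state symbols plus common ambiguity codes). Illustrative only.
_NUC_AMBIGUITY = "RYMKSWHBVDNX"
RESERVED = {
    "dna": set("ACGT" + _NUC_AMBIGUITY),
    "rna": set("ACGU" + _NUC_AMBIGUITY),
    "nucleotide": set("ACGTU" + _NUC_AMBIGUITY),
    "protein": set("ACDEFGHIKLMNPQRSTVWY*BZX"),
}


def format_command(datatype, gap="-", symbols="", interleave=True):
    """Build a Nexus Format command, leaving out already-defined symbols."""
    # Datatypes without an entry (e.g. "standard") reserve nothing here.
    reserved = RESERVED.get(datatype.lower(), set())

    # Keep only symbols that are not reserved; preserve order, drop repeats.
    extra = []
    for s in symbols:
        if s.upper() not in reserved and s not in extra:
            extra.append(s)

    parts = ["format"]
    if interleave:
        parts.append("interleave")
    parts.append(f"datatype={datatype}")
    parts.append(f"gap={gap}")
    # Write the symbols subcommand only if something genuinely new remains.
    if extra:
        parts.append('symbols="{}"'.format("".join(extra)))
    return " ".join(parts) + ";"


print(format_command("dna", symbols="CTANG"))
print(format_command("standard", symbols="01"))
```

With these assumptions, the DNA case comes out as a Format line without any symbols subcommand, while the standard datatype keeps its user-supplied symbols list, which matches the distinction drawn in the quoted passage.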

nylander, Nov 16 '18 13:11