bioperl-live
Bio::TreeIO produces illegally formatted phyloXML
The phyloXML produced by Bio::TreeIO does not conform to the standard. When executing a script like the one below, the "phyloXML" output has its elements in the wrong order. According to the standard (see http://www.phyloxml.org/documentation/version_1.20/phyloxml.html), "name" comes first in a "clade", followed by "branch_length" and then "confidence". Child "clade" elements come at the very end.
use strict; use warnings;
use Bio::TreeIO;
my $infile = "t2.txt";   # Newick-format input tree
my $treeio = Bio::TreeIO->new(-format => 'newick', -file => $infile);
my $tree   = $treeio->next_tree;
# Print id, branch length and bootstrap per node ('//' keeps a branch length of 0 instead of blanking it)
for my $node ($tree->get_nodes) {
    printf "id: %s branchlength: %s bootstrap: %s\n", $node->id // '', $node->branch_length // '', $node->bootstrap // '';
}
my $outfile = "outfile.xml";
my $newio   = Bio::TreeIO->new(-format => 'phyloxml', -file => ">$outfile");   # phyloXML writer
$newio->write_tree($tree);
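The ordering rule quoted in the report can be checked mechanically. The sketch below is not part of BioPerl: the subroutine name "clade_order_ok" is hypothetical, and the element list is my reading of the "Clade" type in the phyloXML 1.20 schema, so it should be verified against the schema before being relied on. It takes the child element names of one "clade" in document order and reports whether they ever step backwards in the schema order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Child-element order for <clade>, as read from the phyloXML 1.20 schema
# (an assumption to double-check). Nested <clade> children come last.
my @CLADE_ORDER = qw(
    name branch_length confidence width color node_id taxonomy sequence
    events binary_characters distribution date reference property clade
);

# Map each element name to its position in the schema sequence.
my %RANK;
@RANK{@CLADE_ORDER} = 0 .. $#CLADE_ORDER;

# Return 1 if the child names (in document order) never move backwards in
# the schema order, else 0. Repeats such as several <confidence> or nested
# <clade> elements are allowed; names not in the list are rejected.
sub clade_order_ok {
    my @children = @_;
    my $last = -1;
    for my $name (@children) {
        return 0 unless exists $RANK{$name};
        return 0 if $RANK{$name} < $last;
        $last = $RANK{$name};
    }
    return 1;
}

print clade_order_ok(qw(name branch_length confidence clade clade)) ? "ok\n" : "bad\n";  # prints "ok"
print clade_order_ok(qw(branch_length name clade)) ? "ok\n" : "bad\n";                   # prints "bad"
```

Equal ranks are accepted on purpose, because the schema allows several "confidence", "taxonomy", "sequence" and "clade" children in a row. In a real check the child names would come from parsing the output file; Bio::TreeIO::phyloxml itself relies on XML::LibXML, so that parser should already be installed wherever the writer runs.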
@cmzmasek That is very possible. The code for this was written many moons ago during GSoC, so even if it was compliant then, it may not be now.
My recommendation is that we move the phyloXML code into a separate repository, where it can be worked on independently of the main BioPerl release, and then set up tests for issues like this. Once that move is made, the main open question is finding someone to take this on.