Unable to parse Open Tree of Life Newick
I am having trouble parsing a Newick file from a specific (but rather central) resource. The problem is described here. Any chance you can help with that? Thanks!
I added some more info here: https://github.com/OpenTreeOfLife/feedback/issues/545
You can grab the tree from: https://tree.opentreeoflife.org/opentree/default/download_subtree/ottol-id/801601/Vertebrata
I tried a very simple example tree: ((Dendromus_ott254739,Malacothrix_typica_ott600700)'Malacothrix (genus in Opisthokonta) ott600707');
And it looks like ete3 handles labels like 'Malacothrix (genus in Opisthokonta) ott600707' fine if you add quoted_node_names=True, but it breaks down elsewhere on the whole tree:
tree = Tree('subtree-ottol-801601-Vertebrata.tre', format=1, quoted_node_names=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ejmctavish/projects/otapi/venv-ete/lib/python3.8/site-packages/ete3/coretype/tree.py", line 212, in __init__
    read_newick(newick, root_node = self, format=format,
  File "/home/ejmctavish/projects/otapi/venv-ete/lib/python3.8/site-packages/ete3/parser/newick.py", line 266, in read_newick
    return _read_newick_from_string(nw, root_node, matcher, format, quoted_names)
  File "/home/ejmctavish/projects/otapi/venv-ete/lib/python3.8/site-packages/ete3/parser/newick.py", line 348, in _read_newick_from_string
    node.name = quoted_map[node.name]
KeyError: 'ete3_quotref_5048ete3_quotref_5049ete3_quotref_5050'
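The failing key looks like three quote placeholders run together, so one way to narrow things down might be to look for quoted spans that touch each other in the raw file. The sketch below uses only the standard library and does not call ete3. The pattern `NAIVE_QUOTED` and the helper `find_adjacent_quoted_runs` are hypothetical names for illustration, and the pattern is an assumption, not ete3's actual matcher. A label that escapes a single quote by doubling it (a common Newick convention) would show up as such a run, but I have not confirmed that this is what happens in this tree.

```python
import re

# Assumption: a deliberately simple quote matcher, NOT the pattern ete3 uses.
NAIVE_QUOTED = re.compile(r"'[^']*'")


def find_adjacent_quoted_runs(newick):
    """Return runs of two or more quoted spans with nothing between them."""
    runs = []
    current = []      # quoted spans in the run being built
    prev_end = None   # end offset of the previous quoted span
    for match in NAIVE_QUOTED.finditer(newick):
        if prev_end is not None and match.start() == prev_end:
            # This span starts exactly where the last one ended.
            current.append(match.group())
        else:
            if len(current) > 1:
                runs.append("".join(current))
            current = [match.group()]
        prev_end = match.end()
    if len(current) > 1:
        runs.append("".join(current))
    return runs


# Made-up label with doubled single quotes inside a quoted name.
print(find_adjacent_quoted_runs("(A,'x ''y'' z')'B C';"))
```

To run it on the real file, read the file contents into a string first and pass that string in. Any run it reports is only a candidate location to inspect by hand.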
I checked the tree and it seems the node names in the sample tree are not consistently quoted: some are quoted and some are not. I believe that is what confuses the parser.
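To check how mixed the quoting is, something like the following rough sketch could count quoted and unquoted labels in a Newick string. It is standard-library only and does not use ete3. The function name `scan_newick_labels` is hypothetical, and the scanner makes simplifying assumptions: it treats a doubled single quote inside a quoted label as an escaped quote, skips anything after a colon as a branch length, and ignores bracketed comments entirely.

```python
def scan_newick_labels(newick):
    """Return (quoted, unquoted) lists of labels found in a Newick string."""
    quoted, unquoted = [], []
    i, n = 0, len(newick)
    while i < n:
        ch = newick[i]
        if ch == "'":
            # Quoted label: read up to the next single quote that is not doubled.
            j = i + 1
            chars = []
            while j < n:
                if newick[j] == "'":
                    if j + 1 < n and newick[j + 1] == "'":
                        chars.append("'")  # doubled quote, treated as escaped
                        j += 2
                        continue
                    break
                chars.append(newick[j])
                j += 1
            quoted.append("".join(chars))
            i = j + 1
        elif ch == ":":
            # Skip a branch length up to the next structural character.
            i += 1
            while i < n and newick[i] not in "(),;":
                i += 1
        elif ch in "(),;" or ch.isspace():
            i += 1
        else:
            # Unquoted label: read up to a structural character, quote, or space.
            j = i
            while j < n and newick[j] not in "(),:;'" and not newick[j].isspace():
                j += 1
            unquoted.append(newick[i:j])
            i = j
    return quoted, unquoted


example = "((Dendromus_ott254739,Malacothrix_typica_ott600700)'Malacothrix (genus in Opisthokonta) ott600707');"
quoted, unquoted = scan_newick_labels(example)
print(len(quoted), "quoted;", len(unquoted), "unquoted")
```

On the small example from this thread it finds one quoted label and two unquoted ones, which matches the mix described above.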
Cheers