Unable to parse Open Tree of Life Newick

[Open] soungalo opened this issue 2 years ago • 2 comments

I am having trouble with a Newick file from a specific (but rather central) resource. The problem is described here. Any chance you can help with that? Thanks!

soungalo, Apr 05 '22 14:04

I added some more info here: https://github.com/OpenTreeOfLife/feedback/issues/545

You can grab the tree from: https://tree.opentreeoflife.org/opentree/default/download_subtree/ottol-id/801601/Vertebrata

I tried a very simple example tree: ((Dendromus_ott254739,Malacothrix_typica_ott600700)'Malacothrix (genus in Opisthokonta) ott600707');

It looks like ete3 does handle labels like 'Malacothrix (genus in Opisthokonta) ott600707' fine if you pass quoted_node_names=True, but it breaks elsewhere when parsing the whole tree:

>>> from ete3 import Tree
>>> tree = Tree('subtree-ottol-801601-Vertebrata.tre', format=1, quoted_node_names=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ejmctavish/projects/otapi/venv-ete/lib/python3.8/site-packages/ete3/coretype/tree.py", line 212, in __init__
    read_newick(newick, root_node = self, format=format,
  File "/home/ejmctavish/projects/otapi/venv-ete/lib/python3.8/site-packages/ete3/parser/newick.py", line 266, in read_newick
    return _read_newick_from_string(nw, root_node, matcher, format, quoted_names)
  File "/home/ejmctavish/projects/otapi/venv-ete/lib/python3.8/site-packages/ete3/parser/newick.py", line 348, in _read_newick_from_string
    node.name = quoted_map[node.name]
KeyError: 'ete3_quotref_5048ete3_quotref_5049ete3_quotref_5050'
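A guess at the mechanism, offered as a sketch rather than a diagnosis: the missing key is three placeholders glued together. That is what you would get if a quote-masking step matched quoted text with a naive '[^']*' pattern and a label used the Newick escape for an apostrophe ('' inside a quoted label stands for one '). The code below is a simplified, hypothetical stand-in (naive_mask, the quotref_N keys and the sample label are made up for illustration), not ete3's actual parser, and whether the Vertebrata file contains such labels has not been verified.

```python
import re

# Hypothetical, simplified quote-masking step (NOT ete3's real code):
# every span matched by a naive '...' pattern becomes a numbered placeholder.
NAIVE_QUOTED = re.compile(r"'[^']*'")


def naive_mask(newick):
    """Swap each naive quoted match for a placeholder; return text and map."""
    quoted_map = {}
    counter = 0

    def _sub(match):
        nonlocal counter
        counter += 1
        key = "quotref_%d" % counter
        quoted_map[key] = match.group(0)[1:-1]  # drop the outer quotes
        return key

    return NAIVE_QUOTED.sub(_sub, newick), quoted_map


# Hypothetical label using the Newick '' escape for an apostrophe.
nw = "(A,'O''Brien''s rat')root;"
masked, quoted_map = naive_mask(nw)
print(masked)  # (A,quotref_1quotref_2quotref_3)root;

# A parser that looks the whole label up as a single key then fails,
# because the naive pattern split one label into three matches.
label = masked[masked.index(",") + 1:masked.index(")")]
try:
    quoted_map[label]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'quotref_1quotref_2quotref_3'
```

In this sketch, three consecutive numbers in one key correspond to a single label containing two escaped apostrophes; whether ete3 numbers its placeholders the same way is an assumption.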

snacktavish, Apr 05 '22 16:04

I checked the tree, and it seems the node names in the sample tree are not consistently quoted: some are quoted and some are not. I believe that is what confuses the parser.

Cheers

dengzq1234, Apr 21 '22 09:04
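One possible workaround, sketched under stated assumptions: mask the quoted labels yourself with a scanner that honours the '' escape, so the parser only ever sees plain tokens, and keep a map to restore the original names. The function name mask_quoted_labels, the QLABELn placeholder scheme and the apostrophe-containing sample label are all hypothetical, and the placeholders assume no existing unquoted name already looks like QLABEL followed by digits. Counting how many quoted labels contain an apostrophe also gives a quick way to check whether escapes, rather than inconsistent quoting alone, are involved.

```python
def mask_quoted_labels(newick):
    """Replace each single-quoted Newick label with a placeholder token.

    Honours the Newick rule that '' inside a quoted label encodes one
    apostrophe. Returns (masked_newick, {placeholder: original_label}).
    Placeholder names (QLABEL0, QLABEL1, ...) are a hypothetical choice.
    """
    out = []
    labels = {}
    i, n = 0, len(newick)
    while i < n:
        if newick[i] != "'":
            out.append(newick[i])  # outside quotes: copy unchanged
            i += 1
            continue
        i += 1  # skip the opening quote
        buf = []
        while True:
            if i >= n:
                raise ValueError("unterminated quoted label")
            if newick[i] == "'":
                if i + 1 < n and newick[i + 1] == "'":
                    buf.append("'")  # '' is an escaped apostrophe
                    i += 2
                    continue
                i += 1  # a lone quote closes the label
                break
            buf.append(newick[i])
            i += 1
        key = "QLABEL%d" % len(labels)
        labels[key] = "".join(buf)
        out.append(key)
    return "".join(out), labels


# Variant of the example tree from the thread, with one leaf swapped for a
# hypothetical label that contains apostrophes.
nw = ("((Dendromus_ott254739,'O''Brien''s rat')"
      "'Malacothrix (genus in Opisthokonta) ott600707');")
masked, labels = mask_quoted_labels(nw)
print(masked)  # ((Dendromus_ott254739,QLABEL0)QLABEL1);

# Quick census: how many quoted labels rely on the '' escape?
escaped = [name for name in labels.values() if "'" in name]
print(len(labels), "quoted labels,", len(escaped), "with apostrophes")
```

The masked string could then, in principle, be passed to ete3's Tree(masked, format=1) without quoted_node_names, with each node.name mapped back through labels.get(node.name, node.name) while walking tree.traverse(); that round trip is untested here.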