How to loop multiple trees in a single newick file
Apologies if this has been asked and answered: if I have a newick file with multiple trees (one per line), is there a straightforward way to import and loop through them one at a time? In my experimenting with ete3, if I import such a newick as my_tree = Tree("
Many thanks for any help with this (likely basic) question! Chase
ete3 must be parsing only the first tree in your list. To load all of them you should do something like:
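from ete3 import Tree

trees = []
# Parse each line of the file as a separate newick tree
with open('mytrees.newick') as handle:
    for line in handle:
        if line.strip():  # skip blank lines
            trees.append(Tree(line.strip()))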
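For a large file, the same idea can be split into a small line reader plus a loader. This is only a sketch: the helper names `iter_newick_lines` and `load_trees` are made up for illustration, and it assumes the file holds exactly one complete newick string per line.

```python
def iter_newick_lines(path):
    """Yield one newick string per non-empty line of the file at `path`."""
    with open(path) as handle:
        for line in handle:
            newick = line.strip()
            if newick:  # ignore blank lines, e.g. a trailing newline at the end
                yield newick


def load_trees(path):
    """Parse every non-empty line as its own tree (requires ete3)."""
    # Imported here so the line reader above also works without ete3 installed.
    from ete3 import Tree
    return [Tree(newick) for newick in iter_newick_lines(path)]
```

Calling `load_trees('mytrees.newick')` should then return a plain Python list that can be looped over one tree at a time.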
Thanks very much, @jhcepas ! If I'm understanding correctly, it's actually considering it one big tree (more likely a tree sequence?) because it's reporting the total number of leaves across all trees (20 trees x 6264 = 125280):
>>> from ete3 import Tree
>>> my_tree = Tree("raxml.mlTrees")
>>> my_tree.describe()
Number of leaf nodes: 125280
Total number of nodes: 250501
Rooted: No
Most distant node: D3|IRC200189
Max. distance: 0.260229
Thanks so much for the suggestion; I will not open the file directly with Tree (since I don't understand what it's doing), but will instead loop over the lines and read each one separately using Tree.
With gratitude, Chase
Wow, that's interesting. In principle, ETE treats the ';' symbol as the end of a newick tree. What's the format of your raxml.mlTrees file? One newick per line?
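If a file is not strictly one tree per line, one alternative is to split the text on the ';' terminator yourself before handing each piece to Tree. The sketch below is plain Python and not part of ETE; `split_newick` is a hypothetical helper, and it assumes that no node label contains a ';' or whitespace (quoted labels would break it).

```python
def split_newick(text):
    """Split text into newick strings, each ending with ';'."""
    trees = []
    for chunk in text.split(";"):
        # Remove all whitespace so a tree wrapped over several lines is rejoined.
        newick = "".join(chunk.split())
        if newick:
            trees.append(newick + ";")
    return trees
```

Each returned string can then be passed to `Tree(...)` on its own, the same way as in the line-by-line loop.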
Indeed, that's exactly what I expected too! Yes, the format is one newick tree per line:
(((C1|IRC202059:0.000653,(C1|PAP2664:0.0 ... 0.000612):0.000980):0.002181):0.001470);
((A1|PAP230848:0.001037,A1|PAP154594:0.0 ... 1|PAP2230:0.000001):0.000001):0.000001);
((A1|PAP102078:0.001252,((((A1|PAP1113:0 ... 002680):0.000001,A1|PAP231566:0.000001);
((A3|PAP194931:0.000001,A3|IRC203769:0.0 ... PAP244882:0.000613):0.000001):0.000001);
((A1|PAP256745:0.000599,(A1|IRC201679:0. ... 0.003449,A1|SCD1362:0.000614):0.001205);
(((A1|SCD2643:0.000001,A1|IRC201643:0.00 ... 0.010375):0.003761):0.000600):0.000001);
(((((A1|IRC201629:0.004009,(((((A1|IRC20 ... 000973):0.001007,A1|PAP156465:0.000001);
((A4|PAP176102:0.001198,(((A4|IRC201637: ... 261|212CG:0.001945):0.000963):0.000001);
((((D4|IRC201739:0.002448,(((C2|IRC20054 ... 010354):0.007279,D4|IRC200643:0.000001);
((((A1|SCD2048:0.003091,A1|PAP111730:0.0 ... 0.001251):0.001229):0.000001):0.000001);
((D2|PAP0372:0.000001,((((((D2|PAP2336:0 ... 000001,D2|PAP221420:0.000001):0.000001);
(((A1|PAP3220:0.001927,(A1|IRC202133:0.0 ... 0.000001):0.000001):0.000001):0.000001);
((A1|IRC200092:0.002336,A1|SCD5803:0.001 ... PAP157935:0.001802):0.000582):0.000001);
(((A1|PAP242284:0.001742,(((A1|PAP119879 ... PAP167532:0.000001):0.000001):0.000001);
(((A1|IRC201915:0.000569,A1|IRC200620:0. ... IRC200513:0.002852):0.000001):0.000001);
((((((NA|IRC201639:0.008498,A1|IRC202151 ... IRC201078:0.001376):0.002153):0.000001);
(((A1|PAP277589:0.001174,(((((A1|Qv35943 ... 0.000001):0.000001,A1|PAP2436:0.000572);
((((A1|SCD2969:0.000001,((A1|SCD2720:0.0 ... 000602):0.000571,A1|PAP289437:0.000001);
(((((A1|IRC201668:0.003988,A1|SCD2323:0. ... 0.006509,A1|SCD1445:0.000657):0.000611);
((A1|IRC201000:0.000001,(A1|PAP254496:0. ... C200816:0.000001,A1|IRC200997:0.000001);
I am so new to ete3 I do not trust myself to understand how the structures work, but I had expected inputting this to result in some sort of list of trees.