OrthoFinder
inconsistent newick format
I ran OrthoFinder with -msa hoping that it would add support values to the gene trees. It has, sort of: some trees have them and some don't. As an example, here are two trees from the same run. The first tree has what look like support values. The second tree doesn't. This seems like a bug. Or perhaps the support value is implied somehow, or isn't well defined for some trees? Or maybe I'm completely wrong.
(
Zmays_B73_HPIv02_proteins_Zm00001d045341_T003.RefGen_V4:0.0047036,
(
Zmays_B73_HPIv02_proteins_Zm00001d045344_T001.RefGen_V4:5e-09,
(
Zmays_B73_HPIv02_proteins_Zm00001d045350_T001.RefGen_V4:0.164755,
Zmays_B73_HPIv02_proteins_Zm00001d045347_T001.RefGen_V4:5e-09
)0.856:0.0046851
)1:0.0047036
);
(
Zmays_B73_HPIv02_proteins_Zm00001d035269_T001.RefGen_V4:0.264818,
(
(
Zmays_B73_HPIv02_proteins_Zm00001d036033_T001.RefGen_V4:0,
Zmays_B73_HPIv02_proteins_Zm00001d009253_T001.RefGen_V4:0
):6e-09,
Zmays_B73_HPIv02_proteins_Zm00001d019322_T001.RefGen_V4:0.00456014
):0.264818
);
In any case, these two trees don't have the same format and that seems like a bug in and of itself.
Hi
I'm not sure what could be causing this, but I notice that there are some extremely short branch lengths, e.g. 5e-09, so it would probably be best to check whether the lack of support values has come straight from the tree inference software. Could you run FastTree on the MSA from this orthogroup yourself and check if you get the same thing?
All the best David
I ran fasttree manually and got basically the same results.
OG0033072.nwk
(
Zmays_B73_HPIv02_proteins_Zm00001d035269_T001.RefGen_V4:0.529636654,
(
Zmays_B73_HPIv02_proteins_Zm00001d036033_T001.RefGen_V4:0.0,
Zmays_B73_HPIv02_proteins_Zm00001d009253_T001.RefGen_V4:0.0
):0.000000006,
Zmays_B73_HPIv02_proteins_Zm00001d019322_T001.RefGen_V4:0.004560143
);
OG0033073.nwk
(
Zmays_B73_HPIv02_proteins_Zm00001d045341_T003.RefGen_V4:0.009407204,
Zmays_B73_HPIv02_proteins_Zm00001d045344_T001.RefGen_V4:0.000000005,
(
Zmays_B73_HPIv02_proteins_Zm00001d045350_T001.RefGen_V4:0.164754998,
Zmays_B73_HPIv02_proteins_Zm00001d045347_T001.RefGen_V4:0.000000005
)0.856:0.004685101
);
It may also be worth noting that I'm currently showing some relatively uninteresting orthogroups. 'Normal' orthogroups are also affected. I've parsed all the trees using the ete toolkit and it looks like about half of my 33000 gene trees lack any support values. (note that ete seems to default to a support value of "1.0" so I can't actually distinguish between an empty support value and a support value of "1.0")
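Since ete reports a support of 1.0 either way, one way to separate "no support value written" from "support of 1.0" is to inspect the raw Newick text. The following is a minimal sketch, not part of OrthoFinder or ete. It assumes unquoted labels with no parentheses in them (true for the trees in this thread) and a root with no branch length. The function name `support_summary` and the short leaf names A to D are made up for illustration.

```python
import re

# A ')' followed by whatever label sits before the next ':', ',', '(', ')' or ';'.
_CLOSE = re.compile(r"\)([^:,;()]*)")


def support_summary(newick):
    """Return (with_support, without_support) for non-root internal nodes."""
    text = "".join(newick.split())  # remove spaces and line breaks
    with_support = without_support = 0
    for match in _CLOSE.finditer(text):
        # The root's ')' is followed directly by ';' (or the end of the string).
        if text[match.end():match.end() + 1] in (";", ""):
            continue
        label = match.group(1)
        try:
            float(label)
            with_support += 1
        except ValueError:
            # Empty label, or a non-numeric internal node name.
            without_support += 1
    return with_support, without_support


# Same shapes as the first two trees in this thread, with shortened leaf names.
tree_with = "(A:0.0047036,(B:5e-09,(C:0.164755,D:5e-09)0.856:0.0046851)1:0.0047036);"
tree_without = "(A:0.264818,((B:0,C:0):6e-09,D:0.00456014):0.264818);"
print(support_summary(tree_with))
print(support_summary(tree_without))
```

Running this over every tree file would give a per-tree count of internal nodes with and without an explicit support value, independent of any parser defaults.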
I suppose at this point I could close the issue and go try to contact the fasttree devs. I don't really like my odds there since they just have an email contact. I guess I'll try some of the other tree inference programs like raxml.
OK, I checked some more. I'm now seeing some trees that include support values when processed separately by FastTree but don't in the OrthoFinder results. Examples below.
Orthofinder Result
(
Atrichopoda_UCSC_HPIv02_proteins_evm_27.model.AmTr_v1.0_scaffold00104.35.v1.0:0.142939,
(
(
Gmax_Williams82_HPIv02_proteins_Glyma.02G078600.1.Wm82.a4.v1:0.0723334,
(
(
Tpratense_Milvus_B_HPIv02_proteins_Tp57577_TGAC_v2_mRNA32012.v2:5e-09,
Tpratense_Milvus_B_HPIv02_proteins_Tp57577_TGAC_v2_mRNA961.v2:0.00729856
):0.0887934,
Ljaponica_Gifu_HPIv02_proteins_LotjaGi2g1v0116500.1:0.0455158
):0.0064555
):0.0484212,
(
Tpratense_Milvus_B_HPIv02_proteins_Tp57577_TGAC_v2_mRNA18061.v2:0.0834458,
(
(
(
Athaliana_Col-0_HPIv02_proteins_AT4G09750.1.Araport11.447:0.0233082,
(
Tarvense_Ta1014_HPIv02_proteins_Ta1014.a04.1.g27270_p1:0.0327428,
(
Bnapus_Westar_HPIv02_proteins_BnaA09T0247700WE:0.00545635,
(
Bnapus_Westar_HPIv02_proteins_BnaC09T0285600WE:0.00826621,
Bnapus_Westar_HPIv02_proteins_BnaC02T0284500WE:0.0166876
):0.00272054
):0.015193
):0.0240956
):0.103571,
(
(
Ghirsutum_TM-1_HPIv02_proteins_Gohir.D06G115950.1.v2.1:0.0242748,
Ghirsutum_TM-1_HPIv02_proteins_Gohir.D06G115900.1.v2.1:5e-09
):0.0131892,
Ghirsutum_TM-1_HPIv02_proteins_Gohir.A06G111600.1.v2.1:0.0113575
):0.124077
):0.0182152,
(
Qsuber_1900_HPIv02_proteins_FUN_077261-T1:0.224193,
(
(
Qsuber_1900_HPIv02_proteins_FUN_077271-T1:0.145825,
Qsuber_1900_HPIv02_proteins_FUN_077262-T1:0.367961
):0.0801455,
(
(
Tdomingensis_TD08_HPIv02_proteins_TD08a01.chr_2.g06920_p1:0.00445971,
Tlatafolia_TL01_HPIv02_proteins_TL01g_010202-T1:0.00701654
):0.153928,
(
(
Osativa_Kitaake_HPIv02_proteins_OsKitaake06g201000.1.v3.1:0,
Osativa_NipponBare_HPIv02_proteins_LOC_Os06g39040.1.MSUv7.0:0
):0.0215181,
(
(
Sitalica_Yugu1_HPIv02_proteins_Seita.4G229200.1.v2.2:0.0384756,
(
Zmays_B73_HPIv02_proteins_Zm00001d046483_T001.RefGen_V4:0.0512847,
(
Sbicolor_BTx623_HPIv02_proteins_Sobic.010G177400.1.v3.2:0.00273712,
Sbicolor_RTx430_HPIv02_proteins_SbiRTX430.10G189100.1.v2.1:0.00273339
):0.00568863
):0.0225575
):0.0285611,
(
Bdistachyon_Bd21_HPIv02_proteins_Bradi1g37175.1.v2.1:0.055349,
(
Taestivum_Chinese_Spring_HPIv02_proteins_TraesCS7A03G0898000.1:0.00874206,
(
Taestivum_Chinese_Spring_HPIv02_proteins_TraesCS7B03G0715800.1:0.00853183,
Taestivum_Chinese_Spring_HPIv02_proteins_TraesCS7D03G0839100.1:0.0138021
):5e-09
):0.039127
):0.0283442
):0.0276071
):0.108496
):0.0953897
):0.0114082
):0.0298518
):0.0351004
):0.0108928
):0.142939);
My FastTree result using OrthoFinder's MSA
(
Taestivum_Chinese_Spring_HPIv02_proteins_TraesCS7B03G0715800.1:0.008531829,
Taestivum_Chinese_Spring_HPIv02_proteins_TraesCS7D03G0839100.1:0.013802082,
(
Taestivum_Chinese_Spring_HPIv02_proteins_TraesCS7A03G0898000.1:0.008742063,
(
Bdistachyon_Bd21_HPIv02_proteins_Bradi1g37175.1.v2.1:0.055348992,
(
(
Sitalica_Yugu1_HPIv02_proteins_Seita.4G229200.1.v2.2:0.038475556,
(
Zmays_B73_HPIv02_proteins_Zm00001d046483_T001.RefGen_V4:0.051284708,
(
Sbicolor_BTx623_HPIv02_proteins_Sobic.010G177400.1.v3.2:0.002737121,
Sbicolor_RTx430_HPIv02_proteins_SbiRTX430.10G189100.1.v2.1:0.002733391
)0.807:0.005688632
)0.960:0.022557544
)0.968:0.028561130,
(
(
(
Tdomingensis_TD08_HPIv02_proteins_TD08a01.chr_2.g06920_p1:0.004459714,
Tlatafolia_TL01_HPIv02_proteins_TL01g_010202-T1:0.007016542
)1.000:0.153928453,
(
(
(
(
(
Athaliana_Col-0_HPIv02_proteins_AT4G09750.1.Araport11.447:0.023308200,
(
Tarvense_Ta1014_HPIv02_proteins_Ta1014.a04.1.g27270_p1:0.032742765,
(
Bnapus_Westar_HPIv02_proteins_BnaA09T0247700WE:0.005456353,
(
Bnapus_Westar_HPIv02_proteins_BnaC09T0285600WE:0.008266214,
Bnapus_Westar_HPIv02_proteins_BnaC02T0284500WE:0.016687618
)0.747:0.002720543
)0.934:0.015193021
)0.910:0.024095592
)1.000:0.103571475,
(
(
Ghirsutum_TM-1_HPIv02_proteins_Gohir.D06G115950.1.v2.1:0.024274820,
Ghirsutum_TM-1_HPIv02_proteins_Gohir.D06G115900.1.v2.1:0.000000005
)0.950:0.013189174,
Ghirsutum_TM-1_HPIv02_proteins_Gohir.A06G111600.1.v2.1:0.011357497
)1.000:0.124077208
)0.024:0.018215162,
(
Tpratense_Milvus_B_HPIv02_proteins_Tp57577_TGAC_v2_mRNA18061.v2:0.083445772,
(
(
Gmax_Williams82_HPIv02_proteins_Glyma.02G078600.1.Wm82.a4.v1:0.072333361,
(
(
Tpratense_Milvus_B_HPIv02_proteins_Tp57577_TGAC_v2_mRNA32012.v2:0.000000005,
Tpratense_Milvus_B_HPIv02_proteins_Tp57577_TGAC_v2_mRNA961.v2:0.007298560
)0.999:0.088793431,
Ljaponica_Gifu_HPIv02_proteins_LotjaGi2g1v0116500.1:0.045515757
)0.684:0.006455498
)0.971:0.048421181,
Atrichopoda_UCSC_HPIv02_proteins_evm_27.model.AmTr_v1.0_scaffold00104.35.v1.0:0.285878213
)0.379:0.010892776
)0.674:0.035100390
)0.859:0.029851769,
Qsuber_1900_HPIv02_proteins_FUN_077261-T1:0.224192734
)0.721:0.011408226,
(
Qsuber_1900_HPIv02_proteins_FUN_077271-T1:0.145825281,
Qsuber_1900_HPIv02_proteins_FUN_077262-T1:0.367960585
)0.936:0.080145548
)0.990:0.095389725
)1.000:0.108495686,
(
Osativa_Kitaake_HPIv02_proteins_OsKitaake06g201000.1.v3.1:0.0,
Osativa_NipponBare_HPIv02_proteins_LOC_Os06g39040.1.MSUv7.0:0.0
):0.021518120
)0.954:0.027607062
)0.951:0.028344152
)0.987:0.039127018
)0.000:0.000000005);
Any ideas?
I'll try and look into this. Could I check what version of OrthoFinder you are using?
I was running on 2.5.4. I could run again on 2.5.5 if that helps. I can also give the protein sequences for this more 'normal' orthogroup.
Don't worry about 2.5.5, I don't think there are any differences there. The protein sequences might be useful though.
Actually, could you send me the corresponding tree from "WorkingDirectory/Trees_ids/" and also the alignment from "WorkingDirectory/Alignments_ids/" instead? Thanks
I can check the internal "Trees_ids/" file but it looks like it might be because the clade (Osativa_Kitaake_HPIv02_proteins_OsKitaake06g201000.1.v3.1:0.0, Osativa_NipponBare_HPIv02_proteins_LOC_Os06g39040.1.MSUv7.0:0.0) doesn't have a support value in the original tree. That means when I try and read it as a tree with support values using the ete3 library it fails, and so no support values get written out by OrthoFinder.
If this is the case, I can investigate if there's something that can be done. It would be good to check with the "Trees_ids/" file to confirm.
Hopefully these file names are clear.
WorkingDirectory_OG0009549_tree_id.txt
MultipleSequenceAlignments_OG0009549.fa.txt
WorkingDirectory_OG0009549.fa.txt
(Osativa_Kitaake_HPIv02_proteins_OsKitaake06g201000.1.v3.1:0.0, Osativa_NipponBare_HPIv02_proteins_LOC_Os06g39040.1.MSUv7.0:0.0) doesn't have a support value in the original tree.
This is expected behavior for fasttree in cases where the proteins are identical.
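If a strict parser rejects a tree because a few clades lack a support value, one possible workaround is to pad the missing values with a placeholder before parsing. This is only a sketch under assumptions: it is not what OrthoFinder does, the function name `fill_missing_support` is hypothetical, and the placeholder value is an arbitrary choice that the caller has to make, because support is not well defined for a clade of identical sequences. It only handles internal nodes written as `):length`; a node with neither support nor branch length is left untouched.

```python
import re

# A ')' immediately followed by ':' is an internal node with a branch length
# but no support value or name.
_MISSING = re.compile(r"\)(?=:)")


def fill_missing_support(newick, placeholder):
    """Insert `placeholder` after every ')' that is directly followed by ':'."""
    # A function is used as the replacement so the placeholder is inserted
    # literally, without any escape processing.
    return _MISSING.sub(lambda match: ")" + placeholder, newick)


# Same shape as the OG0033072 tree above, with shortened leaf names.
tree = "(A:0.529636654,(B:0.0,C:0.0):0.000000006,D:0.004560143);"
print(fill_missing_support(tree, "0.0"))
```

Nodes that already carry a support value are not changed, so the padded placeholders can still be told apart afterwards only if the chosen value never occurs as a real support value.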