inconsistent newick format

Open nhartwic opened this issue 2 years ago • 9 comments

I ran OrthoFinder with -msa hoping that it would add support values to the gene trees, and it partly has: some trees have them and some don't. As an example, here are two trees from the same run. The first tree has what look like support values; the second doesn't. This seems like a bug. Or perhaps the support value is implied somehow, or isn't well defined for some trees? Or maybe I'm completely wrong.

(
    Zmays_B73_HPIv02_proteins_Zm00001d045341_T003.RefGen_V4:0.0047036,
    (
        Zmays_B73_HPIv02_proteins_Zm00001d045344_T001.RefGen_V4:5e-09,
        (
            Zmays_B73_HPIv02_proteins_Zm00001d045350_T001.RefGen_V4:0.164755,
            Zmays_B73_HPIv02_proteins_Zm00001d045347_T001.RefGen_V4:5e-09
        )0.856:0.0046851
    )1:0.0047036
);

(
    Zmays_B73_HPIv02_proteins_Zm00001d035269_T001.RefGen_V4:0.264818,
    (
        (
            Zmays_B73_HPIv02_proteins_Zm00001d036033_T001.RefGen_V4:0,
            Zmays_B73_HPIv02_proteins_Zm00001d009253_T001.RefGen_V4:0
        ):6e-09,
        Zmays_B73_HPIv02_proteins_Zm00001d019322_T001.RefGen_V4:0.00456014
    ):0.264818
);

In any case, these two trees don't have the same format and that seems like a bug in and of itself.

nhartwic avatar Mar 23 '23 07:03 nhartwic

Hi

I'm not sure what could be causing this, but I notice that there are some extremely short branch lengths, e.g. 5e-09, so it would probably be best to check whether the missing support values come straight from the tree inference software. Could you run FastTree on the MSA from this orthogroup yourself and check if you get the same thing?

All the best, David

davidemms avatar May 15 '23 18:05 davidemms

I ran FastTree manually and got basically the same results.

OG0033072.nwk
(
    Zmays_B73_HPIv02_proteins_Zm00001d035269_T001.RefGen_V4:0.529636654,
    (
        Zmays_B73_HPIv02_proteins_Zm00001d036033_T001.RefGen_V4:0.0,
        Zmays_B73_HPIv02_proteins_Zm00001d009253_T001.RefGen_V4:0.0
    ):0.000000006,
    Zmays_B73_HPIv02_proteins_Zm00001d019322_T001.RefGen_V4:0.004560143
);

OG0033073.nwk
(
    Zmays_B73_HPIv02_proteins_Zm00001d045341_T003.RefGen_V4:0.009407204,
    Zmays_B73_HPIv02_proteins_Zm00001d045344_T001.RefGen_V4:0.000000005,
    (
        Zmays_B73_HPIv02_proteins_Zm00001d045350_T001.RefGen_V4:0.164754998,
        Zmays_B73_HPIv02_proteins_Zm00001d045347_T001.RefGen_V4:0.000000005
    )0.856:0.004685101
);

It may also be worth noting that the orthogroups I've shown so far are relatively uninteresting ones. 'Normal' orthogroups are also affected. I've parsed all the trees using the ete toolkit, and it looks like about half of my 33000 gene trees lack any support values. (Note that ete seems to default to a support value of "1.0", so I can't actually distinguish between an empty support value and a support value of "1.0".)
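One way to separate "no support value written" from "support of 1.0" is to look at the raw Newick text rather than the parsed tree. The sketch below is illustrative only: `support_summary` is a hypothetical helper, not part of ete or OrthoFinder, and it assumes unquoted labels (no parentheses inside gene names) and one complete tree per string. The short names A to D stand in for the real gene names.

```python
import re

# Each ')' closes an internal node. Whatever sits between that ')' and the
# next ':', ',', '(', ')' or ';' is the node's label (the support value).
_INTERNAL_LABEL = re.compile(r"\)([^:,();]*)")


def support_summary(newick):
    """Return (with_support, without_support) counts for internal nodes.

    Assumes unquoted labels and a single complete tree ending in ';'.
    The root (the last ')') is skipped because it normally has no support.
    """
    text = "".join(newick.split())  # drop pretty-printing whitespace
    labels = _INTERNAL_LABEL.findall(text)
    labels = labels[:-1]  # ignore the root node
    with_support = sum(1 for label in labels if label)
    return with_support, len(labels) - with_support


# Same shapes as the two trees at the top of this issue.
tree_a = "(A:0.0047036,(B:5e-09,(C:0.164755,D:5e-09)0.856:0.0046851)1:0.0047036);"
tree_b = "(A:0.264818,((B:0,C:0):6e-09,D:0.00456014):0.264818);"
print(support_summary(tree_a))  # (2, 0)
print(support_summary(tree_b))  # (0, 2)
```

A non-zero second count means at least one internal node has nothing between `)` and `:`, which a parser that substitutes a default value would hide.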

I suppose at this point I could close the issue and try to contact the FastTree devs. I don't really like my odds there, since they only list an email contact. I guess I'll try some of the other tree inference programs, like RAxML.

nhartwic avatar May 24 '23 21:05 nhartwic

OK, I checked some more. I'm now seeing some trees that include support values when processed separately by FastTree but lack them in the OrthoFinder results. Examples below...

OrthoFinder result

(
    Atrichopoda_UCSC_HPIv02_proteins_evm_27.model.AmTr_v1.0_scaffold00104.35.v1.0:0.142939,
    (
        (
            Gmax_Williams82_HPIv02_proteins_Glyma.02G078600.1.Wm82.a4.v1:0.0723334,
            (
                (
                    Tpratense_Milvus_B_HPIv02_proteins_Tp57577_TGAC_v2_mRNA32012.v2:5e-09,
                    Tpratense_Milvus_B_HPIv02_proteins_Tp57577_TGAC_v2_mRNA961.v2:0.00729856
                ):0.0887934,
                Ljaponica_Gifu_HPIv02_proteins_LotjaGi2g1v0116500.1:0.0455158
            ):0.0064555
        ):0.0484212,
        (
            Tpratense_Milvus_B_HPIv02_proteins_Tp57577_TGAC_v2_mRNA18061.v2:0.0834458,
            (
                (
                    (
                        Athaliana_Col-0_HPIv02_proteins_AT4G09750.1.Araport11.447:0.0233082,
                        (
                            Tarvense_Ta1014_HPIv02_proteins_Ta1014.a04.1.g27270_p1:0.0327428,
                            (
                                Bnapus_Westar_HPIv02_proteins_BnaA09T0247700WE:0.00545635,
                                (
                                    Bnapus_Westar_HPIv02_proteins_BnaC09T0285600WE:0.00826621,
                                    Bnapus_Westar_HPIv02_proteins_BnaC02T0284500WE:0.0166876
                                ):0.00272054
                            ):0.015193
                        ):0.0240956
                    ):0.103571,
                    (
                        (
                            Ghirsutum_TM-1_HPIv02_proteins_Gohir.D06G115950.1.v2.1:0.0242748,
                            Ghirsutum_TM-1_HPIv02_proteins_Gohir.D06G115900.1.v2.1:5e-09
                        ):0.0131892,
                        Ghirsutum_TM-1_HPIv02_proteins_Gohir.A06G111600.1.v2.1:0.0113575
                    ):0.124077
                ):0.0182152,
                (
                    Qsuber_1900_HPIv02_proteins_FUN_077261-T1:0.224193,
                    (
                        (
                            Qsuber_1900_HPIv02_proteins_FUN_077271-T1:0.145825,
                            Qsuber_1900_HPIv02_proteins_FUN_077262-T1:0.367961
                        ):0.0801455,
                        (
                            (
                                Tdomingensis_TD08_HPIv02_proteins_TD08a01.chr_2.g06920_p1:0.00445971,
                                Tlatafolia_TL01_HPIv02_proteins_TL01g_010202-T1:0.00701654
                            ):0.153928,
                            (
                                (
                                    Osativa_Kitaake_HPIv02_proteins_OsKitaake06g201000.1.v3.1:0,
                                    Osativa_NipponBare_HPIv02_proteins_LOC_Os06g39040.1.MSUv7.0:0
                                ):0.0215181,
                                (
                                    (
                                        Sitalica_Yugu1_HPIv02_proteins_Seita.4G229200.1.v2.2:0.0384756,
                                        (
                                            Zmays_B73_HPIv02_proteins_Zm00001d046483_T001.RefGen_V4:0.0512847,
                                            (
                                                Sbicolor_BTx623_HPIv02_proteins_Sobic.010G177400.1.v3.2:0.00273712,
                                                Sbicolor_RTx430_HPIv02_proteins_SbiRTX430.10G189100.1.v2.1:0.00273339
                                            ):0.00568863
                                        ):0.0225575
                                    ):0.0285611,
                                (
                                    Bdistachyon_Bd21_HPIv02_proteins_Bradi1g37175.1.v2.1:0.055349,
                                    (
                                        Taestivum_Chinese_Spring_HPIv02_proteins_TraesCS7A03G0898000.1:0.00874206,
                                        (
                                            Taestivum_Chinese_Spring_HPIv02_proteins_TraesCS7B03G0715800.1:0.00853183,
                                            Taestivum_Chinese_Spring_HPIv02_proteins_TraesCS7D03G0839100.1:0.0138021
                                        ):5e-09
                                    ):0.039127
                                ):0.0283442
                            ):0.0276071
                        ):0.108496
                    ):0.0953897
                ):0.0114082
            ):0.0298518
        ):0.0351004
    ):0.0108928
):0.142939);

My FastTree result using OrthoFinder's MSA:

(
    Taestivum_Chinese_Spring_HPIv02_proteins_TraesCS7B03G0715800.1:0.008531829,
    Taestivum_Chinese_Spring_HPIv02_proteins_TraesCS7D03G0839100.1:0.013802082,
    (
        Taestivum_Chinese_Spring_HPIv02_proteins_TraesCS7A03G0898000.1:0.008742063,
        (
            Bdistachyon_Bd21_HPIv02_proteins_Bradi1g37175.1.v2.1:0.055348992,
            (
                (
                    Sitalica_Yugu1_HPIv02_proteins_Seita.4G229200.1.v2.2:0.038475556,
                    (
                        Zmays_B73_HPIv02_proteins_Zm00001d046483_T001.RefGen_V4:0.051284708,
                        (
                            Sbicolor_BTx623_HPIv02_proteins_Sobic.010G177400.1.v3.2:0.002737121,
                            Sbicolor_RTx430_HPIv02_proteins_SbiRTX430.10G189100.1.v2.1:0.002733391
                        )0.807:0.005688632
                    )0.960:0.022557544
                )0.968:0.028561130,
                (
                    (
                        (
                            Tdomingensis_TD08_HPIv02_proteins_TD08a01.chr_2.g06920_p1:0.004459714,
                            Tlatafolia_TL01_HPIv02_proteins_TL01g_010202-T1:0.007016542
                        )1.000:0.153928453,
                        (
                            (
                                (
                                    (
                                        (
                                            Athaliana_Col-0_HPIv02_proteins_AT4G09750.1.Araport11.447:0.023308200,
                                            (
                                                Tarvense_Ta1014_HPIv02_proteins_Ta1014.a04.1.g27270_p1:0.032742765,
                                                (
                                                    Bnapus_Westar_HPIv02_proteins_BnaA09T0247700WE:0.005456353,
                                                    (
                                                        Bnapus_Westar_HPIv02_proteins_BnaC09T0285600WE:0.008266214,
                                                        Bnapus_Westar_HPIv02_proteins_BnaC02T0284500WE:0.016687618
                                                    )0.747:0.002720543
                                                )0.934:0.015193021
                                            )0.910:0.024095592
                                        )1.000:0.103571475,
                                        (
                                            (
                                                Ghirsutum_TM-1_HPIv02_proteins_Gohir.D06G115950.1.v2.1:0.024274820,
                                                Ghirsutum_TM-1_HPIv02_proteins_Gohir.D06G115900.1.v2.1:0.000000005
                                            )0.950:0.013189174,
                                            Ghirsutum_TM-1_HPIv02_proteins_Gohir.A06G111600.1.v2.1:0.011357497
                                        )1.000:0.124077208
                                    )0.024:0.018215162,
                                    (
                                        Tpratense_Milvus_B_HPIv02_proteins_Tp57577_TGAC_v2_mRNA18061.v2:0.083445772,
                                        (
                                            (
                                                Gmax_Williams82_HPIv02_proteins_Glyma.02G078600.1.Wm82.a4.v1:0.072333361,
                                                (
                                                    (
                                                        Tpratense_Milvus_B_HPIv02_proteins_Tp57577_TGAC_v2_mRNA32012.v2:0.000000005,
                                                        Tpratense_Milvus_B_HPIv02_proteins_Tp57577_TGAC_v2_mRNA961.v2:0.007298560
                                                    )0.999:0.088793431,
                                                    Ljaponica_Gifu_HPIv02_proteins_LotjaGi2g1v0116500.1:0.045515757
                                                )0.684:0.006455498
                                            )0.971:0.048421181,
                                        Atrichopoda_UCSC_HPIv02_proteins_evm_27.model.AmTr_v1.0_scaffold00104.35.v1.0:0.285878213
                                    )0.379:0.010892776
                                )0.674:0.035100390
                            )0.859:0.029851769,
                            Qsuber_1900_HPIv02_proteins_FUN_077261-T1:0.224192734
                        )0.721:0.011408226,
                        (
                            Qsuber_1900_HPIv02_proteins_FUN_077271-T1:0.145825281,
                            Qsuber_1900_HPIv02_proteins_FUN_077262-T1:0.367960585
                        )0.936:0.080145548
                    )0.990:0.095389725
                )1.000:0.108495686,
                (
                    Osativa_Kitaake_HPIv02_proteins_OsKitaake06g201000.1.v3.1:0.0,
                    Osativa_NipponBare_HPIv02_proteins_LOC_Os06g39040.1.MSUv7.0:0.0
                ):0.021518120
            )0.954:0.027607062
        )0.951:0.028344152
    )0.987:0.039127018
)0.000:0.000000005);

Any ideas?

nhartwic avatar May 24 '23 22:05 nhartwic

I'll try and look into this. Could I check what version of OrthoFinder you are using?

davidemms avatar Jun 13 '23 20:06 davidemms

I was running on 2.5.4. I could run again on 2.5.5 if that helps. I can also give the protein sequences for this more 'normal' orthogroup.

nhartwic avatar Jun 14 '23 16:06 nhartwic

Don't worry about 2.5.5, I don't think there are any differences there. The protein sequences might be useful though.

davidemms avatar Jun 14 '23 17:06 davidemms

Actually, could you send me the corresponding tree from "WorkingDirectory/Trees_ids/" and also the alignment from "WorkingDirectory/Alignments_ids/" instead? Thanks

davidemms avatar Jun 14 '23 17:06 davidemms

I can check the internal "Trees_ids/" file, but it looks like this might be because the clade (Osativa_Kitaake_HPIv02_proteins_OsKitaake06g201000.1.v3.1:0.0, Osativa_NipponBare_HPIv02_proteins_LOC_Os06g39040.1.MSUv7.0:0.0) doesn't have a support value in the original tree. That means that when I try to read it as a tree with support values using the ete3 library, it fails, and so OrthoFinder writes out no support values at all for that tree.

If this is the case, I can investigate if there's something that can be done. It would be good to check with the "Trees_ids/" file to confirm.
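To illustrate the kind of workaround that could be explored (this is only a sketch, not what OrthoFinder does or will do): internal nodes with no label could be given a placeholder before the tree is parsed, so that every internal node has a support field. `fill_missing_support` is a hypothetical helper, the placeholder value is an arbitrary choice, and the code assumes unquoted labels and that every internal node other than the root has a branch length.

```python
import re


def fill_missing_support(newick, placeholder="0.0"):
    """Insert `placeholder` wherever ')' is followed directly by ':'.

    Nodes that already carry a label, and the root (')' followed by ';'),
    are left untouched. The placeholder is an assumption; pick whatever
    value downstream tools should treat as "no support computed".
    """
    return re.sub(r"\):", lambda match: ")" + placeholder + ":", newick)


# B and C stand in for the two identical rice proteins.
original = "(A:0.1,((B:0.0,C:0.0):0.02,D:0.05)0.954:0.03);"
print(fill_missing_support(original))
# (A:0.1,((B:0.0,C:0.0)0.0:0.02,D:0.05)0.954:0.03);
```

Whether a placeholder is appropriate at all depends on how the support values are used afterwards, since it cannot be told apart from a genuinely computed value of the same size.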

davidemms avatar Jun 14 '23 17:06 davidemms

Hopefully these file names are clear.

Gene_Trees_OG0009549_tree.txt

WorkingDirectory_OG0009549_tree_id.txt

MultipleSequenceAlignments_OG0009549.fa.txt

WorkingDirectory_OG0009549.fa.txt

"(Osativa_Kitaake_HPIv02_proteins_OsKitaake06g201000.1.v3.1:0.0, Osativa_NipponBare_HPIv02_proteins_LOC_Os06g39040.1.MSUv7.0:0.0) doesn't have a support value in the original tree."

This is expected behavior for FastTree in cases where the proteins are identical.
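If that is the cause, the affected orthogroups should be predictable from the alignment alone. A minimal sketch, assuming a plain FASTA alignment and exact, case-sensitive string comparison (`identical_sequence_groups` is a hypothetical helper, and the sequences are made up):

```python
from collections import defaultdict


def identical_sequence_groups(fasta_text):
    """Return lists of sequence names whose aligned sequences are identical."""
    names_by_sequence = defaultdict(list)
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                names_by_sequence["".join(chunks)].append(name)
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:  # flush the final record
        names_by_sequence["".join(chunks)].append(name)
    # Only sequences shared by two or more names are of interest.
    return [names for names in names_by_sequence.values() if len(names) > 1]


alignment = """>gene1
MKT-LLA
>gene2
MKT-LLA
>gene3
MKTALLA
"""
print(identical_sequence_groups(alignment))  # [['gene1', 'gene2']]
```

An orthogroup for which this returns a non-empty list would be expected, per the comment above, to contain at least one clade without a support value.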

nhartwic avatar Jun 14 '23 23:06 nhartwic