
Orthogroups vs Superfamily Family Tree

Open 000generic opened this issue 10 months ago • 0 comments

Hi!

I've been successfully using OrthoFinder for several years and it's been great - thank you!

I recently ran OrthoFinder on 17 high-quality chromosome-level genomes - 14 mollusc species and human, fly, and worm as reference genomes.

To understand things more intuitively, I wanted to see how OrthoFinder orthogroups map onto a larger gene family / superfamily tree, and I chose to focus on the TRP ion channels (TRPA, TRPC, TRPM, TRPML, TRPN, TRPP, TRPS, TRPV, TRPVL). To collect homologs across genomes, I used an in-house pipeline that takes a reference gene set (in this case all human, fly, and worm TRP sequences), runs reciprocal BLAST, and accepts every hit whose reciprocal search maps back to any member of the reference gene set.
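A minimal sketch of the acceptance rule described above, assuming BLAST tabular output (`-outfmt 6`, where column 12 is the bitscore) and assuming "maps back" means the best-scoring reverse hit. This is not the actual in-house pipeline; the function names are hypothetical.

```python
def best_reverse_hits(outfmt6_lines):
    """Map each query to its highest-bitscore subject in BLAST -outfmt 6 lines."""
    best = {}
    for line in outfmt6_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed rows
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_hits(forward_lines, reverse_lines, reference_ids):
    """Keep target genes hit by a reference in the forward search whose best
    reverse hit is ANY member of the reference set (not necessarily the
    reference sequence that found them)."""
    candidates = set()
    for line in forward_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] in reference_ids:
            candidates.add(fields[1])
    reverse_best = best_reverse_hits(reverse_lines)
    return sorted(c for c in candidates if reverse_best.get(c) in reference_ids)
```

Accepting a return hit to any reference member, rather than to the original query only, is what lets a single search pull in a whole superfamily at once.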

Focusing on this set, I built a gene tree (MAFFT / ClipKIT / FastTree) of the homologs from just the gastropod and cephalopod species plus the three model species, and added to each sequence header the orthogroup it was assigned in my 17-species OrthoFinder run.
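One way to do the header annotation step, sketched under two assumptions: the layout of OrthoFinder's `Orthogroups.tsv` (a header row, then one row per orthogroup with one column per species and gene IDs separated by ", "), and that those gene IDs match the first word of each FASTA header. The `gene|orthogroup` header format and the `noOG` placeholder are hypothetical choices.

```python
import csv
import io


def gene_to_orthogroup(tsv_text):
    """Build {gene_id: orthogroup} from the text of an Orthogroups.tsv-style table."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header row (Orthogroup, species names...)
    mapping = {}
    for row in reader:
        orthogroup = row[0]
        for cell in row[1:]:
            for gene in cell.split(","):
                gene = gene.strip()
                if gene:  # empty cells mean no genes from that species
                    mapping[gene] = orthogroup
    return mapping


def annotate_fasta(fasta_text, mapping, unassigned="noOG"):
    """Rewrite each header as >gene_id|orthogroup; sequence lines are unchanged."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            gene_id = line[1:].split()[0]
            out.append(f">{gene_id}|{mapping.get(gene_id, unassigned)}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Genes missing from the table (e.g. unassigned genes) get the placeholder label, so they remain visible on the tree.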

Attached is a screenshot of the TRPM family branch within the greater TRP superfamily tree.

I'm unsure if I am interpreting things correctly and would love any guidance or feedback you might have.

Based on the branching in the tree versus the OrthoFinder orthogroups, my sense is that OrthoFinder has incorrectly placed seven human TRPM sequences (TRPM1/2/3/4/5/6/7) into a single orthogroup, with TRPM8 in a second orthogroup. In the tree, by contrast, it looks like there should be two orthogroups for the human TRPMs: TRPM1/3/6/7 and TRPM2/4/5/8. I've marked these two tree-identified orthogroups with hot pink and deep blue branches, the OrthoFinder orthogroups in teal or bright green, and the human, fly, and worm sequences in shades of grey.

Another potential issue is that OrthoFinder declares a number of separate orthogroups for different molluscan subsets of TRPMs, but they should really all be part of a single TRPM2/4/5/8 orthogroup (deep blue), since all of these sequences appear to descend from a single gene in the last common ancestor of humans and molluscs.
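The two kinds of disagreement described above (one orthogroup spanning two tree clades, and one tree clade split across several orthogroups) can be tabulated directly. This is a minimal sketch that assumes the clade labels ("pink", "blue") have already been assigned by hand from the tree; the gene names in the usage example are illustrative only.

```python
from collections import defaultdict


def cross_tabulate(clade_of, og_of):
    """Count genes per (tree clade, orthogroup) pair."""
    table = defaultdict(lambda: defaultdict(int))
    for gene, clade in clade_of.items():
        table[clade][og_of.get(gene, "unassigned")] += 1
    return {clade: dict(ogs) for clade, ogs in table.items()}


def discordance(table):
    """Return (clades split across >1 orthogroup, orthogroups spanning >1 clade)."""
    split_clades = sorted(c for c, ogs in table.items() if len(ogs) > 1)
    clades_per_og = defaultdict(set)
    for clade, ogs in table.items():
        for og in ogs:
            clades_per_og[og].add(clade)
    merged_ogs = sorted(og for og, cl in clades_per_og.items() if len(cl) > 1)
    return split_clades, merged_ogs


# Illustrative labels mirroring the pattern described in the post.
clade_of = {"TRPM1": "pink", "TRPM3": "pink", "TRPM2": "blue", "TRPM8": "blue", "molX": "blue"}
og_of = {"TRPM1": "OG1", "TRPM3": "OG1", "TRPM2": "OG1", "TRPM8": "OG2", "molX": "OG3"}
print(discordance(cross_tabulate(clade_of, og_of)))  # (['blue'], ['OG1'])
```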

A third point of confusion is that OrthoFinder orthogroup members are intermixed on some tree branches. Some of this could be due to weak branch support or to FastTree having trouble building the tree, but that doesn't explain every case. It's possible this will improve with a more rigorous, full maximum-likelihood tree from IQ-TREE; I have those jobs running now.
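Whether an orthogroup's members are intermixed with other sequences can be checked mechanically by asking whether they form a clade. This is a minimal sketch with a hand-rolled Newick parser (it assumes unquoted labels and ignores support values and branch lengths). Because FastTree trees are unrooted, a set is treated as monophyletic if either it or its complement appears as a clade under the arbitrary rooting in the file. The function names are hypothetical.

```python
def clade_sets(newick):
    """Return the leaf set of every node (leaves and internal) in a Newick string."""
    clades = []
    stack = [[]]  # each frame collects the leaf sets of one node's children
    i, n = 0, len(newick)
    while i < n:
        ch = newick[i]
        if ch == "(":
            stack.append([])
            i += 1
        elif ch == ")":
            merged = frozenset().union(*stack.pop())
            clades.append(merged)
            stack[-1].append(merged)
            i += 1
            while i < n and newick[i] not in ",():;":
                i += 1  # skip an internal-node label such as a support value
        elif ch == ":":
            i += 1
            while i < n and newick[i] not in ",();":
                i += 1  # skip the branch length
        elif ch in ",;" or ch.isspace():
            i += 1
        else:
            j = i
            while j < n and newick[j] not in ",():;":
                j += 1
            leaf = frozenset([newick[i:j].strip()])
            clades.append(leaf)
            stack[-1].append(leaf)
            i = j
    return clades


def is_monophyletic(newick, members):
    """True if members (restricted to leaves in the tree) form a split of the unrooted tree."""
    clades = clade_sets(newick)
    all_leaves = max(clades, key=len)
    members = frozenset(members) & all_leaves
    complement = all_leaves - members
    return any(c == members or c == complement for c in clades)
```

For real trees, an established library such as ete3 or DendroPy would be more robust than this parser.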

I don't know whether I have these interpretations right, and I don't expect OrthoFinder, or almost any tool, to be perfect in every case. Most likely I have a definition wrong, have conflated some fundamental concepts, or am simply confused and in over my head; probably a mix of all three. So I have no idea whether I'm getting this right, but I keep coming back to the pink/blue interpretation and thought it was a good time to seek outside expert advice on the OrthoFinder side of things.

Thank you very much, and thanks also for such a powerful, easy-to-use tool!

Eric

P.S. I've included a version of the tree with branch support values. The critical nodes are mostly close to 1, so it seems like a reasonable tree, allowing that it is only FastTree.

[Screenshots: 2024-04-16 at 7:21:08 AM; 2024-04-16 at 7:46:51 AM]

000generic, Apr 16 '24 12:04