OrthoFinder
Start OF from Gene Trees step with only a subset of orthogroups
Hi,
I want to run OrthoFinder from the Gene Trees step onwards, because I am only interested in the duplication history of a subset of orthogroups. However, by default OrthoFinder calculates gene trees for all orthogroups, and this takes a lot of time, especially since I want to use iqtree instead of fasttree, and especially for the OG00000* orthogroups, which are large.
My question is: is there a way to run OrthoFinder from the Gene Trees step onwards with a set of pre-computed gene trees? If these pre-computed gene trees cover only a few orthogroups, can the analysis finish with only those? And most importantly, how can I set up OrthoFinder to use only these pre-computed trees: what would the command be, and what is the input folder format?
Best, Paschalis Natsidis UCL
Hi Paschalis
In general you are fine skipping orthogroups, but if you do this it is probably best to supply a species tree with the "-s" option, since OrthoFinder might have been relying on the orthogroups that you skipped to infer the species tree.
I think there are two ways you could approach this. The recommended way is to use the tree/MSA extensibility provided by the config.json file:
- Run OrthoFinder up to the orthogroups stage (command line switch "-og") and identify which orthogroups you are interested in.
- Write a wrapper script for the tree inference that takes an input and output filename. If the orthogroup is one you want, then run your chosen tree inference program on it and save the resulting tree to the output filename; otherwise skip it or use a fast tree method on it. If you skip it you don't need to create any output file.
- Add an entry in the config.json file for a "program_type": "tree" with the command line to call your wrapper script.
- Do the same for your alignments too if you would like to skip these.
- Run orthofinder from groups using the options "-fg RESULTS_DIR" and "-M msa -T YOUR_TREE_WRAPPER -A YOUR_MSA_WRAPPER -s SPECIES_TREE"
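As a rough illustration of the wrapper described in the steps above, here is a minimal Python sketch. Several things in it are assumptions rather than anything confirmed in this thread: the script and function names, the WANTED set, the idea that the orthogroup ID can be read from the input file name (e.g. OG0000012.fa), and the IQ-TREE call ("iqtree -s ... -pre ...", which writes PREFIX.treefile). The config.json entry would then call the script with the input and output filenames as its two arguments.

```python
#!/usr/bin/env python3
"""Hypothetical tree wrapper: infer trees only for selected orthogroups."""
import os
import subprocess
import sys

# Assumption: orthogroups of interest, filled in by hand after the "-og" run.
WANTED = {"OG0000003", "OG0000012"}


def orthogroup_id(path):
    """Return the orthogroup ID, assuming files are named like OG0000012.fa."""
    return os.path.basename(path).split(".")[0]


def iqtree_command(alignment, prefix):
    """Build an IQ-TREE call; executable name and options are assumptions."""
    return ["iqtree", "-s", alignment, "-pre", prefix]


def run_wrapper(alignment, output_tree, wanted=WANTED, runner=subprocess.run):
    """Infer a tree for wanted orthogroups; create no output for the rest."""
    if orthogroup_id(alignment) not in wanted:
        return False  # skipped: no output file is written
    prefix = output_tree + ".iqtree_tmp"
    runner(iqtree_command(alignment, prefix), check=True)
    # IQ-TREE writes its best tree to <prefix>.treefile; move it into place.
    os.replace(prefix + ".treefile", output_tree)
    return True


# Only act when called as: wrapper.py INPUT_ALIGNMENT OUTPUT_TREE
if __name__ == "__main__" and len(sys.argv) == 3:
    run_wrapper(sys.argv[1], sys.argv[2])
```

Skipped orthogroups simply produce no file here; swapping the early return for a call to a fast tree method would be the other option mentioned above. It is worth trying this on the Example Dataset before a real run.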
The alternative is to run OrthoFinder up to the point it writes the sequence files and then you run the alignments and trees yourself and put them where OrthoFinder expects:
- Run to sequences: "-os"
- Infer the alignments & trees you want on the files in "WorkingDirectory/Sequences_ids/". Save the trees in WorkingDirectory/Trees_ids/
- Start OrthoFinder from these trees: "-ft RESULTS_DIR -s SPECIES_TREE"
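Before restarting with "-ft", it may help to check that every orthogroup of interest actually has a tree where OrthoFinder will look for it. The sketch below is only that, a sketch: the function name is made up, and the file naming (sequences as OG.fa, trees as OG_tree_id.txt) is an assumption that should be checked against a completed run on the Example Dataset.

```python
import os


def missing_trees(working_dir, wanted, suffix="_tree_id.txt"):
    """List wanted orthogroups that have sequences but no tree in Trees_ids/.

    Assumption: sequence files are named <OG>.fa and trees <OG><suffix>.
    """
    seq_dir = os.path.join(working_dir, "Sequences_ids")
    tree_dir = os.path.join(working_dir, "Trees_ids")
    missing = []
    for og in sorted(wanted):
        has_seqs = os.path.exists(os.path.join(seq_dir, og + ".fa"))
        has_tree = os.path.exists(os.path.join(tree_dir, og + suffix))
        if has_seqs and not has_tree:
            missing.append(og)
    return missing
```

Calling missing_trees("RESULTS_DIR/WorkingDirectory", {"OG0000003", "OG0000012"}) would return the IDs that still need a tree, under the naming assumption above.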
For both of these, I'd recommend testing the workflow out first on the Example Dataset.
All the best David
Hi David,
I tried the second alternative: I put my trees in WorkingDirectory/Trees_ids and ran orthofinder with -ft and -s.
I got the following error:
OrthoFinder version 2.4.0 Copyright (C) 2014 David Emms
2020-12-15 09:39:16 : Starting OrthoFinder
40 thread(s) for highly parallel tasks (BLAST searches etc.)
1 thread(s) for OrthoFinder algorithm
Checking required programs are installed
Test can run "fastme -i /SAN/telfordlab/paratomella_et_al/tools/OrthoFinder/ExampleData/OrthoFinder/Results_Dec15/WorkingDirectory/SimpleTest.phy -o /SAN/telfordlab/paratomella_et_al/tools/OrthoFinder/ExampleData/OrthoFinder/Results_Dec15/WorkingDirectory/SimpleTest.tre" - ok
Running Orthologue Prediction
=============================
Reconciling gene and species trees
2020-12-15 09:39:16 : Starting OF Orthologues
Traceback (most recent call last):
File "orthofinder.py", line 7, in
File "scripts_of/main.py", line 1761, in main
File "scripts_of/main.py", line 1517, in GetOrthologues_FromTrees
File "scripts_of/orthologues.py", line 877, in OrthologuesFromTrees
File "scripts_of/orthologues.py", line 851, in ReconciliationAndOrthologues
File "scripts_of/trees2ologs_of.py", line 815, in DoOrthologuesForOrthoFinder
File "scripts_of/files.py", line 374, in GetOGsTreeFN
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
[137708] Failed to execute script orthofinder
Any ideas what might have gone wrong?
Paschalis
I think if you add a line to your Log.txt file like this:
WorkingDirectory_Trees: /home/emms/NOBACKUP/ExampleDataset/OrthoFinder/Results_STANDARD/WorkingDirectory/
pointing to the correct WorkingDirectory that might be enough. Have a look at what the Log.txt file looks like when you run it to completion on the Example Dataset. I've not tried this hack myself, but if you provide that extra info in the Log file that might be enough to get it to run.
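The Log.txt edit suggested above could be scripted roughly as follows. This is an untested-against-OrthoFinder sketch of the same hack: the function name is hypothetical, and the only thing taken from the thread is the "WorkingDirectory_Trees: PATH/" line format. It appends the line only if no such key is present, and leaves the rest of the file alone.

```python
import os


def add_trees_dir(log_path, working_dir, key="WorkingDirectory_Trees"):
    """Append '<key>: <working_dir>/' to the log unless the key already exists."""
    with open(log_path) as fh:
        content = fh.read()
    if any(line.startswith(key + ":") for line in content.splitlines()):
        return False  # already present, leave the file untouched
    # os.path.join(path, "") guarantees a trailing path separator.
    entry = "%s: %s" % (key, os.path.join(working_dir, ""))
    # Start on a fresh line in case the file lacks a trailing newline.
    prefix = "" if content == "" or content.endswith("\n") else "\n"
    with open(log_path, "a") as fh:
        fh.write(prefix + entry + "\n")
    return True
```

Comparing the result with the Log.txt of a run that went to completion on the Example Dataset is still the safest check.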
All the best David
Hi David and Paschalis, I know this is an old issue, but I don't know whether it was solved.
I am trying to do the same thing:
- run OF on my whole dataset up to "-os"
- infer the alignments and trees and, in parallel, my own species tree
- select some of the OGs and re-run OF with these OGs' gene trees and my species tree, like this:
./orthofinder -ft /home/Desktop/Orthofinder/Results_dir/ -s /home/Desktop/Orthofinder/species_tree.txt -x /home/Desktop/Speciesinfofilename -t 8
Also, as you suggested, I added these lines to the Log.txt file generated in the first OF run:
WorkingDirectory_Base: /home/Desktop/Orthofinder/Results_dir/WorkingDirectory/
WorkingDirectory_Trees:/home/Desktop/Orthofinder/Results_dir/WorkingDirectory/
Finally, I added -x to obtain the OrthoXML file too,
but I got this error:
OrthoFinder version 2.5.4 Copyright (C) 2014 David Emms
2022-01-27 13:09:10 : Starting OrthoFinder 2.5.4
8 thread(s) for highly parallel tasks (BLAST searches etc.)
1 thread(s) for OrthoFinder algorithm
Checking required programs are installed
Test can run "fastme -i /home/vaninat/Desktop/Orthofinder/Results_Jul26_mod/WorkingDirectory/SimpleTest.phy -o /home/vaninat/Desktop/Orthofinder/Results_Jul26_mod/WorkingDirectory/SimpleTest.tre" - ok
Running Orthologue Prediction
Reconciling gene and species trees
2022-01-27 13:09:20 : Starting OF Orthologues
Traceback (most recent call last):
File "orthofinder.py", line 7, in
Any new suggestions?
Thanks in advance!
Vanina