OrthoFinder
Species tree inference failed
Hi, I used OrthoFinder for a gene family analysis, but I ran into an error. Could you help me fix it? Thank you so much. Here is part of my log.
OrthoFinder version 2.4.1 Copyright (C) 2014 David Emms
2021-01-09 11:16:17 : Starting OrthoFinder 2.4.1
12 thread(s) for highly parallel tasks (BLAST searches etc.)
12 thread(s) for OrthoFinder algorithm
Checking required programs are installed
Test can run "makeblastdb -help" - ok
Test can run "blastp -help" - ok
Test can run "mcl -h" - ok
Test can run "mafft /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory//_dependencies_check/SimpleTest.fa" - ok
Test can run "iqtree" - ok
WARNING: Files have been ignored as they don't appear to be FASTA files:
run_orthofinder_source.sh
run_orthofinder_source.sh.o28994
run_orthofinder_source.sh.o29262
test.py
OrthoFinder expects FASTA files to have one of the following extensions: fasta, pep, fas, faa, fa
Dividing up work for BLAST for parallel processing
2021-01-09 11:17:05 : Creating Blast database 1 of 36
2021-01-09 11:17:06 : Creating Blast database 2 of 36
2021-01-09 11:17:07 : Creating Blast database 3 of 36
2021-01-09 11:17:10 : Creating Blast database 4 of 36
2021-01-09 11:17:13 : Creating Blast database 5 of 36
2021-01-09 11:17:17 : Creating Blast database 6 of 36
2021-01-09 11:17:19 : Creating Blast database 7 of 36
2021-01-09 11:17:22 : Creating Blast database 8 of 36
2021-01-09 11:17:25 : Creating Blast database 9 of 36
2021-01-09 11:17:30 : Creating Blast database 10 of 36
2021-01-09 11:17:33 : Creating Blast database 11 of 36
2021-01-09 11:17:35 : Creating Blast database 12 of 36
2021-01-09 11:17:40 : Creating Blast database 13 of 36
2021-01-09 11:17:45 : Creating Blast database 14 of 36
2021-01-09 11:17:50 : Creating Blast database 15 of 36
2021-01-09 11:17:55 : Creating Blast database 16 of 36
2021-01-09 11:17:58 : Creating Blast database 17 of 36
2021-01-09 11:18:01 : Creating Blast database 18 of 36
2021-01-09 11:18:06 : Creating Blast database 19 of 36
2021-01-09 11:18:11 : Creating Blast database 20 of 36
2021-01-09 11:18:16 : Creating Blast database 21 of 36
2021-01-09 11:18:21 : Creating Blast database 22 of 36
2021-01-09 11:18:24 : Creating Blast database 23 of 36
2021-01-09 11:18:27 : Creating Blast database 24 of 36
2021-01-09 11:18:32 : Creating Blast database 25 of 36
2021-01-09 11:18:35 : Creating Blast database 26 of 36
2021-01-09 11:18:38 : Creating Blast database 27 of 36
2021-01-09 11:18:43 : Creating Blast database 28 of 36
2021-01-09 11:18:48 : Creating Blast database 29 of 36
2021-01-09 11:18:54 : Creating Blast database 30 of 36
2021-01-09 11:18:59 : Creating Blast database 31 of 36
2021-01-09 11:19:04 : Creating Blast database 32 of 36
2021-01-09 11:19:09 : Creating Blast database 33 of 36
2021-01-09 11:19:14 : Creating Blast database 34 of 36
2021-01-09 11:19:17 : Creating Blast database 35 of 36
2021-01-09 11:19:22 : Creating Blast database 36 of 36
Running BLAST all-versus-all
Using 12 thread(s)
2021-01-09 11:19:25 : This may take some time....
2021-01-25 14:22:23 : Done 300 of 1296
WARNING: program called by OrthoFinder produced output to stderr
Command: blastp -outfmt 6 -evalue 0.001 -query /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Species35.fa -db /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/BlastDBSpecies28 -out /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Blast35_28.txt
stdout
b''
stderr
b'Warning: (1431.1) CFastaReader: Ignoring invalid residue . at line 2, position 111\nWarning: (1431.1) CFastaReader: Ignoring invalid residue . at line 4, position 499\n
WARNING: program called by OrthoFinder produced output to stderr
Command: blastp -outfmt 6 -evalue 0.001 -query /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Species35.fa -db /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/BlastDBSpecies3 -out /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Blast35_3.txt ......... ......... ........
2021-03-04 19:25:27 : Done all-versus-all sequence search
Running OrthoFinder algorithm
2021-03-04 19:25:42 : Initial processing of each species
2021-03-04 20:04:11 : Initial processing of species 0 complete
2021-03-04 20:41:03 : Initial processing of species 1 complete
2021-03-04 21:55:16 : Initial processing of species 2 complete
2021-03-04 22:49:56 : Initial processing of species 3 complete
2021-03-04 23:51:48 : Initial processing of species 4 complete
2021-03-05 00:29:04 : Initial processing of species 5 complete
2021-03-05 01:41:34 : Initial processing of species 6 complete
2021-03-05 02:54:12 : Initial processing of species 7 complete
2021-03-05 04:06:58 : Initial processing of species 8 complete
2021-03-05 05:21:07 : Initial processing of species 9 complete
2021-03-05 06:33:48 : Initial processing of species 10 complete
2021-03-05 07:42:26 : Initial processing of species 11 complete
2021-03-05 08:49:46 : Initial processing of species 12 complete
2021-03-05 09:58:26 : Initial processing of species 13 complete
2021-03-05 11:17:30 : Initial processing of species 14 complete
2021-03-05 12:42:02 : Initial processing of species 15 complete
2021-03-05 14:14:39 : Initial processing of species 16 complete
2021-03-05 15:23:14 : Initial processing of species 17 complete
2021-03-05 16:49:39 : Initial processing of species 18 complete
2021-03-05 17:59:31 : Initial processing of species 19 complete
2021-03-05 19:06:48 : Initial processing of species 20 complete
2021-03-05 20:15:43 : Initial processing of species 21 complete
2021-03-05 21:23:02 : Initial processing of species 22 complete
2021-03-05 22:36:21 : Initial processing of species 23 complete
2021-03-05 23:45:53 : Initial processing of species 24 complete
2021-03-06 00:57:09 : Initial processing of species 25 complete
2021-03-06 02:10:45 : Initial processing of species 26 complete
2021-03-06 03:27:59 : Initial processing of species 27 complete
2021-03-06 04:43:05 : Initial processing of species 28 complete
2021-03-06 05:48:40 : Initial processing of species 29 complete
2021-03-06 07:01:40 : Initial processing of species 30 complete
2021-03-06 08:25:25 : Initial processing of species 31 complete
2021-03-06 09:35:57 : Initial processing of species 32 complete
2021-03-06 10:13:33 : Initial processing of species 33 complete
2021-03-06 11:32:13 : Initial processing of species 34 complete
2021-03-06 12:12:19 : Initial processing of species 35 complete
2021-03-06 12:27:43 : Connected putative homologues
2021-03-06 12:30:57 : Written final scores for species 0 to graph file
2021-03-06 12:38:45 : Written final scores for species 12 to graph file
2021-03-06 12:44:24 : Written final scores for species 24 to graph file
2021-03-06 12:31:20 : Written final scores for species 1 to graph file
2021-03-06 12:38:54 : Written final scores for species 13 to graph file
2021-03-06 12:44:39 : Written final scores for species 25 to graph file
2021-03-06 12:31:23 : Written final scores for species 5 to graph file
2021-03-06 12:39:21 : Written final scores for species 14 to graph file
2021-03-06 12:45:38 : Written final scores for species 26 to graph file
2021-03-06 12:35:12 : Written final scores for species 3 to graph file
2021-03-06 12:40:37 : Written final scores for species 15 to graph file
2021-03-06 12:46:18 : Written final scores for species 27 to graph file
2021-03-06 12:35:37 : Written final scores for species 2 to graph file
2021-03-06 12:40:53 : Written final scores for species 21 to graph file
2021-03-06 12:46:35 : Written final scores for species 28 to graph file
2021-03-06 12:35:25 : Written final scores for species 11 to graph file
2021-03-06 12:43:38 : Written final scores for species 19 to graph file
2021-03-06 12:47:06 : Written final scores for species 33 to graph file
2021-03-06 12:35:45 : Written final scores for species 9 to graph file
2021-03-06 12:42:36 : Written final scores for species 22 to graph file
2021-03-06 12:47:19 : Written final scores for species 30 to graph file
2021-03-06 12:35:34 : Written final scores for species 4 to graph file
2021-03-06 12:42:44 : Written final scores for species 20 to graph file
2021-03-06 12:47:39 : Written final scores for species 31 to graph file
2021-03-06 12:35:22 : Written final scores for species 10 to graph file
2021-03-06 12:43:48 : Written final scores for species 18 to graph file
2021-03-06 12:47:40 : Written final scores for species 35 to graph file
2021-03-06 12:35:46 : Written final scores for species 6 to graph file
2021-03-06 12:43:20 : Written final scores for species 23 to graph file
2021-03-06 12:47:53 : Written final scores for species 32 to graph file
2021-03-06 12:35:19 : Written final scores for species 8 to graph file
2021-03-06 12:42:04 : Written final scores for species 17 to graph file
2021-03-06 12:48:48 : Written final scores for species 29 to graph file
2021-03-06 12:35:15 : Written final scores for species 7 to graph file
2021-03-06 12:43:38 : Written final scores for species 16 to graph file
2021-03-06 12:49:59 : Written final scores for species 34 to graph file
2021-03-06 13:19:57 : Ran MCL
Writing orthogroups to file
OrthoFinder assigned 1855627 genes (98.6% of total) to 48444 orthogroups. Fifty percent of all genes were in orthogroups with 66 or more genes (G50 was 66) and were contained in the largest 8957 orthogroups (O50 was 8957). There were 8917 orthogroups with all species present and 256 of these consisted entirely of single-copy genes.
2021-03-06 13:24:11 : Done orthogroups
Analysing Orthogroups
2021-03-06 13:24:19 : Starting MSA/Trees
Species tree: Using 1242 orthogroups with minimum of 91.7% of species having single-copy genes in any orthogroup
Inferring multiple sequence alignments for species tree
2021-03-06 16:38:04 : Done 200 of 1242
2021-03-07 06:53:13 : Done 1000 of 1242
2021-03-07 08:53:00 : Done 1200 of 1242
2021-03-06 21:46:25 : Done 400 of 1242
2021-03-07 04:25:22 : Done 800 of 1242
2021-03-07 05:43:31 : Done 900 of 1242
2021-03-07 08:01:06 : Done 1100 of 1242
2021-03-07 03:07:20 : Done 700 of 1242
2021-03-06 19:34:28 : Done 300 of 1242
2021-03-07 01:05:36 : Done 600 of 1242
2021-03-06 14:58:29 : Done 100 of 1242
2021-03-06 23:30:31 : Done 500 of 1242
2021-03-06 13:27:10 : Done 0 of 1242
Inferring remaining multiple sequence alignments and gene trees
2021-05-31 07:43:00 : Done 2000 of 47203
2021-06-18 09:42:04 : Done 16000 of 47203
2021-06-19 14:16:50 : Done 19000 of 47203
2021-06-20 05:47:00 : Done 28000 of 47203
2021-06-20 12:15:58 : Done 43000 of 47203
2021-03-07 09:48:28 : Done 0 of 47203
2021-06-19 00:19:04 : Done 17000 of 47203
2021-06-20 12:11:49 : Done 41000 of 47203
2021-06-20 12:18:01 : Done 44000 of 47203
2021-06-19 09:21:09 : Done 18000 of 47203
2021-06-20 01:52:52 : Done 25000 of 47203
2021-06-20 02:59:30 : Done 26000 of 47203
2021-06-20 09:37:04 : Done 31000 of 47203
2021-06-20 12:08:54 : Done 40000 of 47203
2021-06-06 11:57:51 : Done 4000 of 47203
2021-06-08 16:52:29 : Done 5000 of 47203
2021-06-14 19:45:47 : Done 10000 of 47203
2021-06-20 01:03:48 : Done 24000 of 47203
2021-06-19 21:44:33 : Done 22000 of 47203
2021-06-20 07:15:59 : Done 29000 of 47203
2021-06-20 10:22:10 : Done 32000 of 47203
2021-06-20 11:43:08 : Done 36000 of 47203
2021-06-20 12:20:07 : Done 45000 of 47203
2021-06-19 23:26:05 : Done 23000 of 47203
2021-06-20 11:15:09 : Done 34000 of 47203
2021-06-15 12:29:44 : Done 11000 of 47203
2021-06-17 19:54:34 : Done 15000 of 47203
2021-06-20 11:59:31 : Done 38000 of 47203
2021-06-20 12:04:41 : Done 39000 of 47203
2021-06-10 11:05:42 : Done 6000 of 47203
2021-06-16 06:54:40 : Done 12000 of 47203
2021-06-20 08:40:21 : Done 30000 of 47203
2021-06-20 12:13:56 : Done 42000 of 47203
2021-06-03 17:04:23 : Done 3000 of 47203
2021-06-11 21:15:54 : Done 7000 of 47203
2021-06-13 00:21:56 : Done 8000 of 47203
2021-06-20 10:52:05 : Done 33000 of 47203
2021-06-20 11:52:40 : Done 37000 of 47203
2021-06-20 12:22:11 : Done 46000 of 47203
2021-05-26 23:52:54 : Done 1000 of 47203
2021-06-17 07:24:26 : Done 14000 of 47203
2021-06-20 04:15:44 : Done 27000 of 47203
2021-06-20 11:31:28 : Done 35000 of 47203
2021-06-20 12:24:10 : Done 47000 of 47203
2021-06-14 00:21:29 : Done 9000 of 47203
2021-06-16 19:35:51 : Done 13000 of 47203
2021-06-19 17:35:54 : Done 20000 of 47203
2021-06-19 19:50:55 : Done 21000 of 47203
ERROR: Species tree inference failed
ERROR: An error occurred, please review the error messages they may contain useful information about the problem.
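A side note on the CFastaReader warnings near the top of the log: they report a "." character inside the protein sequences of one input file. These are separate from the species tree failure, but it may be worth locating such characters in the input FASTA files. Below is a minimal Python sketch, not part of OrthoFinder. The function name and the set of accepted characters (letters plus "*" and "-") are assumptions made for illustration, and the line and column numbering (1-based, headers counted) may not match BLAST's own numbering.

```python
import string

# Assumption: letters plus "*" (stop) and "-" (gap) are treated as acceptable;
# adjust this set to whatever your downstream tools accept.
ALLOWED = set(string.ascii_letters) | {"*", "-"}


def find_invalid_residues(fasta_text, allowed=ALLOWED):
    """Return (line_number, column, character) for every unexpected character
    found on a sequence line. Header lines (starting with '>') are skipped."""
    hits = []
    for line_no, line in enumerate(fasta_text.splitlines(), start=1):
        if line.startswith(">"):
            continue
        for col, ch in enumerate(line, start=1):
            if ch not in allowed:
                hits.append((line_no, col, ch))
    return hits


# Made-up example data, not taken from the run above.
example = ">gene1\nMKT.LV\n>gene2\nMAAA\n"
print(find_invalid_residues(example))  # [(2, 4, '.')]
```

To check a real file, read it with open(path).read() and pass the text to find_invalid_residues.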
Hi
It looks like you're using IQTREE for tree inference. You can have a look at the log file it produced to see what the problem was; it should be called "WorkingDirectory/Alignments_ids/SpeciesTree.log"
Best wishes David
David, thank you for your advice. I checked SpeciesTree.log and there was indeed an error. Here it is:
IQ-TREE multicore version 1.6.12 for Linux 64-bit built Aug 15 2019
Developed by Bui Quang Minh, Nguyen Lam Tung, Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams.
Host: compute-0-1.local (AVX, 220 GB RAM)
Command: /ds3512/home/panyp/ruanjian/iqtree-1.6.12-Linux/bin/iqtree -s /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Alignments_ids/SpeciesTreeAlignment.fa -bb 1000 -pre /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Alignments_ids/SpeciesTree
Seed: 922007 (Using SPRNG - Scalable Parallel Random Number Generator)
Time: Sun Mar 7 09:48:27 2021
Kernel: AVX - 1 threads (16 CPU cores detected)
HINT: Use -nt option to specify number of threads because your CPU has 16 cores!
HINT: -nt AUTO will automatically determine the best number of threads to use.
Reading alignment file /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Alignments_ids/SpeciesTreeAlignment.fa ... Fasta format detected
Alignment most likely contains protein sequences
Alignment has 36 sequences with 587974 columns, 323304 distinct patterns
129600 parsimony-informative, 204744 singleton sites, 253630 constant sites
            Gap/Ambiguity  Composition  p-value
   1  0         30.53%       failed      0.00%
   2  1         18.80%       failed      0.00%
   3  2          3.79%       passed     99.24%
......
......
......
****  TOTAL      7.41%  6 sequences failed composition chi2 test (p-value<5%; df=19)
NOTE: minimal branch length is reduced to 0.000000170076 for long alignment
Create initial parsimony tree by phylogenetic likelihood library (PLL)... 240.959 seconds
NOTE: ModelFinder requires 19216 MB RAM!
ModelFinder will test 546 protein models (sample size: 587974) ...
No. Model -LnL df AIC AICc BIC
1 Dayhoff 5415541.378 69 10831220.757 10831220.773 10831999.383
2 Dayhoff+I 5354297.881 70 10708735.762 10708735.779 10709525.673
3 Dayhoff+G4 5328853.866 70 10657847.732 10657847.749 10658637.643
......
......
......
34 mtMAM+R5 5610761.575 77 11221677.150 11221677.170 11222546.051
35 mtMAM+R6 5610637.309 79 11221432.618 11221432.640 11222324.089
ERROR: Numerical underflow (lh-branch). Run again with the safe likelihood kernel via -safe option
I searched this repository and found that you don't recommend IQTREE for tree inference. Could you tell me which program you do recommend? Thank you again.
Hi
It is an issue that occurs unpredictably in IQTREE. In the past I haven't recommended IQTREE because it can be hard to get it to run successfully at the scale required by OrthoFinder, as you have seen. However, now I think I know how these issues can be resolved.
I think I should be able to help you complete your analysis.
- Run iqtree in safe mode on the species tree:
iqtree -s WorkingDirectory/Alignments_ids/SpeciesTreeAlignment.fa -bb 1000 -pre WorkingDirectory/Alignments_ids/SpeciesTree -safe -nt AUTO
- Convert the tree from IDs to species names:
python OrthoFinder/tools/convert_orthofinder_tree_ids.py WorkingDirectory/Alignments_ids/SpeciesTree.treefile WorkingDirectory/SpeciesIDs.txt
- Reroot the tree manually on the correct outgroup and save it in Newick format, e.g. to the file SpeciesTreeRooted.txt
- Run OrthoFinder 'from trees' using your SpeciesTreeRooted.txt file (you'll need to provide the path to the file in the command below):
python orthofinder.py -ft /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09 -s SpeciesTreeRooted.txt
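For anyone scripting these steps, here is a minimal Python sketch that only assembles the three command lines above from a results directory. It runs nothing, and the manual rerooting step still has to happen between the second and third command. The function name build_commands and its parameters are made up for illustration; the flags are the ones shown in the steps above.

```python
import shlex


def build_commands(results_dir, rooted_tree="SpeciesTreeRooted.txt"):
    """Return the three command lines from the steps above as argv lists.

    results_dir is the OrthoFinder results directory (e.g. .../Results_Jan09).
    The tree in rooted_tree must be created by hand (rerooted on the outgroup)
    after the second command and before the third.
    """
    wd = f"{results_dir}/WorkingDirectory"
    aln = f"{wd}/Alignments_ids"
    iqtree = ["iqtree", "-s", f"{aln}/SpeciesTreeAlignment.fa", "-bb", "1000",
              "-pre", f"{aln}/SpeciesTree", "-safe", "-nt", "AUTO"]
    convert = ["python", "OrthoFinder/tools/convert_orthofinder_tree_ids.py",
               f"{aln}/SpeciesTree.treefile", f"{wd}/SpeciesIDs.txt"]
    from_trees = ["python", "orthofinder.py", "-ft", results_dir,
                  "-s", rooted_tree]
    return [iqtree, convert, from_trees]


# Print the commands so they can be reviewed before running them by hand.
for argv in build_commands("Results_Jan09"):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing rather than executing keeps the rerooting step explicit; each list could be passed to subprocess.run once the paths have been checked.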
Best wishes David
Thank you for your help, David. I will follow your suggestion. I'm not sure how to reroot the tree manually yet, so I may need to trouble you again in the future.
Hi, David. I don't know how to reroot the tree manually on the correct outgroup. I have uploaded all the SpeciesTree* files from Results_Jan09/WorkingDirectory/Alignments_ids; could you do me a favor and take a look? Thank you very much.
SpeciesTree.zip
By the way, my Gene_Trees and Species_Tree directories are still empty. Is that normal?
Can you send me an email at [email protected] and I will see if I can help.
The directories are empty because OrthoFinder had to terminate because the IQTREE species tree inference failed. I think I have a solution that will allow a complete set of results files to be generated. If you are able to run it successfully then I will use that information to post the solution here for other users.
Best wishes David
Thank you, David. I have sent you an email; please let me know if I missed anything. By the way, four files contain a species tree: SpeciesTree.contree, SpeciesTree.iqtree, SpeciesTree.treefile and SpeciesTree_accessions.treefile. Which one should I use to generate the rooted tree?
Dear @davidemms,
I have done as you suggested and got this error:
ERROR: 'e_sativa' is missing from species tree
ERROR: 'g_gynandra' is missing from species tree
ERROR: Additional species ''b_tournefortii.fasta'' in species tree
ERROR: Additional species ''b_repanda.fasta'' in species tree
I had to erase the ' symbols and the .fasta suffixes from the RootedTree.txt
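In case it helps others, that cleanup can be scripted. This is a minimal Python sketch, assuming the leaf labels contain no characters that actually need Newick quoting (spaces, colons, commas, parentheses). The function name and the example tree with its branch lengths are made up for illustration.

```python
import re


def clean_newick_labels(newick, suffixes=(".fasta",)):
    """Strip single quotes and the given file suffixes from a Newick string."""
    cleaned = newick.replace("'", "")
    for suffix in suffixes:
        # Remove the suffix only where a label ends, i.e. right before
        # ':', ',', ')' or ';', so branch lengths are left untouched.
        cleaned = re.sub(re.escape(suffix) + r"(?=[:,);])", "", cleaned)
    return cleaned


# Hypothetical tree using two of the species names from the error above.
tree = "(('b_tournefortii.fasta':0.1,'b_repanda.fasta':0.2):0.05,e_sativa:0.3);"
print(clean_newick_labels(tree))
# ((b_tournefortii:0.1,b_repanda:0.2):0.05,e_sativa:0.3);
```

After cleaning, the leaf names should match the species names OrthoFinder expects.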
Cheers,
Hi, David. When I run the command
orthofinder -S diamond -M msa -T raxml -ft /work/user/....../OrthoFinder/Results_Mar03_2/ -s /work/user/....../OrthoFinder/Results_Mar03_2/Species_Tree/SpeciesTree_rooted2.txt
I get this error:
Test can run "raxml" - failed
Warning, you specified a working directory via "-w"
Keep in mind that RAxML only accepts absolute path names, not relative ones!
RAxML can't, parse the alignment file as phylip file
it will now try to parse it as FASTA file
RAxML output files with the run ID already exist in directory /tmp/ ...... exiting
ERROR: Cannot run user-configured tree method 'raxml'
Please check program is installed and that it is correctly configured in the orthofinder/config.json file
Does the "-ft" mode lose the $PATH and thus cause this error?
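One way to test the $PATH idea is to check, from the same environment that launches OrthoFinder, whether the programs can be found at all. The Python sketch below uses shutil.which from the standard library; the helper name and the list of program names are assumptions taken from the command above. Note that the quoted messages also mention output files that already exist in /tmp/, which may be worth checking separately.

```python
import shutil


def locate_programs(names):
    """Map each program name to its full path on the current PATH, or None."""
    return {name: shutil.which(name) for name in names}


# Program names taken from the command above; adjust them to your setup.
for name, path in locate_programs(["raxml", "diamond", "orthofinder"]).items():
    print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If raxml is reported with a full path, a missing $PATH entry is unlikely to be the cause.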