OrthoFinder
Diamond stuck on "Running diamond all-versus-all"
Hi OrthoFinder team,
I am running OrthoFinder on 5 files of amino acid sequences, ranging from ~600 kB to ~50 MB in size. I'm running the program with this command: orthofinder.py -f amino_folder -t 120 -a 64. I'm using a high-performance Linux computer and have OrthoFinder 2.5.5 and the most recent version of Diamond installed. The process starts out using multiple CPUs, then drops to a single CPU and runs for over 16 hours without any visible progress. How can I fix this so the run finishes more quickly?
Thanks so much, I look forward to hearing from you!
Best, Zoe
Hi Zoe
Can you count how many genes are in each file (grep -c ">" *) and post the counts back here? Does each file contain the genes from one species? I would recommend using a single representative sequence per gene (the longest one) rather than all transcript variants, to improve runtime.
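In case it helps, here is a minimal Python sketch of both steps: counting the sequences in a file, and keeping only the longest sequence per gene. The function names and the `gene_of` callback are made up for illustration, and how a gene ID is derived from a header depends entirely on how your files are annotated, so treat that part as an assumption to adapt.

```python
def count_sequences(lines):
    """Count FASTA records, i.e. lines that start with '>'.

    Note: grep -c ">" counts lines containing '>' anywhere, so the two
    can differ if '>' also appears inside a header.
    """
    return sum(1 for line in lines if line.startswith(">"))


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_per_gene(records, gene_of):
    """Keep the longest sequence for each gene.

    gene_of is a caller-supplied function mapping a header to a gene ID
    (hypothetical; it must match your own header convention).
    """
    best = {}
    for header, seq in records:
        gene = gene_of(header)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return list(best.values())
```

To use it on a real file, pass an open file handle, e.g. `with open(path) as fh: print(count_sequences(fh))`. If your headers looked like `gene1.t1`, `gene1.t2` (an assumed convention, not something OrthoFinder requires), then `gene_of=lambda h: h.split()[0].rsplit(".", 1)[0]` would group transcripts by gene.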
Another suggestion: with 5 files you will be parallelising over only 25 diamond runs (5 x 5, one per pair of files), whereas it looks like you have 120 cores available. You can edit the OrthoFinder config.json file so that each diamond process uses more than one thread too. I think probably with '-p 5' (diamond's --threads option), but you may need to check the correct option.
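To make the arithmetic concrete, here is a rough Python sketch. It works out how many threads each diamond run could get, and shows one way to rewrite the thread option in a diamond command string. DIAMOND's thread option is --threads (short form -p); the helper names and the example command string are assumptions for illustration, so check the actual diamond entry in your own config.json before editing it.

```python
import re


def diamond_threads_per_job(n_species, n_cores):
    """Return (number of all-vs-all searches, threads available per search)."""
    n_jobs = n_species * n_species          # one search per ordered pair of files
    return n_jobs, max(1, n_cores // n_jobs)


def set_thread_option(search_cmd, threads):
    """Set -p/--threads in a diamond command string, appending it if absent."""
    pattern = r"(?<!\S)(-p|--threads)\s+\d+(?!\S)"
    if re.search(pattern, search_cmd):
        return re.sub(pattern, lambda m: f"{m.group(1)} {threads}", search_cmd)
    return f"{search_cmd} -p {threads}"


# 5 species on 120 cores: 25 searches, 4 threads each when rounding down.
n_jobs, threads = diamond_threads_per_job(5, 120)

# Hypothetical command string; the real one lives in config.json.
example_cmd = "diamond blastp -d DATABASE -q INPUT -o OUTPUT -p 1 --quiet"
new_cmd = set_thread_option(example_cmd, threads)
```

Rounding down gives 4 threads per run (100 cores in use); rounding up to 5 would ask for 125 threads, which slightly oversubscribes 120 cores but is unlikely to matter much in practice.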
All the best David