Optimization for big sequences
Hello,
I have a file with 73 sequences that I would like to blastx against NR (~350 GB). However, 10 of those sequences are big: the biggest is > 27 Mbp and the smallest of those 10 is > 16 Mbp. The remaining sequences are < 500,000 bp.
I am on a cluster, where I requested 150 CPUs and 760 GB of RAM, with the following options: -k 5 -b 6 -c 1 -f 6 -e 1e-25 --sensitive. But even with those options, the whole blastx run would take ~31 days of computation.
Are there other options I can use to speed up the computation? I can go higher on CPU and RAM if needed.
Thanks
Sequences this long will only work efficiently in frameshift alignment mode, so you should set -F 15 --range-culling. Also there's a known performance issue with this in the current version, so please run v2.0.15 until this is fixed.
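As a rough sketch, the suggested flags can be combined with the options from the question as below. The file paths and thread count are hypothetical placeholders, and the function only assembles the argument list. Running it assumes a `diamond` binary (v2.0.15, per the advice above) is on PATH; the code does not check the version itself.

```python
import shlex
import subprocess


def diamond_blastx_cmd(query, db, out, threads=150):
    """Build a DIAMOND blastx command line with frameshift mode and range culling."""
    return [
        "diamond", "blastx",
        "-q", query,          # nucleotide queries (FASTA)
        "-d", db,             # DIAMOND database built from NR
        "-o", out,            # output file
        "-p", str(threads),   # number of threads
        # options from the original question, unchanged
        "-k", "5", "-b", "6", "-c", "1",
        "-f", "6", "-e", "1e-25", "--sensitive",
        "-F", "15",           # frameshift penalty; setting it enables frameshift alignment
        "--range-culling",    # cull hits only among overlapping query ranges
    ]


def run(cmd):
    """Print the command in shell-quoted form, then execute it (raises on failure)."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Usage, with hypothetical paths: `run(diamond_blastx_cmd("queries.fasta", "nr.dmnd", "matches.tsv"))`. Running `diamond version` first is a simple way to confirm which release is installed.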
The other option would be to extract ORFs and run blastp.
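For the ORF route, here is a minimal, hypothetical sketch (not part of DIAMOND) that translates all six frames with the standard genetic code (NCBI table 1) and keeps every stop-to-stop stretch of at least `min_aa` residues as a protein query. This is a simplification: it does not require a start codon or handle introns, and the default `min_aa=100` is an arbitrary assumption.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(nt):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def stop_to_stop_orfs(seq, min_aa=100):
    """Yield (strand, frame, peptide) for each stop-free stretch of >= min_aa residues.

    frame is the 0-2 offset on that strand; for '-' it is counted from the
    start of the reverse complement.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            for peptide in translate(s[frame:]).split("*"):
                if len(peptide) >= min_aa:
                    yield strand, frame, peptide


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def orf_fasta_records(fasta_lines, min_aa=100):
    """Yield one protein FASTA record (as text) per ORF found in the input."""
    for header, seq in read_fasta(fasta_lines):
        name = header.partition(" ")[0]
        for n, (strand, frame, pep) in enumerate(stop_to_stop_orfs(seq, min_aa)):
            yield f">{name}_orf{n}_{strand}{frame}\n{pep}\n"
```

With hypothetical file names, `with open("queries.fasta") as fin, open("orfs.faa", "w") as fout: fout.writelines(orf_fasta_records(fin))` writes the ORFs, and `orfs.faa` can then be given to `diamond blastp` as the query file.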
Thank you for the fast answer. I will test those options.