
Time complexity

Open ucabuk opened this issue 2 years ago • 2 comments

Hi,

I am using diamond to get taxonomic information for my proteins. Here is the command I run on the cluster:

diamond blastp -d nr --evalue 0.00001 --top 15 -b8 -c 1 -q /tmp/${ID}.fasta -o /tmp/${ID}.outfmt102 --sensitive -t /tmp/${TMP}_${ID}_blast -f 102 --threads 16

The query in this case contained a total of 350101 proteins.

Here is the beginning of the log for this sample.

Opening the database... [19.032s]
Database: nr (type: Diamond database, sequences: 520277157, letters: 204016416296)
Block size = 8000000000
Loading taxonomy nodes... [0.006s]
Opening the input file... [0.013s]
Opening the output file... [0s]
Loading query sequences... [0.237s]
Masking queries... [3.094s]
Algorithm: Double-indexed
Building query histograms... [9.969s]
Allocating buffers... [0s]
Loading reference sequences... [118.803s]
Masking reference... [354.834s]
Initializing dictionary... [0.007s]
Initializing temporary storage... [0s]
Building reference histograms... [1206s]
Allocating buffers... [0s]
Processing query block 1, reference block 1/26, shape 1/16.
Building reference seed array... [111.35s]
Building query seed array... [0.923s]
Computing hash join... [81.05s]
Masking low complexity seeds... [2.24s]
Searching alignments... [634.385s]
Processing query block 1, reference block 1/26, shape 2/16.
Building reference seed array... [104.499s]
Building query seed array... [0.857s]
Computing hash join... [79.265s]
Masking low complexity seeds... [2.252s]

So, if I calculated the estimated time for this sample correctly, it will take about 83 hours. Isn't that too much? Maybe I missed something in the command. The node also uses an SSD.

Thanks. Ugur

ucabuk avatar Feb 14 '23 16:02 ucabuk
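
For readers who want to reproduce this kind of estimate, here is a minimal sketch in Python. It is not part of diamond; the function names are made up for illustration. It assumes that every one of the 26 reference blocks and 16 shapes costs about as much as the first completed shape in the log, and it ignores the loading, masking and histogram steps, so treat the result as a rough figure only.

```python
# Rough run-time extrapolation from the first completed shape in the log.
# Assumption: each (reference block, shape) pair takes about as long as
# reference block 1, shape 1. Loading and masking steps are not counted.

# Timings in seconds, copied from the log (reference block 1/26, shape 1/16).
SHAPE_STEPS = {
    "Building reference seed array": 111.35,
    "Building query seed array": 0.923,
    "Computing hash join": 81.05,
    "Masking low complexity seeds": 2.24,
    "Searching alignments": 634.385,
}

REFERENCE_BLOCKS = 26  # from "reference block 1/26"
SHAPES = 16            # from "shape 1/16"


def per_shape_seconds(steps):
    """Total time of one pass over a single shape, in seconds."""
    return sum(steps.values())


def estimate_hours(steps, blocks, shapes):
    """Extrapolate one shape's time to all blocks and shapes, in hours."""
    return per_shape_seconds(steps) * blocks * shapes / 3600.0


if __name__ == "__main__":
    hours = estimate_hours(SHAPE_STEPS, REFERENCE_BLOCKS, SHAPES)
    print(f"Per shape: {per_shape_seconds(SHAPE_STEPS):.1f} s")
    print(f"Estimated search time: {hours:.1f} h")
```

Under these assumptions the extrapolation comes out at roughly 96 hours, which is the same order of magnitude as the roughly 83 hours quoted in the post. The difference is expected, since the time per shape varies (the second shape in the log is already slightly faster).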

This does not seem out of the ordinary. The nr is a big database; in sensitive mode with only 16 threads, it may take this long.

bbuchfink avatar Feb 17 '23 14:02 bbuchfink

I see. Okay, I will try increasing the thread count to more than 64 and see what happens.

Thank you!

ucabuk avatar Feb 17 '23 16:02 ucabuk