About the running times
Hello
I have a large protein sequence file (stats below) with a sum_len of 10,885,629,915 residues.
> file format type num_seqs sum_len min_len avg_len max_len
> non_redundancy_protein.fasta FASTA Protein 56,324,313 10,885,629,915 34 193.3 14,951
I am running diamond blastp against the NCBI NR database as follows:
nohup diamond blastp -d nr_20230728.dmnd -q ../07rm_redundancy/07partial_cdhit2/non_redundancy_protein.fasta --outfmt 6 --max-target-seqs 5 -e 1e-10 --query-cover 80 --id 50 --threads 140 -c 1 -b 16 -o diamond_annotation_nr.tsv > diamond_log.txt 2>&1 &
It seems diamond needs too much time to finish. I'd like to know how many query blocks this command will run. The log so far is below.
I would appreciate your help with this question.
nohup: ignoring input
diamond v2.1.8.162 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)
#CPU threads: 140
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
#Target sequences to report alignments for: 5
Opening the database... [0.074s]
Database: /home/adm/database/NCBI/NCBI_NR/nr_20230728.dmnd (type: Diamond database, sequences: 595907626, letters: 234169316349)
Block size = 16000000000
Opening the input file... [0.034s]
Opening the output file... [0s]
Loading query sequences... [56.861s]
Masking queries... [10.58s]
Algorithm: Double-indexed
Building query histograms... [7.472s]
Seeking in database... [0s]
Loading reference sequences... [30.694s]
Masking reference... [17.357s]
Initializing dictionary... [0.075s]
Initializing temporary storage... [0s]
Building reference histograms... [10.244s]
Allocating buffers... [0.001s]
Processing query block 1, reference block 1/15, shape 1/2.
Building reference seed array... [6.012s]
Building query seed array... [5.681s]
Computing hash join... [20.388s]
Masking low complexity seeds... [3.321s]
Searching alignments... [1395.4s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/15, shape 2/2.
Building reference seed array... [4.54s]
Building query seed array... [6.068s]
Computing hash join... [31.181s]
Masking low complexity seeds... [2.292s]
Searching alignments... [1199.63s]
Deallocating memory... [0s]
Deallocating buffers... [9.142s]
Clearing query masking... [3.581s]
Opening temporary output file... [0s]
Computing alignments... Loading trace points... [353.293s]
Sorting trace points... [98.201s]
Computing alignments... [1444.16s]
Deallocating buffers... [20.527s]
Loading trace points... [0.014s]
Sorting trace points... [108.536s]
Computing alignments... [1457.22s]
Deallocating buffers... [31.078s]
Loading trace points... [0.036s]
Sorting trace points... [83.7s]
Computing alignments... [1138.63s]
Deallocating buffers... [11.271s]
Loading trace points... [0.036s]
Sorting trace points... [101.461s]
Computing alignments... [1432.87s]
Deallocating buffers... [21.436s]
Loading trace points... [0.047s]
Sorting trace points... [103.237s]
Computing alignments... [1348.66s]
Deallocating buffers... [21.272s]
Loading trace points... [0.007s]
Sorting trace points... [127.472s]
Computing alignments... [1707.93s]
Deallocating buffers... [24.559s]
Loading trace points... [0.034s]
Sorting trace points... [117.072s]
Computing alignments... [1555.41s]
Deallocating buffers... [19.418s]
Loading trace points... [0.049s]
Sorting trace points... [122.935s]
Computing alignments... [1554.81s]
Deallocating buffers... [23.619s]
Loading trace points... [0.023s]
Sorting trace points... [109.928s]
Computing alignments... [1468.24s]
Deallocating buffers... [19.654s]
Loading trace points... [0.032s]
Sorting trace points... [106.685s]
Computing alignments... [1403.99s]
Deallocating buffers... [22.997s]
Loading trace points... [0.049s]
Sorting trace points... [105.344s]
Computing alignments... [1378.31s]
Deallocating buffers... [17.975s]
Loading trace points... [0.041s]
Sorting trace points... [99.973s]
Computing alignments... [1339.73s]
Deallocating buffers... [15.575s]
Loading trace points... [0.006s]
Sorting trace points... [110.233s]
Computing alignments... [1421.66s]
Deallocating buffers... [25.309s]
Loading trace points... [0.03s]
Sorting trace points... [99.521s]
Computing alignments... [1433.63s]
Deallocating buffers... [17.191s]
Loading trace points... [0.01s]
Sorting trace points... [87.972s]
Computing alignments... [1277.04s]
Deallocating buffers... [12.884s]
Loading trace points... [0s]
Sorting trace points... [120.664s]
Computing alignments... [1293.89s]
Deallocating buffers... [22.838s]
Loading trace points... [0s]
[25040.5s]
Deallocating reference... [0.069s]
Loading reference sequences... [33.603s]
Masking reference... [16.284s]
Initializing dictionary... [0.077s]
Initializing temporary storage... [0.01s]
Building reference histograms... [10.543s]
Allocating buffers... [0.001s]
Processing query block 1, reference block 2/15, shape 1/2.
Building reference seed array... [6.667s]
Building query seed array... [6.038s]
Computing hash join... [22.306s]
Masking low complexity seeds... [2.497s]
Searching alignments... [1417.19s]
Deallocating memory... [0s]
Processing query block 1, reference block 2/15, shape 2/2.
Building reference seed array... [4.602s]
Building query seed array... [3.34s]
Computing hash join... [66.474s]
Masking low complexity seeds... [2.595s]
Searching alignments... [1208.88s]
Deallocating memory... [0s]
Deallocating buffers... [1.733s]
Clearing query masking... [3.206s]
Opening temporary output file... [0s]
Computing alignments... Loading trace points... [360.445s]
Sorting trace points... [122.79s]
Computing alignments... [1581.5s]
Deallocating buffers... [22.937s]
Loading trace points... [0.038s]
Sorting trace points... [131.016s]
Computing alignments... [1712.84s]
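Your block size (-b 16) is 16 billion letters, which the log confirms (Block size = 16000000000), and the query file is ~10.9 billion letters, so it will be one query block. The database is split into 15 reference blocks (the log shows "reference block 1/15"), and each of them is searched against that single query block.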
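If it helps, here is a rough sketch of that arithmetic in Python. It assumes the number of blocks is simply the total letter count divided by the block size, rounded up. DIAMOND splits on sequence boundaries, so the real partitioning may differ slightly. The function name `estimate_blocks` is made up for illustration and is not part of DIAMOND; the numbers are taken from the stats output and the log above.

```python
def estimate_blocks(total_letters: int, block_size_letters: int) -> int:
    """Approximate block count as ceil(total_letters / block_size_letters).

    Assumption: blocks are filled up to the block size; the actual tool
    may split slightly differently at sequence boundaries.
    """
    return (total_letters + block_size_letters - 1) // block_size_letters


# "Block size = 16000000000" in the log (from -b 16)
BLOCK_SIZE = 16_000_000_000

# sum_len of the query FASTA and "letters" of the nr database from the log
query_letters = 10_885_629_915
reference_letters = 234_169_316_349

query_blocks = estimate_blocks(query_letters, BLOCK_SIZE)
reference_blocks = estimate_blocks(reference_letters, BLOCK_SIZE)

print(f"query blocks: {query_blocks}, reference blocks: {reference_blocks}")
```

The estimate for the reference side agrees with the "reference block 1/15" lines in the log, so the whole run should be 1 query block against 15 reference blocks.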
Thanks!