Diamond Clustering - Failed to allocate sufficient memory. Please refer to the manual for instructions on memory usage.
Hi, I'm trying to run diamond clustering on a file with about 9 million sequences and got this error.
I have 128 GB of RAM and 1.8 TB of disk storage. The database.dmnd database I used is 1.2 GB.
The command I used:
diamond cluster -d database.dmnd -o output.tsv --approx-id 80 --tmpdir /data -M 64G
My log file contains:
diamond v2.1.10.164 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)
#CPU threads: 32
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Opening the input file... [0.995s]
Input database: database.dmnd (8585030 sequences, 993623733 letters)
Temporary directory: /data
#Target sequences to report alignments for: unlimited
Database: database.dmnd (type: Diamond database, sequences: 8585030, letters: 993623733)
Block size = 12800000000
Opening the input file... [0s]
Opening the output file... [0s]
Seeking in database... [0s]
Loading query sequences... [0.638s]
Length sorting queries... [0.601s]
Algorithm: Double-indexed
Building query histograms... [3.384s]
Seeking in database... [0s]
Seeking in database... [0.004s]
Initializing temporary storage... [0.003s]
Building reference histograms... [1.399s]
Allocating buffers... [0s]
Processing query block 1, reference block 1/1, shape 1/1.
Building reference seed array... [1.48s]
Building query seed array... [1.426s]
Computing hash join... [0.503s]
Masking low complexity seeds... [0.04s]
Building kmer ranking... [0.008s]
Searching alignments... [0.952s]
Deallocating memory... [0s]
Deallocating buffers... [0.004s]
Clearing query masking... [0.069s]
Computing alignments... [35.532s]
Deallocating reference... [0s]
Loading reference sequences... [0s]
Deallocating buffers... [0s]
Deallocating queries... [0.003s]
Closing the output file... [0s]
Closing the database... [0s]
Cleaning up... [0s]
Total time = 46.054s
Reported 52104517 pairwise alignments, 52104517 HSPs.
3412693 queries aligned.
Finished search. #Edges: 97532173
Allocating buffers... [0s]
Loading edges... [0.391s]
Sorting edges... [0.441s]
Computing edge counts... [0.105s]
Computing vertex cover... [1.334s]
Computing reassignment... [0.104s]
Clustering round 1 complete. #Input sequences: 8585030 #Clusters: 1010961 #Letters: 115151078 Time: 48s
Temporary directory: /data
#Target sequences to report alignments for: unlimited
Database: database.dmnd (type: Diamond database, sequences: 8585030, letters: 993623733)
Block size = 3200000000
Opening the input file... [0s]
Opening the output file... [0s]
Seeking in database... [0s]
Loading query sequences... [0.268s]
Algorithm: Double-indexed
Building query histograms... [0.287s]
Seeking in database... [0s]
Initializing temporary storage... [0s]
Building reference histograms... [0.036s]
Allocating buffers... [0s]
Processing query block 1, reference block 1/1, shape 1/1.
Building reference seed array... [0.073s]
Building query seed array... [0.074s]
Computing hash join... [0.266s]
Masking low complexity seeds... [0.07s]
Searching alignments... [1729.5s]
Deallocating memory... [0s]
Deallocating buffers... [0.002s]
Clearing query masking... [0.014s]
Computing alignments... Failed to allocate sufficient memory. Please refer to the manual for instructions on memory usage.
Could anyone help me resolve this issue?
Thanks in advance!
There are some issues causing increased memory use that will be fixed in the next release. For now, one thing you could try is adding --bin 256 (or possibly a higher value) to your command.
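For illustration, here is the command from the original post with the suggested option appended. Only `--bin 256` is new; every other argument is copied from the question. The variable names `BINS` and `CMD` are just for this sketch, and 256 is a starting point rather than a tuned value.

```shell
# Command from the question with --bin appended, as suggested in the reply.
# BINS is a starting value; the reply notes that a higher value may be needed.
BINS=256
CMD="diamond cluster -d database.dmnd -o output.tsv --approx-id 80 --tmpdir /data -M 64G --bin $BINS"

# Print the assembled command for review before running it.
echo "$CMD"

# To actually run it (requires diamond on PATH):
# eval "$CMD"
```

If the run still fails at the same stage, raising `BINS` and retrying is the natural next step.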
Another option would be --cluster-steps faster_lin fast_lin, which should be sufficient for an 80% identity cutoff.
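As a sketch, the `--cluster-steps` alternative applied to the command from the original post would look like this. The step names `faster_lin fast_lin` are taken verbatim from the reply; the variable names `STEPS` and `CMD_LIN` are only for illustration.

```shell
# Command from the question, restricted to the two linear clustering steps
# named in the reply. All other arguments are unchanged from the original post.
STEPS="faster_lin fast_lin"
CMD_LIN="diamond cluster -d database.dmnd -o output.tsv --approx-id 80 --tmpdir /data -M 64G --cluster-steps $STEPS"

# Print the assembled command for review before running it.
echo "$CMD_LIN"

# To actually run it (requires diamond on PATH):
# eval "$CMD_LIN"
```

In the log above, the first clustering round completed and the failure occurred in the second, so limiting the run to these two steps is aimed at avoiding the memory-heavy later round.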
Please try again with the latest release; memory use has been reduced.