Fails when masking queries

Open nadegeguiglielmoni opened this issue 5 years ago • 3 comments

Hello,

I have been trying to run diamond without success. I ran it on a cluster with 64 GB RAM, after installing diamond from source, and this is what I get:

diamond v2.0.4.142 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

#CPU threads: 32
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: 
Opening the database...  [0.179s]
#Target sequences to report alignments for: 25
Reference = ../../../Tools/blobtools/uniprot/reference_proteomes.dmnd
Sequences = 52962370
Letters = 19499493459
Block size = 2000000000
Loading taxonomy mapping...  [1.025s]
Opening the input file...  [0.125s]
Opening the output file...  [0s]
Loading query sequences...  [11.851s]
Masking queries... terminate called recursively
terminate called after throwing an instance of 'std::bad_alloc'
/var/slurmd-cm2_tiny/job123415/slurm_script: line 14: 24198 Abandon                 (core dumped) ../../../Tools/diamond-2.0.4/bin/diamond blastx --db ../../../Tools/blobtools/uniprot/reference_proteomes.dmnd -q assembly.fasta -f 6 qseqid staxids bitscore --threads 32 -o diamond.out

nadegeguiglielmoni, Nov 04 '20 09:11

Please try a smaller block size (e.g. -b0.4) and also add --log. How long are your query sequences?

bbuchfink, Nov 04 '20 20:11

The largest one is 105 Mb. If sequence length is the problem, I can filter out the largest ones; the remaining sequences would then be at most 5 Mb.

nadegeguiglielmoni, Nov 04 '20 20:11
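
The length filter proposed above could look roughly like the following Python sketch. It assumes the assembly is a plain, uncompressed FASTA file that fits in memory one record at a time. The function names, the 5,000,000-base cutoff (taken from the "5 Mb" figure in the comment) and the output file names in the usage comment are illustrative and are not part of diamond.

```python
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a plain-text FASTA handle."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(src: TextIO, short_out: TextIO, long_out: TextIO,
                    max_len: int = 5_000_000, width: int = 60) -> Tuple[int, int]:
    """Write records of at most max_len letters to short_out and the rest
    to long_out. Returns (number of short records, number of long records)."""
    n_short = n_long = 0
    for header, seq in read_fasta(src):
        if len(seq) <= max_len:
            out = short_out
            n_short += 1
        else:
            out = long_out
            n_long += 1
        out.write(f">{header}\n")
        # Re-wrap the sequence at a fixed line width.
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")
    return n_short, n_long


# Example usage (hypothetical output file names):
# with open("assembly.fasta") as src, \
#      open("assembly.short.fasta", "w") as short_out, \
#      open("assembly.long.fasta", "w") as long_out:
#     print(split_by_length(src, short_out, long_out))
```

Keeping the long sequences in a separate file, rather than discarding them, makes it possible to run them later on their own, for instance with a smaller block size.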

It could be due to the length. You can also try turning off the masking using --masking 0.

bbuchfink, Nov 04 '20 20:11
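
Taken together, the suggestions in this thread (a smaller block size, --log, and optionally --masking 0) amount to rerunning the original command with a few extra options. Below is a small Python sketch that assembles such a command line. build_diamond_cmd is a hypothetical helper, the paths in the example are the ones from the original report, and only options that already appear in this thread are used. Whether these settings actually avoid the std::bad_alloc is not confirmed here.

```python
import shlex
from typing import List


def build_diamond_cmd(query: str, db: str, out: str,
                      block_size: float = 0.4,
                      disable_masking: bool = False,
                      threads: int = 32,
                      diamond: str = "diamond") -> List[str]:
    """Assemble a diamond blastx command with the options suggested in the thread."""
    cmd = [
        diamond, "blastx",
        "--db", db,
        "-q", query,
        "-f", "6", "qseqid", "staxids", "bitscore",
        "--threads", str(threads),
        "-o", out,
        f"-b{block_size}",  # smaller block size, written as in the thread (-b0.4)
        "--log",            # requested by the maintainer for diagnosis
    ]
    if disable_masking:
        cmd += ["--masking", "0"]  # turn off masking
    return cmd


if __name__ == "__main__":
    cmd = build_diamond_cmd(
        query="assembly.fasta",
        db="../../../Tools/blobtools/uniprot/reference_proteomes.dmnd",
        out="diamond.out",
        disable_masking=True,
    )
    # Print the command for inspection; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list and only printing it keeps the sketch safe to run without diamond installed. The printed line can be pasted into the Slurm script in place of the original call.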