
When running diamond blastp in multiprocessing mode, some processes hang or segfault non-deterministically.

Open beazerj opened this issue 1 year ago • 2 comments

I'm running diamond blastp in multiprocessing mode on multiple machines (gcloud c2-cpu-standard-60 machines, 60 CPUs, 260 GB memory). Here is the specific command for the blastp search:

diamond blastp -q seqs.faa -d seqs -o out -f 6 qseqid sseqid corrected_bitscore --approx-id 50 --query-cover 90 -k1000 -c1 --more-sensitive -b6 --multiprocessing --tmpdir tmp --parallel_tmp --log.

During the run, I'm observing that some of the processes either hang or segfault. After recovering with the --mp-recover option and restarting the alignment process, some of these processes complete (others may still fail). The hang or segfault typically occurs at the "Computing Alignments..." step. Peak RSS is 115 GB.
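To tell a hang apart from a segfault when launching a worker from a script, the exit status can be inspected. The following is a minimal sketch, not part of diamond itself: `run_with_watchdog` and the timeout value are hypothetical, and a timeout is only a heuristic, since a slow but healthy run looks the same as a hung one.

```python
import signal
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s):
    """Run cmd and classify the outcome.

    Returns "ok", "hang", "signal:<NAME>" or "exit:<code>".
    The timeout is an assumption chosen by the caller, not a diamond setting.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising on timeout.
        return "hang"
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return "signal:" + signal.Signals(-proc.returncode).name
    return f"exit:{proc.returncode}"


if __name__ == "__main__":
    # Demonstration with a trivial child process instead of a real diamond run.
    print(run_with_watchdog([sys.executable, "-c", "pass"], 30))
```

A segfaulting process would be reported as `signal:SIGSEGV` on POSIX systems, which makes it easy to log which nodes crashed and which merely stalled before restarting them.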

I've run this command on anywhere from 8 to 72 nodes and at multiple levels of sensitivity. It doesn't seem to depend on the number of nodes, and I've seen it at every sensitivity level I've tried: fast, default and more-sensitive. I've tried both the v2.1.8 and v2.1.9 releases of diamond.

This may be related to #732 and #747. The issue poster in #732 mentioned that their issue was resolved by downgrading to v2.0.15. If I were to make this downgrade, would it make a meaningful difference to the quality or speed of the alignment?

There could be some merit to the idea that this issue occurs when trying to align a small number of sequences. Running the diamond deepclust workflow with the same steps (fast, default, more-sensitive) on a single machine with more memory (900 GB), so that there are only 4 blocks instead of 12, I don't see the segfault issue, but the run takes many days to complete.

beazerj avatar Mar 29 '24 18:03 beazerj

I'm having a similar issue. I have over 10000 analyses, so I use a Python for loop to run blastp on each one individually:

diamond blastp --more-sensitive -p 40 -q {input_file} -d {dmnd} --evalue 1e-5 -f 6 --out {result} --query-cover cover --subject-cover cover -k 0 --id 40

However, for some reason, diamond stalls on one of the tasks, and there aren't many sequences in that task. This seems to be memory-related, as it only happens when my server is running other tasks (not diamond ones), even though the server still has plenty of memory and CPU left over. My diamond version is v2.1.8.162.
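A loop of this kind can be guarded with a per-task timeout so that one stalled search does not block the remaining analyses. This is a minimal sketch under assumptions: `build_cmd`, `run_all`, the `.tsv` output naming and the timeout are hypothetical, and `cover` stands in for the placeholder value in the command above.

```python
import subprocess
from pathlib import Path


def build_cmd(input_file, dmnd, result, cover, threads=40):
    """Assemble the blastp command from the post as an argument list."""
    return [
        "diamond", "blastp", "--more-sensitive", "-p", str(threads),
        "-q", str(input_file), "-d", str(dmnd),
        "--evalue", "1e-5", "-f", "6", "--out", str(result),
        "--query-cover", str(cover), "--subject-cover", str(cover),
        "-k", "0", "--id", "40",
    ]


def run_all(inputs, dmnd, out_dir, cover, timeout_s, runner=subprocess.run):
    """Run one search per input file and return the inputs that hung or failed.

    runner is injectable so the loop can be exercised without diamond installed.
    """
    failed = []
    for input_file in inputs:
        # Hypothetical naming scheme: <input stem>.tsv inside out_dir.
        result = Path(out_dir) / (Path(input_file).stem + ".tsv")
        cmd = build_cmd(input_file, dmnd, result, cover)
        try:
            proc = runner(cmd, timeout=timeout_s)
            if proc.returncode != 0:
                failed.append(input_file)
        except subprocess.TimeoutExpired:
            # Treat a run that exceeds the time budget as hung; retry it later.
            failed.append(input_file)
    return failed
```

The returned list can be fed back into `run_all` for a second pass, which matches the observation in this thread that some stalled runs complete when restarted.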

fengqingling avatar Apr 22 '24 01:04 fengqingling

I will try to reproduce the problem. Unfortunately, it's not easy to track down this sort of problem when it only occurs randomly.

bbuchfink avatar Oct 21 '24 18:10 bbuchfink

The latest release should fix hanging or crashes in the computing alignments stage. Please reopen in case the issue persists.

bbuchfink avatar Jan 25 '25 10:01 bbuchfink