No queries aligned when running diamond in distributed mode

Status: Open. davidecarlson opened this issue 2 years ago · 3 comments

I'm trying to do an all-against-all blastp search of several proteomes. When I run this with diamond on a single node, there are many hits, as expected. However, when I run diamond in distributed mode across multiple nodes, the log reports that zero queries are aligned and the results file is empty.

I'm not sure if this is a bug or I am doing something incorrectly, so I would love any feedback.

Below are the steps I'm running:

1. First, I run the --mp-init step on the login node:

```bash
export PATH=/gpfs/software/diamond/gcc12/2.1.4/bin:$PATH

PREFIX=gcc_fungi_12node_test
DIAMOND_TEMP=`pwd`/diamond_temp_${PREFIX}
TEMP=/tmp

QUERY=fungi_combined.protein.faa
DB=refseq_fungi.dmnd

diamond blastp --query ${QUERY} --db ${DB} --multiprocessing --mp-init --tmpdir ${TEMP} --parallel-tmpdir ${DIAMOND_TEMP}
```
2. Next, I submit a batch job to the scheduler (from the same working directory as step 1):

```bash
#!/usr/bin/env bash

#SBATCH --job-name=diamond_gcc
#SBATCH --output=diamond_gcc_refseq_fungi_12node.log
#SBATCH -N 12
#SBATCH --time=08:00:00
#SBATCH --ntasks-per-node=1
#SBATCH -p medium-24core

module load gcc/12.1.0

export PATH=/gpfs/software/diamond/gcc12/2.1.4/bin:$PATH

OUTPUT=diamond_gcc_fungi_${SLURM_NNODES}_nodes

PREFIX=gcc_fungi_12node_test
DIAMOND_TEMP=`pwd`/diamond_temp_${PREFIX}
TEMP=/tmp

QUERY=fungi_combined.protein.faa
DB=refseq_fungi.dmnd

# run the search step
srun diamond blastp --db ${DB} --query ${QUERY} -o ${OUTPUT} --multiprocessing --tmpdir ${TEMP} --parallel-tmpdir ${DIAMOND_TEMP}
```

Note that this is Diamond v2.1.4 compiled with the GCC 12.1.0 compiler. I've tried doing this with various numbers of nodes, and placing the TEMP directory both within and outside of the parallel file system, and even compiled it on multiple different clusters with different compilers, but I still consistently get zero queries aligned from the all-by-all blastp when running in distributed mode (but not when running on a single node).

I've also attached my log file.

Do you see anything that I'm doing wrong or otherwise have any advice for getting diamond to work in distributed mode?

Thanks! Dave

Attachment: diamond_gcc_refseq_fungi_12node.log

davidecarlson, Mar 07 '23 14:03

I can't reproduce the problem using v2.1.4. Could you run this again using the --log option and show me the output?

bbuchfink, Mar 08 '23 15:03

Thanks for looking into this. I've rerun with --log and am attaching the output log file: diamond.log

davidecarlson, Mar 08 '23 20:03

The only thing I noticed in these logs is that your --mp-init calls don't have a block size parameter (-b/--block-size), while the other calls do. That could be the problem.

bbuchfink, Mar 17 '23 10:03
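Following the block-size observation above, one way to stop the --mp-init call and the distributed search from drifting apart is to define the shared arguments once and source them in both steps. This is a minimal sketch under stated assumptions, not an official DIAMOND recipe. The file name `diamond_mp_params.sh` and the variables `BLOCK_SIZE` and `COMMON_ARGS` are hypothetical, and `2.0` is only an illustrative value. What matters is that both invocations receive the same `--block-size`.

```bash
#!/usr/bin/env bash
# diamond_mp_params.sh (hypothetical name): source this file from BOTH the
# --mp-init step and the SLURM batch script so their parameters cannot drift.
PREFIX=gcc_fungi_12node_test
DIAMOND_TEMP="$(pwd)/diamond_temp_${PREFIX}"   # both steps must run from the same directory
TEMP=/tmp
QUERY=fungi_combined.protein.faa
DB=refseq_fungi.dmnd

# Assumption: 2.0 is just an example value. The point is that the
# --mp-init run and the search run are given the same block size.
BLOCK_SIZE=2.0

# Arguments shared verbatim by both invocations (a bash array keeps quoting intact).
COMMON_ARGS=(
  blastp
  --db "${DB}"
  --query "${QUERY}"
  --block-size "${BLOCK_SIZE}"
  --multiprocessing
  --tmpdir "${TEMP}"
  --parallel-tmpdir "${DIAMOND_TEMP}"
)
```

With this file in place, step 1 on the login node would become `source diamond_mp_params.sh; diamond "${COMMON_ARGS[@]}" --mp-init --log`. The batch script would end with `source diamond_mp_params.sh; srun diamond "${COMMON_ARGS[@]}" -o "${OUTPUT}" --log`. The only remaining differences between the two calls are `--mp-init` and `-o`. Keeping `--log` on both, as suggested earlier in the thread, makes it easier to compare the parameters each run actually used.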