No queries aligned when running diamond in distributed mode
I'm trying to do an all-against-all blastp search of several proteomes. When I run this with diamond on a single node, there are many hits, as expected. However, when I run diamond in distributed mode across multiple nodes, the log reports that zero queries are aligned and the results file is empty.
I'm not sure if this is a bug or I am doing something incorrectly, so I would love any feedback.
Below are the steps I'm running:
- First, I run the mp-init step on the login node:
export PATH=/gpfs/software/diamond/gcc12/2.1.4/bin:$PATH
PREFIX=gcc_fungi_12node_test
DIAMOND_TEMP=`pwd`/diamond_temp_${PREFIX}
TEMP=/tmp
QUERY=fungi_combined.protein.faa
DB=refseq_fungi.dmnd
diamond blastp --query ${QUERY} --db ${DB} --multiprocessing --mp-init --tmpdir ${TEMP} --parallel-tmpdir ${DIAMOND_TEMP}
- Next I submit a batch job to the scheduler (from the same working directory as step 1):
#!/usr/bin/env bash
#SBATCH --job-name=diamond_gcc
#SBATCH --output=diamond_gcc_refseq_fungi_12node.log
#SBATCH -N 12
#SBATCH --time=08:00:00
#SBATCH --ntasks-per-node=1
#SBATCH -p medium-24core
module load gcc/12.1.0
export PATH=/gpfs/software/diamond/gcc12/2.1.4/bin:$PATH
OUTPUT=diamond_gcc_fungi_${SLURM_NNODES}_nodes
PREFIX=gcc_fungi_12node_test
DIAMOND_TEMP=`pwd`/diamond_temp_${PREFIX}
TEMP=/tmp
QUERY=fungi_combined.protein.faa
DB=refseq_fungi.dmnd
# run the search step
srun diamond blastp --db ${DB} --query ${QUERY} -o ${OUTPUT} --multiprocessing --tmpdir ${TEMP} --parallel-tmpdir ${DIAMOND_TEMP}
Note that this is Diamond v2.1.4 compiled with GCC 12.1.0. I've tried various numbers of nodes, placed the TEMP directory both inside and outside the parallel file system, and even compiled diamond on multiple clusters with different compilers. In every case, the all-against-all blastp reports zero queries aligned in distributed mode, while the same search on a single node works.
I've also attached my log file.
Do you see anything that I'm doing wrong or otherwise have any advice for getting diamond to work in distributed mode?
Thanks! Dave
I can't reproduce the problem using v2.1.4. Could you run this again using the --log option and show me the output?
Thanks for looking into this. I've rerun with --log and am attaching the output log file.
diamond.log
The only thing I noticed in these logs is that your calls with --mp-init don't have a block size parameter while the others do. That could be a problem.
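One way to rule out a block size mismatch is to build both commands from a single shared option list, so the --mp-init call and the search call cannot drift apart. The following is only a rough sketch, not a confirmed fix: --block-size (short form -b) is diamond's block size option, but the value 2.0, the output file name, and the variable names COMMON_ARGS, INIT_CMD and SEARCH_CMD are assumptions made for illustration. The script only prints the two commands (a dry run) so they can be compared before anything is submitted.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the init and search commands built from shared options.
# BLOCK_SIZE is an assumed value; use whatever the search step was run with.
BLOCK_SIZE=2.0
QUERY=fungi_combined.protein.faa
DB=refseq_fungi.dmnd
TEMP=/tmp
DIAMOND_TEMP="$(pwd)/diamond_temp_gcc_fungi_12node_test"

# Options that must be identical in the --mp-init call and the search call.
COMMON_ARGS=(blastp --query "${QUERY}" --db "${DB}"
             --block-size "${BLOCK_SIZE}"
             --multiprocessing --tmpdir "${TEMP}"
             --parallel-tmpdir "${DIAMOND_TEMP}")

# Step 1 (login node): initialise the work packages.
INIT_CMD="diamond ${COMMON_ARGS[*]} --mp-init"

# Step 2 (batch job): the search itself; the output name is hypothetical.
SEARCH_CMD="srun diamond ${COMMON_ARGS[*]} -o diamond_out.tsv"

# Print both commands so they can be inspected side by side.
echo "${INIT_CMD}"
echo "${SEARCH_CMD}"
```

If the two printed commands differ only in --mp-init and the output option, the block size is at least consistent between the steps, and the cause would have to be looked for elsewhere.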