Advice for diamond distributed-memory implementation on HPC

Open · gcagle1 opened this issue 4 years ago • 5 comments

Hi, I'm using my university's HPC cluster, with access to hundreds of nodes, and I'm hoping to get some advice on choosing the number of nodes and the -c and -b parameters to optimize diamond's performance on my data. I've run a few tests, but there are quite a few combinations of variables to consider. Any suggestions based on your knowledge of diamond's performance would be much appreciated.

I'm using diamond blastx with two query files, each ~3.2 GB gzipped. My reference db is ~0.8 GB. I have about 60 of these sets to analyze. The nodes on the cluster each have 32 GB RAM and 16 processors. I'm implementing the distributed process with OpenMPI and running diamond with the flags:

--sensitive
-f 6 qseqid sseqid evalue bitscore
--quiet
-k 1
-e 0.00001
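
For reference, the assembled command looks roughly like this (the file names here are placeholders, not my actual paths):

diamond blastx --db reference.dmnd --query reads_1.fastq.gz \
    --sensitive -f 6 qseqid sseqid evalue bitscore \
    --quiet -k 1 -e 0.00001 --out matches_1.tsv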

Thanks, Grace

gcagle1 avatar May 11 '21 14:05 gcagle1

You can try using a lower -c, like -c1 -b2, or -c1 -b1.5 if the first one fails. Alternatively, you can try a bigger block size, like -b4 -c4. I'm not sure which would be faster; you'd have to try both.
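
Concretely (with placeholder file names), the two variants would look like:

diamond blastx -d ref.dmnd -q reads.fastq.gz -o out.tsv -b2 -c1
diamond blastx -d ref.dmnd -q reads.fastq.gz -o out.tsv -b4 -c4

Roughly speaking, memory use grows with the block size -b and shrinks with the number of index chunks -c, so the two settings trade memory for speed in different ways.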

bbuchfink avatar May 13 '21 13:05 bbuchfink

Thanks. Should we set the threads differently for the parallel runs? The performance I'm getting seems strange. Testing -c1 -b2 on one query file (~60 million 150-bp reads) on 4 nodes and on 10 nodes (128 and 320 GB of RAM total), both jobs used all of it plus hundreds of GB of virtual memory, had a high load average, and ran at low CPU%. Does that seem unusual? There were no errors from diamond, and both jobs were terminated for high load. I let diamond auto-select threads and it picks 16, and I'm wondering if that's the problem...

My cluster uses Torque/PBS job scheduling, and I'm using mpich/3.2/INTEL-18.0.0 (for no reason other than that it's available). I have Diamond 2.0.9 installed from bioconda. This is the MPI call in my PBS script, where NPROCS is the number of processors I requested (64 and 160 in these cases) and diamond-run.sh is a script containing the diamond command:

mpirun -np $NPROCS -machinefile $PBS_NODEFILE ./diamond-run.sh
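
For completeness, diamond-run.sh contains roughly the following (a sketch: the file names are placeholders, and the parallel temp directory is on our shared scratch filesystem):

#!/bin/bash
# placeholder paths; --multiprocessing/--parallel-tmpdir are diamond's
# distributed-memory flags (an initialization run with --mp-init is done
# once beforehand, per the distributed-computing docs)
diamond blastx --db reference.dmnd --query reads_1.fastq.gz \
    --multiprocessing --parallel-tmpdir /shared/scratch/diamond-ptmp \
    --sensitive -f 6 qseqid sseqid evalue bitscore --quiet -k 1 -e 0.00001 \
    -c1 -b2 --out matches_1.tsv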

gcagle1 avatar May 20 '21 21:05 gcagle1

That seems very unusual; with these settings diamond shouldn't use more than 40 GB of memory, and it should run at maximum CPU% most of the time. Either your data behaves very unusually, or something is going wrong with these MPI calls. Could you try running a single one of these jobs without the MPI wrapper, with the --log option added to the diamond call, and send me the output?

bbuchfink avatar May 21 '21 08:05 bbuchfink

The log is attached. It seems to have worked normally without the MPI wrapper and multiprocessing flags; CPU and load were perfect.

resources_used.vmem=37029428kb resources_used.walltime=01:52:25 resources_used.mem=35217360kb

diamond.log

gcagle1 avatar May 21 '21 16:05 gcagle1

Maybe MPI is causing the problem then, e.g. spawning multiple instances of diamond on one host? I'm not sure, since I'm not very knowledgeable about MPI.
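
If that's what's happening, it would fit the numbers: -np 64 across four 16-core nodes would start 16 copies of diamond-run.sh per node, each spawning a 16-thread diamond, which could explain the memory and load blowup. One way to test this (a sketch, untested, assuming MPICH's mpirun accepts a machinefile with one entry per host) would be to launch a single copy per node:

# deduplicate the PBS node list so each host appears exactly once
sort -u $PBS_NODEFILE > nodes.unique
# launch one diamond-run.sh per node
mpirun -np $(wc -l < nodes.unique) -machinefile nodes.unique ./diamond-run.sh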

bbuchfink avatar May 25 '21 08:05 bbuchfink