CONCOCT - OpenBLAS Warning
Description of the bug
CONCOCT has been running for over 24 hours. The only activity for the last 20 hours has been in the `.command.log` and `.command.err` files, which are being updated with the following repeated output:

```
Up and running. Check /mnt/scratch/c1711572/mag_nf/work/df/9ed083848ec2dbe65e17338428a179/MEGAHIT-CONCOCT-group-4_log.txt for progress
/usr/local/lib/python3.11/site-packages/sklearn/utils/validation.py:1858: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['int', 'str']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.11/site-packages/sklearn/utils/validation.py:1858: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['int', 'str']. An error will be raised in 1.2.
  warnings.warn(
Setting 24 OMP threads
Generate input data
OpenBLAS Warning : Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option.
OpenBLAS Warning : Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option.
```

The OpenBLAS warning then repeats 25,200 times.
Command used and terminal output
Script used:

```bash
#!/bin/bash
#SBATCH --partition=jumbo # the requested queue
#SBATCH --nodes=1 # number of nodes to use
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=6GB # in megabytes, unless unit explicitly stated
#SBATCH --error=%J.err # redirect stderr to this file
#SBATCH --output=%J.out # redirect stdout to this file
##SBATCH [email protected] # email address used for event notification
##SBATCH --mail-type=all
echo "Some Usable Environment Variables:"
echo "================================="
echo "hostname=$(hostname)"
echo \$SLURM_JOB_ID=${SLURM_JOB_ID}
cat $0
module purge
module load nextflow/23.04.1
module load singularity/3.8.7
export NXF_OPTS="-Xms500M -Xmx4G"
workdir="/mnt/scratch/$USER/mag_nf"
reportdir="wtw_Hirwaun_reports"
outputdir="wtw_Hirwaun_mag_output"
mkdir $reportdir
nextflow run mag_2_5_1/ \
-c cardiff_profile_epyc_slurm_091223 \
-with-report "${reportdir}/${SLURM_JOB_ID}_report.html" \
-with-dag "${reportdir}/${SLURM_JOB_ID}_flowchart.png" \
-with-trace "${reportdir}/${SLURM_JOB_ID}_tracereport.txt" \
-with-timeline "${reportdir}/${SLURM_JOB_ID}_timeline.html" \
--gtdb_db '/mnt/scratch/nodelete/nextflow/mag/2.3.0/gtdbtk/gtdbtk_r202_data.tar.gz' \
--cat_db '/mnt/scratch/nodelete/nextflow/mag/2.3.0/cat_prepare/CAT_prepare_20210107.tar.gz' \
--checkm_db '/mnt/scratch/nodelete/nextflow/mag/2.3.0/checkm/checkm_data_2015_01_16.tar.gz' \
--busco_db '/mnt/scratch/nodelete/nextflow/mag/2.3.0/busco' \
--outdir ${outputdir} \
--input ${workdir}/Hirwaun_mag.csv \
--skip_spades \
--coassemble_group \
--binning_map_mode all \
-resume
```
Relevant files
System information
- Nextflow: 23.04.1
- Hardware: Slurm HPC
- Container: Singularity
- OS: Linux
- Pipeline: nf-core/mag 2.5.1
Hi @Peter-Kille, thanks for the report.

Unfortunately we're well aware of CONCOCT's very slow running time (one of the authors, @alneberg, has acknowledged this and made a few suggestions, but I can't find them at the moment).

However, I've personally not seen that particular warning in other reports. Generally this would imply there is something funky with the biocontainer.

I'm still on parental leave until January, so I can't investigate updating the container further (if that is the source of the issue).

However, the general advice we've given to others is:
- Increase the number of CPUs given to the CONCOCT process (see the config sketch after this list)
- Increase the wall time of the CONCOCT process (and presumably, in your case, of the main Nextflow job) and be patient
  - in previous cases the tool has been running, just extremely slowly; I don't know if that applies here
- Skip CONCOCT (`--skip_concoct`) and rely on MaxBin2/MetaBAT2
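For the first two suggestions, something along these lines in a custom config (e.g. the file you pass with `-c`) should work. This is a sketch, not a tested recipe: the process-name pattern and the resource values below are assumptions, so check the exact process names in your trace report first.

```groovy
// Sketch: raise resources for the CONCOCT steps via a process selector.
// The name pattern and the values are assumptions; verify the real
// process names (e.g. in the -with-trace report) before relying on this.
process {
    withName: '.*:CONCOCT.*' {
        cpus = 36
        time = 96.h
    }
}
```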
Finally, @alexhbnr actually found some general problems with OpenBLAS on our (old, SGE) cluster... I don't think this is the same problem as yours, but you could still try:

- setting the number of OpenBLAS threads to 1 using an environment variable. I'll update this comment when I find the config example (I'm currently on my phone)
Edit: the relevant settings - https://github.com/nf-core/configs/blob/master/conf%2Fpipeline%2Fmag%2Feva.config#L7-L10
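For reference, a minimal sketch of that kind of setting, assuming the aim is simply to stop OpenBLAS/OpenMP from oversubscribing threads inside the container (the linked eva.config is the authoritative example; the exact variables below are a common convention, not copied from it):

```groovy
// Minimal sketch: pin BLAS/OpenMP threading via Nextflow's env scope.
// Add to your custom config (e.g. the profile passed with -c).
env {
    OPENBLAS_NUM_THREADS = 1  // limit OpenBLAS to a single thread
    OMP_NUM_THREADS      = 1  // limit OpenMP parallel regions to a single thread
}
```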
This is actually a frequently reported issue for CONCOCT. Please forgive my ignorance, but I don't exactly recall the cause; I believe it has to do with how OpenBLAS is compiled inside the CONCOCT conda package. If you're really keen on using CONCOCT, you would have to try to create a container that does not have this issue. I believe the issue is easy enough to trigger with any small test run.
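For anyone attempting that, a rough outline of a rebuild, following the hint in the warning message itself, might look like the sketch below. This is untested and purely illustrative; the clone URL, the install prefix, and the idea of dropping a rebuilt library into the container are all assumptions.

```bash
# Illustrative sketch only (untested): build OpenBLAS with OpenMP support,
# as the warning suggests (USE_OPENMP=1), for use inside a custom CONCOCT image.
git clone --depth 1 https://github.com/OpenMathLib/OpenBLAS.git
cd OpenBLAS
make USE_OPENMP=1
make install PREFIX=/usr/local  # prefix is an assumption; point it at the container's library path
```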
Dear both, thank you so much for taking the time to respond. I will probably skip the CONCOCT step for now, as suggested, since the current data set is rather large; I will test with a smaller data set and report back.

I have been using the nf-core/mag pipeline previously without the CONCOCT step and it has worked really well. Thanks so much for all your efforts in developing the pipeline; they are very much appreciated :)
Should be fixed here!
https://github.com/nf-core/mag/pull/631