CONCOCT - OpenBLAS Warning

Open Peter-Kille opened this issue 2 years ago • 3 comments

Description of the bug

CONCOCT is exceeding 24 h of runtime. For the last 20 hours the only activity has been in the .command.log and .command.err files, which keep being updated with the following repeated output:

p and running. Check /mnt/scratch/c1711572/mag_nf/work/df/9ed083848ec2dbe65e17338428a179/MEGAHIT-CONCOCT-group-4_log.txt for progress
/usr/local/lib/python3.11/site-packages/sklearn/utils/validation.py:1858: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['int', 'str']. An error will be raised in 1.2.
  warnings.warn(
/usr/local/lib/python3.11/site-packages/sklearn/utils/validation.py:1858: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['int', 'str']. An error will be raised in 1.2.
  warnings.warn(
Setting 24 OMP threads
Generate input data
OpenBLAS Warning : Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option.
OpenBLAS Warning : Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option.
......

The OpenBLAS warning then repeats 25,200 times.

Command used and terminal output

Script used:

#!/bin/bash
#SBATCH --partition=jumbo      # the requested queue
#SBATCH --nodes=1              # number of nodes to use
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=6GB             # in megabytes, unless unit explicitly stated
#SBATCH --error=%J.err         # redirect stderr to this file
#SBATCH --output=%J.out        # redirect stdout to this file
##SBATCH [email protected]  # email address used for event notification
##SBATCH --mail-type=all 

  
echo "Some Usable Environment Variables:"
echo "================================="
echo "hostname=$(hostname)"
echo \$SLURM_JOB_ID=${SLURM_JOB_ID}


cat $0

module purge

module load nextflow/23.04.1
module load singularity/3.8.7

export NXF_OPTS="-Xms500M -Xmx4G"

workdir="/mnt/scratch/$USER/mag_nf"
reportdir="wtw_Hirwaun_reports"
outputdir="wtw_Hirwaun_mag_output"

mkdir $reportdir

nextflow run mag_2_5_1/ \
         -c cardiff_profile_epyc_slurm_091223 \
         -with-report "${reportdir}/${SLURM_JOB_ID}_report.html" \
         -with-dag "${reportdir}/${SLURM_JOB_ID}_flowchart.png" \
         -with-trace "${reportdir}/${SLURM_JOB_ID}_tracereport.txt" \
         -with-timeline "${reportdir}/${SLURM_JOB_ID}_timeline.html" \
         --gtdb_db '/mnt/scratch/nodelete/nextflow/mag/2.3.0/gtdbtk/gtdbtk_r202_data.tar.gz' \
         --cat_db '/mnt/scratch/nodelete/nextflow/mag/2.3.0/cat_prepare/CAT_prepare_20210107.tar.gz' \
         --checkm_db '/mnt/scratch/nodelete/nextflow/mag/2.3.0/checkm/checkm_data_2015_01_16.tar.gz' \
         --busco_db '/mnt/scratch/nodelete/nextflow/mag/2.3.0/busco' \
         --outdir ${outputdir} \
         --input ${workdir}/Hirwaun_mag.csv \
         --skip_spades \
         --coassemble_group \
         --binning_map_mode all \
         -resume

Relevant files

config_nextflow-log.zip

System information

Nextflow: 23.04.1
Hardware: Slurm HPC
Container: Singularity
OS: Linux
nf-core/mag: 2.5.1

Peter-Kille, Dec 13 '23 08:12

Hi @Peter-Kille thanks for the report.

Unfortunately, we've long been aware of CONCOCT's very slow running time (one of the authors, @alneberg, has acknowledged this and made a few suggestions, but I can't find them ATM).

However, I've personally not seen that particular warning in other reports. Generally this would imply there is something funky with the biocontainer.

I'm still on parental leave until January, so I can't look further into updating the container (if that is the source of the issue).

However, the general advice we've given to others is:

  1. Increase the number of CPUs for the CONCOCT process
  2. Increase the wall time of both CONCOCT (and presumably in your case, the main nextflow job) and be patient
    • in previous cases the tool has been running, just extremely slowly. I don't know if that applies here
  3. Skip CONCOCT and rely on maxbin/metabat
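
As a rough sketch of points 1 and 2, a small custom config could raise the resources for the CONCOCT step. The process name `CONCOCT_CONCOCT`, the file name `concoct_resources.config`, and the values of 24 CPUs and 72 hours are assumptions for illustration and do not come from this thread; check the process name against your own trace report before relying on it.

```shell
# Write a small custom Nextflow config (the file name is hypothetical).
cat > concoct_resources.config <<'EOF'
process {
    // Assumed process name; confirm it in your trace report.
    withName: 'CONCOCT_CONCOCT' {
        cpus = 24      // illustrative value
        time = '72h'   // illustrative value
    }
}
EOF

# Then pass it to the existing launch command with an additional -c option, e.g.:
#   nextflow run mag_2_5_1/ -c cardiff_profile_epyc_slurm_091223 -c concoct_resources.config ...
```

The wall time of the outer Slurm job that launches nextflow would need raising separately. For point 3, nf-core/mag should offer a `--skip_concoct` parameter, but verify this against the parameter documentation for your pipeline version.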

Finally, @alexhbnr had actually found general problems with OpenBLAS on our (old, SGE) cluster. I don't think this is the same problem as yours, but you could still try:

  1. Set the number of OpenBLAS threads to 1 using an environment variable. I'll update this comment when I find the config example (I'm currently on my phone)

Edit: the relevant settings - https://github.com/nf-core/configs/blob/master/conf%2Fpipeline%2Fmag%2Feva.config#L7-L10
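
For illustration, a minimal sketch of this kind of setting, written from scratch rather than copied from the linked file: Nextflow's `env` scope exports variables into each task's environment, and `OPENBLAS_NUM_THREADS` is the variable OpenBLAS reads for its thread count. The file name `openblas_single_thread.config` is hypothetical, and whether this resolves the hang reported here is untested.

```shell
# Write a custom Nextflow config that limits OpenBLAS to one thread per task
# (the file name is hypothetical).
cat > openblas_single_thread.config <<'EOF'
env {
    OPENBLAS_NUM_THREADS = '1'
}
EOF

# Then pass it to the launch command with an additional -c option, e.g.:
#   nextflow run mag_2_5_1/ -c cardiff_profile_epyc_slurm_091223 -c openblas_single_thread.config ...
```

Because the `env` scope applies to every process in the run, this limits OpenBLAS threading pipeline-wide, not only for CONCOCT.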

jfy133, Dec 13 '23 09:12

This is actually a frequently reported issue for CONCOCT. Please forgive my ignorance, but I don't recall the exact cause. I believe it has to do with how OpenBLAS is compiled inside the concoct conda package. If you're really keen on using CONCOCT, you would have to try to create a container that does not have this issue. I believe the issue is easy enough to trigger with any small test run.

alneberg, Dec 13 '23 12:12

Dear both, thank you so much for taking the time to respond. I will probably skip the CONCOCT step for now, as suggested, since the current data set is rather large, then test with a smaller data set and report back.

I have previously been using the nf-core/mag pipeline without the CONCOCT step and it has worked really well. Thanks so much for all your efforts in developing the pipeline, they are very much appreciated :)

Peter-Kille, Dec 13 '23 13:12

Should be fixed here!

https://github.com/nf-core/mag/pull/631

jfy133, Jun 27 '24 13:06