DESEQ2 using more cores (29) than requested (6)

Open jambler24 opened this issue 3 years ago • 2 comments

Description of the bug

We encountered this issue while running the pipeline on our Slurm cluster. The cluster enforces strict policies: a job is terminated automatically if it uses more cores than it requested.

The process in question has the "medium" label, so according to that definition it should use only 6 cores.

Any ideas why the actual usage is greater?

Command used and terminal output

Your processes running on srvcnthpc126 were cancelled as they were using more cores (29) than requested (6). Please can you check your submission script and executable against the cores that you are reserving for your jobs.
If the issue persists, especially if your processes use java, please increase --ntasks to 40.
Running processses:
user_x + 209742 209724  0 16:18 ?        00:00:00 /bin/bash -c cd /scratch/user_x/RNA-seq2021/Analysis/work/a3/2a1e91f92139ef7364417000e8224b; eval export PYTHONNOUSERSITE="1" export R_PROFILE_USER="/.Rprofile" export R_ENVIRON_USER="/.Renviron" export PATH="/home/user_x/.nextflow/assets/nf-core/rnaseq/bin:$PATH"; /bin/bash /scratch/user_x/RNA-seq2021/Analysis/work/a3/2a1e91f92139ef7364417000e8224b/.command.run nxf_trace
user_x + 209769 209742  0 16:18 ?        00:00:00 /bin/bash /scratch/user_x/RNA-seq2021/Analysis/work/a3/2a1e91f92139ef7364417000e8224b/.command.run nxf_trace
user_x + 209786 209769  0 16:18 ?        00:00:00 /bin/bash -ue /scratch/user_x/RNA-seq2021/Analysis/work/a3/2a1e91f92139ef7364417000e8224b/.command.sh
user_x + 209788 209769  0 16:18 ?        00:00:02 /bin/bash /scratch/user_x/RNA-seq2021/Analysis/work/a3/2a1e91f92139ef7364417000e8224b/.command.run nxf_trace
user_x + 209794 209786 99 16:18 ?        00:38:06 /usr/local/lib/R/bin/exec/R --no-echo --no-restore --file=/home/user_x/.nextflow/assets/nf-core/rnaseq/bin/deseq2_qc.r --args --count_file salmon.merged.gene_counts_length_scaled.tsv --outdir ./ --cores 6 --id_col 1 --sample_suffix  --outprefix deseq2 --count_col 3

Relevant files

nextflow.log

System information

No response

jambler24, Jan 18 '22 16:01

Hi @jambler24 ! Apologies for the late response!

I have had a look at the deseq2_qc.r script, and it appears that we define a --cores parameter but never actually use it anywhere in the script. This means we currently have no way to throttle the number of cores the script uses, which explains your issue. I am also not certain which step(s) in the script are using the extra CPUs, given that no CPU count is explicitly specified anywhere.
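As a possible workaround while the cause is unknown, here is a minimal sketch that caps the thread pools of common numeric libraries through environment variables before R starts. It rests on an assumption this thread has not confirmed: that the extra CPU use comes from a multithreaded BLAS or OpenMP backend linked into R. The helper name `cap_threads` is hypothetical; `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS` are the standard variables read by OpenMP runtimes, OpenBLAS and Intel MKL respectively.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cap the thread pools of common numeric libraries.
# Assumption (not confirmed in this thread): the extra CPU use comes from
# a multithreaded BLAS/OpenMP backend rather than from the R code itself.
cap_threads() {
    local n="${1:-1}"                  # default to a single thread
    export OMP_NUM_THREADS="$n"        # read by OpenMP runtimes
    export OPENBLAS_NUM_THREADS="$n"   # read by OpenBLAS
    export MKL_NUM_THREADS="$n"        # read by Intel MKL
}

# Demo: cap to the 6 cores that the "medium" label requests.
cap_threads 6
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

If the cap has an effect, the variables would need to be exported in the same shell that launches deseq2_qc.r. If the core count stays at 29, the assumption above is probably wrong and the threads come from somewhere else.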

@macroscian any ideas on how we can troubleshoot and fix this?

drpatelh, Feb 14 '22 18:02

I tried to go down this rabbit hole just now, but I couldn't find which part of the script is requesting all of those CPUs in order to change the behaviour.
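One way to narrow this down might be to watch how many threads the R process holds while it runs. The sketch below assumes a Linux node where `/proc/<pid>/status` is readable; the helper name `thread_count` is hypothetical. A thread count is only a proxy for the "cores used" figure that the cluster reports, so treat the number as a hint rather than a measurement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many threads a process currently has.
# Assumption: a Linux node where /proc/<pid>/status is readable.
thread_count() {
    local pid="$1"
    # The kernel reports the thread count on the "Threads:" line.
    awk '/^Threads:/ { print $2 }' "/proc/${pid}/status"
}

# Demo on the current shell; on the cluster, pass the PID of the R process.
thread_count "$$"
```

On the cluster, the PID to pass would be that of the R process running deseq2_qc.r (209794 in the listing above). Sampling the count at intervals during the run could show at which point the number of threads jumps.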

drpatelh, Apr 26 '22 15:04

I don't know how to debug this, I'm afraid. If you are able to provide an example of how I can, or even better a fix via a PR, that would be awesome! Closing for now, but feel free to re-open if the problem persists.

drpatelh, Dec 16 '22 11:12