
How to run XCMS with multiple threads

Open YiqingElroy opened this issue 2 years ago • 4 comments

Recently, I tried to run xcms (in R, called by MZmine) on our Slurm cluster. The technician said that R (xcms) used only one CPU core. How can I set up XCMS to run with multiple threads?

Thank you!

YiqingElroy, Dec 14 '21 14:12

Hi, under the hood xcms uses BiocParallel (https://bioconductor.org/packages/release/bioc/html/BiocParallel.html) for parallelisation. BiocParallel offers several backends, including one for Slurm. xcms creates one job per input file and passes these jobs to the backend; there is no parallelisation within a single input file. Yours, Steffen

sneumann, Dec 14 '21 16:12
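To make the per-file model described above concrete, here is a minimal sketch that uses BiocParallel directly rather than xcms. It assumes a Linux machine (as on a typical Slurm node) so that MulticoreParam can fork worker processes; the file names and process_one_file() are hypothetical stand-ins for real input files and for the per-file work xcms does.

```r
library(BiocParallel)

## Hypothetical input files; xcms would read real mzML/mzXML files instead.
files <- c("sample_1.mzML", "sample_2.mzML", "sample_3.mzML")

## Hypothetical stand-in for the per-file work (e.g. peak detection).
## Each call handles exactly one file and reports the process it ran in.
process_one_file <- function(f) {
    data.frame(file = f, pid = Sys.getpid())
}

## One task per file is handed to the backend; with 2 workers, at most
## 2 files are processed at the same time, the third waits for a free worker.
res <- bplapply(files, process_one_file,
                BPPARAM = MulticoreParam(workers = 2L))
do.call(rbind, res)
```

Each element of files becomes one task, so with a single input file there is nothing to distribute, no matter how many workers are registered.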

I'm also running our preprocessing with Slurm on our cluster, but I put the whole analysis into one Rmd file. I then submit the call rmarkdown::render("<filename>.Rmd") as a job to the cluster; more precisely, I submit a shell script with the following content as a job:

```bash
#!/bin/bash

PTH=`pwd`
/shared/bioinf/R/bin/R-4.0-BioC3.12 -e 'rmarkdown::render("peak_detection.Rmd")'
```

Within the Rmd file (which could equally be a plain .R script), I then use the following setup for parallel processing:

```r
library(xcms)
ncores <- as.integer(Sys.getenv("SLURM_JOB_CPUS_PER_NODE", 5))
register(MulticoreParam(ncores - 1L))
```

The second line reads the number of CPUs allocated to the Slurm job (as requested in the call that submits the job to the queue), falling back to 5 if the variable is not set. After that, xcms uses ncores - 1 CPUs by default for per-file parallel processing: with only one file there is no parallel processing, with two files the jobs are distributed to two CPUs, and so on.

jorainer, Dec 15 '21 07:12
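The ncores - 1L setup above works for a single-node job with at least two CPUs. With a single allocated CPU it would request 0 workers, and on a multi-node allocation SLURM_JOB_CPUS_PER_NODE is not a plain number (e.g. 72(x2),36), so as.integer() returns NA. A slightly more defensive variant could look like the sketch below. slurm_workers() is a hypothetical helper, not part of xcms or BiocParallel; it assumes that SLURM_CPUS_PER_TASK (set by Slurm only when --cpus-per-task is requested) should take precedence, and that on a multi-node allocation only the first node's CPU count matters, since a multicore backend only uses the node R runs on.

```r
library(BiocParallel)

## Hypothetical helper (not part of xcms or BiocParallel): derive the number
## of BiocParallel workers from the Slurm allocation of the current job.
slurm_workers <- function(default = 5L) {
    ## SLURM_CPUS_PER_TASK is only set if --cpus-per-task was requested.
    n <- Sys.getenv("SLURM_CPUS_PER_TASK", "")
    if (!nzchar(n))
        n <- Sys.getenv("SLURM_JOB_CPUS_PER_NODE", "")
    ## Multi-node values look like "72(x2),36"; keep the first CPU count.
    n <- suppressWarnings(as.integer(sub("^([0-9]+).*$", "\\1", n)))
    if (is.na(n))
        n <- as.integer(default)
    ## Mirror the ncores - 1L setup, but never go below 1 worker.
    max(1L, n - 1L)
}

register(MulticoreParam(workers = slurm_workers()))
```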

Thanks for the answer. I have integrated this into my submit script. Would it be okay to write the submit script like the following?

```bash
#!/bin/bash

cd $SLURM_SUBMIT_DIR

module load R/4.1.0-foss-2019b

export R_HOME=/apps/eb/R/4.1.0-foss-2019b # edit R_HOME address

export OMP_NUM_THREADS=6

PTH=`pwd`
/shared/bioinf/R/bin/R-4.0-BioC3.12 -e 'rmarkdown::render("peak_detection.Rmd")'

time ./startMZmine-Linux ~/MZmine_Test/GTP18_test1.xml
```

YiqingElroy, Dec 15 '21 21:12

You should not just copy-paste the lines from my script. I am calling R and then the render function to process the .Rmd file (a file containing R commands); in your case you don't have such a file. I don't know how MZmine calls R, so you may need to ask the MZmine developers. By default, xcms uses whatever parallel processing setup has been registered, e.g. with register(MulticoreParam(<num CPUs>)). In your case it depends on how MZmine calls/starts R and which settings it uses (i.e. with which calls it invokes xcms), and only the MZmine developers can give you this information.

jorainer, Dec 16 '21 06:12
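Following up on the point that xcms uses whatever backend has been registered: one way to check this, in whichever R session actually runs xcms, is to inspect bpparam(), which is the default for the BPPARAM argument of xcms functions such as findChromPeaks(). The sketch below assumes Linux (so MulticoreParam can fork), and the worker count of 4 is arbitrary.

```r
library(BiocParallel)

## Backend used by BiocParallel-aware functions when no BPPARAM is passed.
default_backend <- bpparam()
print(class(default_backend))
print(bpnworkers(default_backend))

## Registering a backend makes it the new default, for this R session only.
register(MulticoreParam(workers = 4L))
print(bpnworkers(bpparam()))
```

Because the registration only affects the current R session, it would have to happen inside the R process that MZmine starts; whether MZmine allows that is the question for the MZmine developers.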