
Re: a small issue with mzR or XCMS package when sourcing code with foreach and doParallel

Open oliverververver opened this issue 3 years ago • 4 comments

Hi Neumann,

I have been a loyal user of XCMS for data processing in R, and it has been extremely helpful ever since I started using it. Recently, I came across a small issue. Whenever I source code that uses the XCMS package inside a foreach parallel loop, I get the error message "Error in { : task 1 failed - "inherits(x, "mzR") is not TRUE"". When I clear the environment and source the code again, the error disappears, but the next time I run the same code on a new dataset, the error message pops up again. The parallel processing code I use looks like this: "datalist <- foreach(q = (1:(length(filename))), .packages = c("xcms", "MSnbase", "dplyr", "mzR")) %dopar% {}". I am not sure whether this has anything to do with the mzR package. I put library(mzR) at the very beginning in case foreach could not find it, but the error still occurs whenever I source the code on a new dataset. Also, there is no error when I run the code line by line; it only happens when I source the whole script. What do you think is causing this, and how can I fix it? Many thanks.

Sincerely,

Jian Code

oliverververver avatar Jun 10 '21 21:06 oliverververver

Hi, thanks for reporting. Could you provide a reproducible example, possibly using the faahKO data? The screenshot is not cut&pasteable :-) Since the xcms class is based on the MSnbase class, which uses the mzR functions, it would be great to reduce it to a minimal reproducible example, i.e. make sure xcms or MSnbase are innocent. That helps nailing it down. We could then also go back in time and check the behaviour of older BioC environments. Yours, Steffen

sneumann avatar Jun 11 '21 07:06 sneumann

Could you maybe also provide some information on the system on which you're running this code (i.e. which OS you use; parallel processing is different on Windows and Unix machines)? Ideally, provide the output of your sessionInfo() so we can also check the xcms package version you're using.

jorainer avatar Jun 14 '21 12:06 jorainer

Hi Steffen and Jorainer,

This is the version information for the packages I used:

R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
xcms version 3.10.2; MSnbase version 2.14.2

I have tried the code using the faahKO dataset. Interestingly, although I made a few changes so that the code could read CDF files, no such error occurred. When the code read mzXML files, the error occurred: "Error in { : task 1 failed - "inherits(x, "mzR") is not TRUE"". But when I ran the code a second time, the error was gone. So I basically had to run the code twice to get the results. When I run the code line by line, there is no problem.

BTW, these are all the packages I used for this code:

library(xcms)
library(doParallel)
library(foreach)
library(MSnbase)
library(dplyr)
library(ggplot2)
library(gridExtra)

Best regards,

Jian

oliverververver avatar Jun 14 '21 17:06 oliverververver

The error message you're getting can also mean that the file cannot be found. I would suggest the following inside the loop (i.e. for each parallel process):

  • check that the file is there with stopifnot(file.exists(filename[q]))
  • make sure the packages are available inside the loop with library(xcms). I guess that should not be necessary and you might also drop that later - just to be really sure that xcms/mzR etc are loaded.
  • maybe you could also print out the name of the file in every loop? something like message(basename(filename[q]))? I guess you could also get the same error if you have files other than mzML, mzXML and CDF files in your filename variable...
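
The checks listed above could be bundled into one small helper that is called first thing inside the loop. This is only a sketch: `check_ms_file` is a hypothetical name (not part of xcms or mzR), and the accepted extensions are an assumption based on the formats mentioned above.

```r
# Hypothetical helper (not part of xcms/mzR): validate one input file
# before any reading is attempted.
check_ms_file <- function(path) {
    # Fail early with a clear message if the file is missing.
    if (!file.exists(path))
        stop("file not found: ", path)
    # Assumption: only mzML, mzXML and CDF files are expected.
    ext <- tolower(tools::file_ext(path))
    if (!ext %in% c("mzml", "mzxml", "cdf"))
        stop("unsupported file type: ", basename(path))
    # Log which file this worker is about to process.
    message("processing ", basename(path))
    invisible(path)
}

# Usage inside the parallel loop (assumes `filename` and a registered
# backend as in the original post; the processing step is left out):
# datalist <- foreach(q = seq_along(filename),
#                     .packages = c("xcms", "MSnbase", "dplyr", "mzR"),
#                     .export = "check_ms_file") %dopar% {
#     library(xcms)                # just to be really sure; can be dropped later
#     check_ms_file(filename[q])
#     # actual processing of filename[q] goes here
# }
```

Messages from worker processes are usually not shown in the main console. Creating the cluster with `makeCluster(n, outfile = "")` may make them visible, depending on the R front end.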

Generally, error reporting in parallel processing is really poor in R.
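
One way to get more out of a failing loop is the `.errorhandling = "pass"` argument of foreach, which returns the error object for a failed task instead of aborting with a bare "task 1 failed". The sketch below uses a toy task (`process_one` is hypothetical) and the sequential `%do%` operator so that it runs without a cluster; the same argument can be used with `%dopar%`.

```r
library(foreach)

# Toy task (hypothetical): fails for one input to mimic an unreadable file.
process_one <- function(i) {
    if (i == 2) stop("could not read file ", i)
    i * 10
}

# With .errorhandling = "pass" the loop keeps going and the condition
# object of a failed task is returned in place of its result.
res <- foreach(i = 1:3, .errorhandling = "pass") %do% process_one(i)

# Report which tasks failed and why.
failed <- vapply(res, inherits, logical(1), what = "error")
for (i in which(failed))
    message("task ", i, " failed: ", conditionMessage(res[[i]]))
```

The results of the successful tasks are still available in `res`, so only the failed files need to be inspected or rerun.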

jorainer avatar Jun 16 '21 10:06 jorainer