
How to save and read each intermediate result?

Status: Open. FangYangUW opened this issue 6 years ago (3 comments).

Howdy,

I want to run xcms on 255 samples. While following the pipeline “LCMS data preprocessing and analysis with xcms”, I have questions about how to save and read back each intermediate result. For example, reading 255 raw data files with readMSData() takes a while. How can I save the result generated by readMSData() and read it back in so I can move on to chromatogram()? And how can I save and read the intermediate results generated by chromatogram(), findChromPeaks(), CentWaveParam(), chromPeaks(), adjustRtime(), dropAdjustedRtime(), PeakDensityParam(), groupChromPeaks(), PeakGroupsParam() and adjustRtime()? The pipeline is here: https://bioconductor.org/packages/devel/bioc/vignettes/xcms/inst/doc/xcms.html#4_chromatographic_peak_detection

Thanks a lot for your help in advance.

Fang

FangYangUW, Jul 08 '19 15:07

Hi Fang,

Assuming you are reading the raw data with mode = "onDisk", you can save the OnDiskMSnExp object after the readMSData call. Very importantly, do not move or delete the raw data files (mzML, mzXML, ...) at any stage during the analysis, because all result objects always need the original files to retrieve the m/z and intensity values.
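Because the saved on-disk object does not hold the m/z and intensity values itself, a quick sanity check after reloading it is to confirm the raw files are still where they were. A minimal sketch, assuming you pass it the file paths returned by MSnbase's fileNames(raw_data); check_raw_files() is a hypothetical helper, not part of xcms or MSnbase, and the demo uses empty temporary stand-in files:

```r
## check_raw_files() is a hypothetical helper (not part of xcms/MSnbase).
## It stops with an informative error if any raw data file has been moved
## or deleted since the on-disk object was created.
check_raw_files <- function(paths) {
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Raw data files not found:\n", paste(missing, collapse = "\n"),
         call. = FALSE)
  }
  invisible(TRUE)
}

## Usage with xcms (not run here):
## load("raw_data.RData")
## check_raw_files(fileNames(raw_data))

## Self-contained demo with two empty temporary stand-in files.
demo_files <- c(tempfile(fileext = ".mzXML"), tempfile(fileext = ".mzXML"))
invisible(file.create(demo_files))
ok <- check_raw_files(demo_files)              # all files present
invisible(file.remove(demo_files[2]))          # simulate a deleted file
failed <- tryCatch({ check_raw_files(demo_files); FALSE },
                   error = function(e) TRUE)   # TRUE if an error was raised
```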

I'd suggest always saving the objects with the save function, e.g. save(data, file = "<file name>"), which saves the object data to the file "<file name>". You can then load this object back into the R workspace with load("<file name>").
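A minimal save/load round trip using only base R; bpis_demo is a stand-in for any R object (an OnDiskMSnExp or XCMSnExp is saved the same way). Note that load() restores objects under the names they had when they were saved, and returns those names invisibly:

```r
## Stand-in for an xcms result object.
bpis_demo <- list(sample = c("sample_1", "sample_2"), value = c(1, 2))
rdata_file <- file.path(tempdir(), "bpis_demo.RData")

## Save the object to disk ...
save(bpis_demo, file = rdata_file)

## ... remove it from the workspace, as happens when R is restarted ...
original <- bpis_demo
rm(bpis_demo)

## ... and load it back; the object reappears as `bpis_demo`.
restored_names <- load(rdata_file)
```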

Saving the XCMSnExp result object after a findChromPeaks call makes perfect sense, since this is the step that takes the most time. You can also save the result object after adjustRtime or groupChromPeaks, but these calls are usually less computationally intensive.
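If you rerun the same script several times, one way to skip the slow findChromPeaks step on later runs is to compute it once and read it back from disk afterwards. The sketch below swaps in base R's saveRDS()/readRDS() (one object per file) instead of save()/load(); cached_step() is a hypothetical helper, and the commented xcms call assumes raw_data and a CentWaveParam object named cwp as in the vignette:

```r
## cached_step() is a hypothetical helper, not part of xcms.
## If `file` exists, the stored result is read back; otherwise `expr` is
## evaluated, written to `file` with saveRDS(), and returned.
## R evaluates arguments lazily, so `expr` only runs when no cache exists.
cached_step <- function(file, expr) {
  if (file.exists(file)) {
    return(readRDS(file))
  }
  value <- expr
  saveRDS(value, file = file)
  value
}

## Usage with xcms (not run here):
## xdata <- cached_step("xdata_peaks.rds",
##                      findChromPeaks(raw_data, param = cwp))

## Self-contained demo: the "expensive" expression runs only once.
cache_file <- file.path(tempdir(), "demo_step.rds")
if (file.exists(cache_file)) invisible(file.remove(cache_file))
n_runs <- 0
first  <- cached_step(cache_file, { n_runs <- n_runs + 1; sqrt(16) })
second <- cached_step(cache_file, { n_runs <- n_runs + 1; sqrt(16) })
```

The cache is keyed only on the file name, so delete the .rds file whenever you change the peak detection parameters; otherwise the old result is silently reused.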

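For all the other functions you mentioned there is no need to save their output: CentWaveParam(), PeakDensityParam() and PeakGroupsParam() only create parameter objects, and chromPeaks() just extracts the identified peaks from the XCMSnExp object.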

jorainer, Jul 09 '19 09:07

Thanks a lot for your reply. Do I need to save each important intermediate result? The last saved data doesn't include the previous intermediate results. For example, I ran the following commands:

```r
library(xcms)
library(RColorBrewer)
library(pander)
library(magrittr)
library(pheatmap)

t5files <- list.files("./T5Data", recursive = TRUE, full.names = TRUE)
pd <- data.frame(sample_name = sub(basename(t5files), pattern = ".mzXML",
                                   replacement = "", fixed = TRUE),
                 sample_group = c(rep("HFD_KO_T5", 8), rep("HFD_WT_T5", 8),
                                  rep("Blank", 1), rep("QC", 1)),
                 stringsAsFactors = FALSE)
raw_data <- readMSData(files = t5files,
                       pdata = new("NAnnotatedDataFrame", pd),
                       mode = "onDisk")
bpis <- chromatogram(raw_data, aggregationFun = "max")
save(bpis, file = "data1.RData")
```

Then I closed the terminal, opened it again and loaded the data I saved. When I moved on to the next step, I got an error that the object 'raw_data' is not found:

```r
load("data1.RData")
rtr <- c(0.68, 0.94)
mzr <- c(133, 135)
chr_raw <- chromatogram(raw_data, mz = mzr, rt = rtr)
## Error in chromatogram(raw_data, mz = mzr, rt = rtr) : object 'raw_data' not found
```

FangYangUW, Jul 09 '19 17:07

Is there a specific reason you want to close the R session and reopen it? Usually you don't need to do that. I would only save the data after computationally intensive steps, such as the data object after findChromPeaks. Also, after starting a new R session you need to load all libraries again.
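When you do start a new R session, attach the packages first and then load the saved objects, so that functions such as chromatogram() are available when you continue. A minimal sketch; resume_analysis() is a hypothetical helper, and the demo uses base R's stats package and a stand-in data frame instead of xcms and real results:

```r
## resume_analysis() is a hypothetical helper, not part of xcms.
## It attaches the required packages, then loads the saved objects into
## the global environment and returns their names.
resume_analysis <- function(rdata_file, pkgs = "xcms") {
  for (pkg in pkgs) {
    library(pkg, character.only = TRUE)
  }
  load(rdata_file, envir = globalenv())
}

## Usage in a fresh session (not run here):
## resume_analysis("xdata_peaks.RData", pkgs = c("xcms", "magrittr"))

## Self-contained demo with a stand-in object and base R's stats package.
xdata_demo <- data.frame(id = 1:2)
demo_file <- file.path(tempdir(), "xdata_demo.RData")
save(xdata_demo, file = demo_file)
rm(xdata_demo)
loaded <- resume_analysis(demo_file, pkgs = "stats")
```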

Your code above fails because you are saving the bpis object, but not raw_data, which the second code block uses. If you save raw_data instead of bpis (or both, e.g. save(raw_data, bpis, file = "data1.RData")) it works. Also, have a look at the help of the save function (?save). It is also possible to save the full workspace with all objects at any step (save.image()) and load everything again with load.
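The difference can be reproduced with base R alone. In this sketch, raw_data and bpis are simple stand-ins for the OnDiskMSnExp and chromatogram objects from the code above, and each file is loaded into a fresh environment to mimic a new R session. With the real objects, the raw files must of course still be in place:

```r
## Stand-ins for the objects in the code above.
raw_data <- list(files = c("a.mzXML", "b.mzXML"))
bpis <- list(max_intensity = c(10, 20))

only_bpis <- file.path(tempdir(), "data1.RData")
both      <- file.path(tempdir(), "data2.RData")
workspace <- file.path(tempdir(), "workspace.RData")

save(bpis, file = only_bpis)        # what the failing code did
save(raw_data, bpis, file = both)   # save every object you need later
save.image(file = workspace)        # save the whole global environment

## Load each file into its own empty environment, like a new R session.
session1 <- new.env(); load(only_bpis, envir = session1)
session2 <- new.env(); load(both,      envir = session2)
session3 <- new.env(); load(workspace, envir = session3)

## inherits = FALSE so the global `raw_data` is not found by accident.
has_raw_data <- c(
  only_bpis = exists("raw_data", envir = session1, inherits = FALSE),
  both      = exists("raw_data", envir = session2, inherits = FALSE),
  workspace = exists("raw_data", envir = session3, inherits = FALSE)
)
print(has_raw_data)
## expected: FALSE for only_bpis, TRUE for both and workspace
```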

jorainer, Jul 09 '19 17:07