
Does XCMS work for Waters MSe data?

Open QizhiSu opened this issue 5 years ago • 11 comments

Hello everyone, I have recently been looking for tools to process my MSe data obtained by Waters UPLC-QTOF-MS. My interest is in picking features from the low-energy function and then assigning the MSe spectrum to each of them. This way, I can export both precursor and fragmentation information for every compound to other software for identification, e.g. Sirius. Does anyone know whether XCMS is capable of handling this?

Many thanks.

QizhiSu avatar Nov 21 '19 13:11 QizhiSu

I'm not at all familiar with MSe data, so I would need some more info on that to be able to answer. So you have an MSe mzML file, does this contain MS1 level data? Or do you want to use the spectra with low energy function as a sort of MS1 signal, do the chromatographic peak detection on that, and then find spectra (with higher energy function) for each chrom peak to use as MS/MS for identification? The first part (peak detection on low energy function spectra) should be relatively simple - the second one will be tricky, because I guess you don't have a precursor m/z for each spectrum?

jorainer avatar Nov 28 '19 07:11 jorainer

I would guess it looks like a SWATH file, just with one big isolation window. It should contain alternating MS1 and MS2 scans. The deconvolution of the MS2 spectra will be tricky because there are then many overlapping peaks in the MS2 scan. In principle the SWATH workflow should do the job, but how well is the question.

michaelwitting avatar Nov 28 '19 07:11 michaelwitting

I am awfully sorry for the late reply. Yes, MSe is like what michaelwitting has commented. It fragments all ions in MS1. It is true that it is tricky to match MS2 spectra to MS1 features. Considering there is in-source fragmentation in MS1 as well, I think it is also important to group ions that come from MS1 into single compounds, otherwise those in-source fragments will be considered as "compounds" as well.

QizhiSu avatar Dec 20 '19 11:12 QizhiSu

Update: have a look at this issue: https://github.com/sneumann/xcms/issues/451. It should be possible to analyze MSe data as well: you can first do classical peak detection in MS1 and then run a second round of peak detection on MS2 with findChromPeaks(..., msLevel = 2, add = TRUE). The add = TRUE ensures that newly identified peaks will be added to the already identified MS1 peaks.
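A minimal sketch of this two-pass detection, assuming an xcms version whose findChromPeaks method for XCMSnExp objects accepts add = TRUE. The wrapper name detect_ms1_then_ms2 and its argument names are hypothetical; raw_data is expected to be an OnDiskMSnExp and cwp a CentWaveParam, as in the scripts elsewhere in this thread.

```r
# Hypothetical helper: two-pass peak detection for MSe-style data.
# Assumes xcms is installed and supports `add = TRUE` for XCMSnExp objects.
detect_ms1_then_ms2 <- function(raw_data, cwp) {
  # Pass 1: classical peak detection on the low-energy (MS1) scans.
  xdata <- xcms::findChromPeaks(raw_data, param = cwp, msLevel = 1L)
  # Pass 2: detect peaks in the high-energy (MS2) scans and append them
  # to the MS1 peaks instead of replacing them.
  xcms::findChromPeaks(xdata, param = cwp, msLevel = 2L, add = TRUE)
}
```

Reusing the same CentWaveParam for both passes is only a starting point; the high-energy scans may well need different settings.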

jorainer avatar Apr 10 '20 05:04 jorainer

Hi, I'm piggybacking on this issue since it is not as dense as issue #451. I'm having a hard time tracking the back and forth between @jorainer and @cbroeckl to figure out at which step the MSe data should be incorporated into the workflow. I'm working with a small subset of MSe data from here, converted from .raw to mzML with the msconvert options

--filter "msLevel 1-" --filter "lockmassRefiner mz=556.2771 mzNegIons=554.2615 tol=1.5"

Interestingly, while the feature count drops, I still have spectra at evenly spaced intervals, as in the second figure on issue #470. Here is my code. Does add = TRUE not work because I'm not using the jomaster branch?

setwd("M:/Metabolomics_Training_2020/msconvert_function_lockmass_batch") # Set working directory
data <- dir("data", pattern = ".mzML", full.names = T, recursive = F) # R list of data files
# R dataframe listing the treatment associated to each data file in a separate column
phenodata <- data.frame(sample_name = sub(basename(data), pattern = ".mzML", replacement = "", fixed = T),
                        sample_group = c(rep("PF-C", 6), rep("PF-T", 6)),
                        stringsAsFactors = F)
library(xcms) # Load xcms
raw_data <- readMSData(files = data, pdata = new("NAnnotatedDataFrame", phenodata), mode = "onDisk") # Load data

# CentWave parameters
ms.res <- 30000
peak.width <- c(3, 30)
sn <- 5
ms.fwhm <- round(550/ms.res, digits = 3)
ppm <- round(1.5*(1000000*ms.fwhm)/(550*2.355))
cwp <- CentWaveParam(peakwidth = peak.width, ppm = ppm, snthresh = sn, mzdiff = ms.fwhm,
                     fitgauss = TRUE, verboseColumns = TRUE)

# Peak detection on MS1 data. I receive
# "Error in x$.self$finalize() : attempt to apply non-function"
# but peak finding progresses.
xdata1 <- findChromPeaks(raw_data, param = cwp, msLevel = 1L)

# A list of treatment indices for the data files for peak grouping
sample.groups <- c(rep(1, length(xdata1@.processHistory[[1]]@fileIndex)/2),
                   rep(2, length(xdata1@.processHistory[[1]]@fileIndex)/2))

# Grouping parameters prior to retention time correction
minfrac <- 0.4
bw.pre <- 3
pdp_pre <- PeakDensityParam(sampleGroups = sample.groups, minFraction = minfrac, bw = bw.pre)
xdata1 <- groupChromPeaks(xdata1, param = pdp_pre) # XCMS feature grouping, pre-RT correction

pgp <- PeakGroupsParam(minFraction = max(0.5, minfrac)) # Retention time correction parameters
xdata1 <- adjustRtime(xdata1, param = pgp) # XCMS RT correction

# Post RT correction grouping parameters
bw.post <- 1.5
pdp_post <- PeakDensityParam(sampleGroups = sample.groups, minFraction = minfrac, bw = bw.post)
xdata1 <- groupChromPeaks(xdata1, param = pdp_post) # XCMS group after RT correction

fpp <- FillChromPeaksParam(expandMz = 0, expandRt = 0, ppm = 0) # Peak filling parameters
# XCMS fillPeaks. I get a similar error to this for each file:
# "Requesting 1412 peaks from PF-T_201247903_0060.mzML ... Error in (function (x) : attempt to apply non-function got 1350."
xdata1 <- fillChromPeaks(xdata1, param = fpp)

# Attempt to add MSe data, but I get
# "Error in .local(object, param, ...) : unused argument (add = TRUE)"
xdata1 <- findChromPeaks(raw_data, param = cwp, msLevel = 2L, add = TRUE)
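The ppm value in the script above is derived from the instrument resolution, which is easy to miss in the one-line form. The arithmetic is recomputed here step by step as a small worked example. The reference m/z of 550 and the resolution of 30000 are the poster's values; the interpretation of 2.355 as the Gaussian FWHM-to-sigma factor, 2*sqrt(2*ln 2), and the variable names are assumptions added for illustration.

```r
ms_res <- 30000                                 # resolving power (poster's value)
ref_mz <- 550                                   # reference m/z (poster's value)

ms_fwhm  <- round(ref_mz / ms_res, digits = 3)  # peak width (FWHM) in Da: 0.018
sigma_da <- ms_fwhm / 2.355                     # FWHM -> Gaussian sigma, in Da
ppm      <- round(1.5 * 1e6 * sigma_da / ref_mz) # 1.5 sigma expressed in ppm: 21

print(c(ms.fwhm = ms_fwhm, ppm = ppm))
```

With these inputs the script passes ppm = 21 and mzdiff = 0.018 to CentWaveParam.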

Phylloxera avatar May 05 '20 11:05 Phylloxera

@Phylloxera @Sukis123 The approach I am using (now) on Windows is below. The point at which you want to incorporate MSe files is at the very first step.

readMSData(files = files) # this vector of file paths/names should contain both MS1 and MS2(MSe) files. In this workflow, MS1 and MS2 data are in separate mzML files, each of which has been lockmass corrected by pwiz.

mcpar <- SnowParam(workers = 4, type = "SOCK")
 raw_data <- readMSData(files = files, 
                         pdata = new("NAnnotatedDataFrame", data.frame(sample.names)),
                         mode = "onDisk"
   )

 orig.msLevel <- raw_data@featureData@data$msLevel
   raw_data@featureData@data$msLevel <- rep(1, length(orig.msLevel))

 cwp <- CentWaveParam(peakwidth = peak.width, 
                         ppm = ppm,
                         snthresh = 5,
                         mzdiff = ms.fwhm,
                         fitgauss = TRUE,
                         verboseColumns = TRUE)
  xdata <- findChromPeaks(raw_data, param = cwp, msLevel = 1, BPPARAM = mcpar)

  sample.groups <- c(
      rep(1, length(xdata@.processHistory[[1]]@fileIndex)/2),
      rep(2, length(xdata@.processHistory[[1]]@fileIndex)/2)
    )
  pdp <- PeakDensityParam(sampleGroups = sample.groups,
                          minFraction = minfrac, bw = bw.pre)
  xdata <- groupChromPeaks(xdata, param = pdp)

  pgp <- PeakGroupsParam(
    minFraction = max(0.5, minfrac)
  )
  xdata <- adjustRtime(xdata, param = pgp) 

  pdp <- PeakDensityParam(sampleGroups = sample.groups,
                          minFraction = minfrac, bw = bw.post)
  xdata <- groupChromPeaks(xdata, param = pdp) 
  
  fpp <- FillChromPeaksParam(expandMz = 0, expandRt = 0, ppm = 0)
  xdata <- fillChromPeaks(xdata, param = fpp, BPPARAM = mcpar)  

  xdata@featureData@data$msLevel <- orig.msLevel

Note that this is a temporary workaround until I actually test the new functionality in @jorainer add chromPeaks. I haven't adapted to that yet. I would anticipate that this script should work:

readMSData(files = files) # this vector of file paths/names should be a list of mzML files which have BOTH MS1 and MS2 data in one file.

mcpar <- SnowParam(workers = 4, type = "SOCK")
 raw_data <- readMSData(files = files, 
                         pdata = new("NAnnotatedDataFrame", data.frame(sample.names)),
                         mode = "onDisk"
   )

 cwp <- CentWaveParam(peakwidth = peak.width, 
                         ppm = ppm,
                         snthresh = 5,
                         mzdiff = ms.fwhm,
                         fitgauss = TRUE,
                         verboseColumns = TRUE)
  xdata <- findChromPeaks(raw_data, param = cwp, msLevel = 1, BPPARAM = mcpar)
  xdata <- findChromPeaks(xdata, msLevel = 2, add = TRUE)

  pdp <- PeakDensityParam(sampleGroups = sample.groups,
                          minFraction = minfrac, bw = bw.pre)
  xdata <- groupChromPeaks(xdata, param = pdp)

  pgp <- PeakGroupsParam(
    minFraction = 0.5
  )
  xdata <- adjustRtime(xdata, param = pgp) 

  pdp <- PeakDensityParam(sampleGroups = sample.groups,
                          minFraction = minfrac, bw = bw.post)
  xdata <- groupChromPeaks(xdata, param = pdp) 
  
  fpp <- FillChromPeaksParam(expandMz = 0, expandRt = 0, ppm = 0)
  xdata <- fillChromPeaks(xdata, param = fpp, BPPARAM = mcpar)  

cbroeckl avatar May 05 '20 18:05 cbroeckl

Thanks @cbroeckl ! The current approach ran to completion despite this warning at adjustRtime:

#Warning message: #Adjusted retention times had to be re-adjusted for some files to ensure them being in the same order than the raw retention times. A call to 'dropAdjustedRtime' might thus fail to restore retention times of chromatographic peaks to their original values. Eventually consider to increase the value of the 'span' parameter.

and errors/warnings at fillChromPeaks for 4 of the 6 .ms1 files like this:

#Requesting 1412 peaks from 20130906_RempelSowLactation_0060.ms1.mzML ... Error in (function (x) : attempt to apply non-function #got 1210. #Warning messages: 1: In serialize(data, node$con) : 'package:stats' may not be available when loading

For the files containing both ms1 and ms2, it stops with the error at findChromPeaks(...msLevel = 2...):

#Error in (function (classes, fdef, mtable) : #unable to find an inherited method for function 'findChromPeaks' for signature '"XCMSnExp", "missing"'

Provisionally guessing this is because I'm not using the jomaster branch.

Is xdata <- findChromPeaks(xdata, msLevel = 2, add = TRUE) right? I was thinking it should be xdata <- findChromPeaks(raw_data, msLevel = 2, add = TRUE)
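The "missing" in the error signature above suggests that no param object reached the method, since the quoted call does not pass one. The sketch below reproduces that kind of S4 dispatch failure with stand-in classes, assuming findChromPeaks dispatches on both its object and param arguments. DemoExp, DemoParam and demoFindPeaks are hypothetical names and are not part of xcms.

```r
library(methods)

# Hypothetical stand-ins for an experiment class and a parameter class.
setClass("DemoExp", representation(n = "numeric"))
setClass("DemoParam", representation(ppm = "numeric"))

# A generic that dispatches on both `object` and `param`.
setGeneric("demoFindPeaks",
           function(object, param, ...) standardGeneric("demoFindPeaks"))
setMethod("demoFindPeaks", signature("DemoExp", "DemoParam"),
          function(object, param, ...) paste("peaks with ppm", param@ppm))

x <- new("DemoExp", n = 1)

# With a param object, dispatch succeeds.
ok <- demoFindPeaks(x, param = new("DemoParam", ppm = 21))

# Without one, `param` has class "missing" and no method matches.
err <- tryCatch(demoFindPeaks(x), error = function(e) conditionMessage(e))
print(err)
```

If the same holds for the real call, passing param = cwp would be the first thing to try; whether add = TRUE is then accepted still depends on the installed branch.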

Phylloxera avatar May 06 '20 20:05 Phylloxera

I also got

Error in asMethod(object) : Coercing an XCMSnExp with MS level > 1 is not yet supported!

when attempting to convert the xdata from the separate files method to xset for import to RAMClustR with xset <- as(xdata, "xcmsSet")

Phylloxera avatar May 06 '20 20:05 Phylloxera

you should not need to coerce the xcms structure for ramclustR - it should take the new format as input directly

and yes - I think you will need the jomaster branch for the new workflow.

cbroeckl avatar May 06 '20 20:05 cbroeckl

Thanks @cbroeckl , RC <- ramclustR(xcmsObj = xdata, ExpDes=experiment)

Error in if (!is.null(xcmsObj) & mslev == 2 & any(is.null(MStag), is.null(idMSMStag), : missing value where TRUE/FALSE needed In addition: Warning message: In ramclustR(xcmsObj = xdata, ExpDes = experiment) : NAs introduced by coercion

Is this because I filled in something incorrectly at experiment <- defineExperiment(csv = FALSE)? Should I move over to github.com/cbroeckl/RAMClustR for this follow-up?
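One plausible reading of the error and warning pair above, offered as an assumption and not a diagnosis: a value that could not be converted to a number became NA, and the NA then reached an if() condition. The snippet reproduces both messages in base R with a hypothetical input string; it does not use RAMClustR.

```r
# Hypothetical input: a non-numeric entry where an MS level was expected.
mslev <- suppressWarnings(as.numeric("not a number"))  # coercion yields NA

# An NA condition makes if() fail instead of choosing a branch.
msg <- tryCatch(
  if (mslev == 2) "MS2" else "MS1",
  error = function(e) conditionMessage(e)
)
print(msg)
```

Checking the fields entered in defineExperiment for empty or non-numeric values would be a reasonable first step under this assumption.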

Phylloxera avatar May 08 '20 15:05 Phylloxera

Let's move over to the RAMClustR GitHub for this question. Thanks. C

cbroeckl avatar May 08 '20 15:05 cbroeckl