xcms
xcms copied to clipboard
Different amount of peaks with old and new interface?
Hi XCMS team,
I finally got around trying the and improved XCMS3 interface :-) One thing I quickly noticed is that with some data files I get significantly different amounts of peaks. A small example is shown below.
library(xcms)
# remotes::install_github("rickhelmus/patRoonData")
files <- list.files(patRoonData::exampleDataPath(), full.names = TRUE)
xs <- xcmsSet(files, method = "centWave")
MSData <- MSnbase::readMSData(files, msLevel. = 1L, mode = "onDisk")
xdata <- findChromPeaks(MSData, CentWaveParam())
xdata2 <- findChromPeaks(MSData, CentWaveParam(), return.type = "xcmsSet")
The old xcmsSet
function returns less than half of the peaks compared to the new interface (401 vs 911). As far as I could tell peak picking defaults remained the same, perhaps it's related to the different binning functions? In the end I am just curious if this is a bug or feature? :-)
Outputs:
> xs
An "xcmsSet" object with 6 samples
Time range: 14-599.8 seconds (0.2-10 minutes)
Mass range: 84.9595-299.1494 m/z
Peaks: 401 (about 67 per sample)
Peak Groups: 0
Sample classes: extdata
Feature detection:
o Peak picking performed on MS1.
Profile settings: method = bin
step = 0.1
Memory usage: 0.251 MB
> xdata
MSn experiment data ("XCMSnExp")
Object size in memory: 3.88 Mb
- - - Spectra data - - -
MS level(s): 1
Number of spectra: 14167
MSn retention times: 0:0 - 9:60 minutes
- - - Processing information - - -
Data loaded [Fri Aug 16 11:56:53 2019]
Filter: select MS level(s) 1 [Fri Aug 16 11:56:53 2019]
MSnbase version: 2.10.1
- - - Meta data - - -
phenoData
rowNames: solvent-1.mzML solvent-2.mzML ... standard-3.mzML (6 total)
varLabels: sampleNames
varMetadata: labelDescription
Loaded from:
[1] solvent-1.mzML... [6] standard-3.mzML
Use 'fileNames(.)' to see all files.
protocolData: none
featureData
featureNames: F1.S0001 F1.S0002 ... F6.S2363 (14167 total)
fvarLabels: fileIdx spIdx ... spectrum (33 total)
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
- - - xcms preprocessing - - -
Chromatographic peak detection:
method: centWave
911 peaks identified in 6 samples.
On average 152 chromatographic peaks per sample.
> xdata2
An "xcmsSet" object with 6 samples
Time range: 11.1-599.8 seconds (0.2-10 minutes)
Mass range: 91.0541-299.1494 m/z
Peaks: 911 (about 152 per sample)
Peak Groups: 0
Sample classes: solvent-1.mzML, solvent-2.mzML, solvent-3.mzML, standard-1.mzML, standard-2.mzML, standard-3.mzML
Feature detection:
o Peak picking performed on MS1.
o Scan range limited to 1 - 2667
Profile settings: method = bin
step = 0.1
Memory usage: 2.31 MB
I am using R 3.6.1 with fully updated Bioc/R packages.
Cheers, Rick
That is indeed unexpected! I will have a look into it.
Hi, I was able to reproduce the numbers in the example. I didn't find an explanation. Yours, Steffen
Getting closer to the explanation: reading just one file and comparing the content of the data. The xcmsSet
call reads each file with xcmsRaw
and performs peak detection on that.
xr <- xcmsRaw(files[1])
msd1 <- readMSData(files[1], msLevel = 1L, mode = "onDisk")
## Comparing the number of spectra:
length(xr@scantime)
2358
length(rtime(msd1))
2629
Apparently, the xcmsRaw
has fewer scans/spectra than what is available when reading the data with readMSData
. In fact, xcmsRaw
has an option dropEmptyScans = TRUE
which removes spectra without any peaks (because this can not be represented by the xcmsRaw
data structure).
To verify that the difference in spectra is due to that:
length(msd1) - sum(peaksCount(msd1) == 0)
2358
I will next check how this affected the peak detection, but I guess the explanation might be in this difference.
In fact, when the empty spectra are removed from the new object the results of centWave are identical:
tmp <- msd1[peaksCount(msd1) > 0]
res_new <- findChromPeaks(tmp, CentWaveParam())
res_old <- findPeaks.centWave(xr)
all.equal(res_old[, "mz"], unname(chromPeaks(res_new)[, "mz"]))
TRUE
all.equal(res_old[, "into"], unname(chromPeaks(res_new)[, "into"]))
TRUE