xcms icon indicating copy to clipboard operation
xcms copied to clipboard

Missing peak after grouping

Open plyush1993 opened this issue 10 months ago • 19 comments

Hi, I observe a huge difference between output of peak grouping / peak filling and featureValues, however I set minFraction to 0.1 and minSample = 1, and each sample represents a unique group in my case (sampleGroups vector is also a vector of unique numeric values). So I manually found one interesting peak in one particular sample in the peak grouping object but it didn't occur in the feature table after the featureValues function.

In addition, I noticed that this peak is recognized at ms2 level, however I can plot peak with both xcms tools and vendor software.

Regards, Ivan

plyush1993 avatar Jan 25 '25 23:01 plyush1993

Hi, thanks for reaching out. Could you be a bit more specific ? Possibly giving a reproducible example, and/or more description. "huge difference" in what ? Number of features ? Or the intensities of peaks detected by, e.g. centWave, and those imputed by using fillPeaks() ? Can you show the interesting peak mentioned ? Yours, Steffen

sneumann avatar Jan 27 '25 09:01 sneumann

Dear @sneumann thank you for your reply!

I attached mzXML files (derived from TIMS-TOF Pro2 operated in AutoMSMS mode (DDA) and converted in MSConvert for MS levels 1-2): reproducible example.zip The mentioned peak has mz 259.28 and rt 295-300. So I can find it in pk_gr (after grouping) and plot by xcms::chromatogram and observe in vendor software, however, it isn't present in the final peak table. Below is a code to reproduce the error:

library(xcms)
library(parallel)
library(doParallel)
library(BiocParallel)

setwd("C:/.../reproducible.example") # main folder
wd_1 <- c("C:/.../reproducible example/") # folder with mzXML files 
files_all <- list.files(wd_1, recursive = T, full.names = T, pattern = ".mzXML") 

pd <- data.frame(sample_name = sub(basename(files_all), pattern = ".mzML",
                                   replacement = "", fixed = TRUE), stringsAsFactors = FALSE) # download filenames

raw_data <- MSnbase::readMSData(files = files_all, pdata = new("NAnnotatedDataFrame", pd), mode = "onDisk") # or use pd only as: pdata = new("NAnnotatedDataFrame", pd) or pdata = new("NAnnotatedDataFrame", n_gr_t)

cores = detectCores()-1
register(bpstart(SnowParam(cores)))
BiocParallel::register(BiocParallel::SerialParam())

cwp <- xcms::CentWaveParam(ppm = 10, 
                           peakwidth = c(5, 50), 
                           snthresh = 10, 
                           prefilter = c(3, 300),
                           noise = 1000) 

feat_det <- xcms::findChromPeaks(raw_data, param = cwp)

data_merge <- refineChromPeaks(feat_det, MergeNeighboringPeaksParam(minProp = 0.75, expandRt = 5, ppm = 5)) 

BiocParallel::register(BiocParallel::SerialParam())
app <- xcms::ObiwarpParam(
  center = 2)

ret_cor <- xcms::adjustRtime(data_merge, param = app)

pgp <- xcms::PeakDensityParam(sampleGroups = c(1:length(files_all)), 
                              bw = 1, 
                              minFraction =  0, 
                              minSamples = 0
) 

pk_gr <- xcms::groupChromPeaks(ret_cor, param = pgp)
ft_tbl <- featureValues(pk_gr, value = "into")
ft_inf <- featureDefinitions(pk_gr)
det_mz = data.frame(ft_inf@listData[["mzmed"]])

plotChromPeaks(pk_gr, ylim = c(459.2700, 459.2900), xlim = c(200, 400), file = 2)
mzr_1 <- 459.28 + c(-0.005, 0.005)
chr_1 <- xcms::chromatogram(pk_gr, mz = mzr_1)
plot(chr_1)

plyush1993 avatar Jan 27 '25 10:01 plyush1993

What version of R and xcms are you using? I tried to run your example code above with the current stable version of xcms (4.4.0) but can not find the peak you mention. The plot(chr_1) generates the plot below, which shows signal in MS1, but no identified chromatographic peak (which would be shaded with a grey background). In fact, it looks like you have just 3 data points - and doing peak detection with so few data points is difficult...

Image

maybe worth mentioning that there have been fixes to the chromatogram() function to ensure that signal is only extracted for the MS level of interest. So, the default is chromatogram(msLevel = 1), while it would also be possible to call chromatogram(msLevel = 2) to extract the MS 2 data.

jorainer avatar Jan 27 '25 13:01 jorainer

Thank you @jorainer ! I forgot to mention that I usually use 3.17.3, but I also tried on 4.4.0. Anyway, the problem seems to be associated with the number of data points. But in general, if the peak is in the feature_grouping object, what is the reason why it is not included in the final peak table (output from featureValues function)? Is it possible to control it somehow?

plyush1993 avatar Jan 27 '25 16:01 plyush1993

With feature_grouping object - do you mean the xcms result object after correspondence? any LC-MS feature you have in the featureDefinitions() of that object will be (and has to be!) in the matrix returned by the featureValues() call. it can well be that some chromatographic peaks from the chromPeaks() matrix are not included/considered (i.e. if they are not part of one feature).

jorainer avatar Jan 29 '25 11:01 jorainer

I meant if feature is in the groupChromPeaks object, why it is not translated into peak table by featureDefinitions or featureValues? Let's say I observed a feature in groupChromPeaks object but in final peak there is no peak with similar m/z and or rt.

plyush1993 avatar Jan 29 '25 20:01 plyush1993

Now, as an example (maybe it's clearer that way): if you run e.g.

xcms_res <- groupChromPeaks(xcms_res, param = PeakDensityParam(sampleGroups = sampleData(xcms_res)$sample_group))

this will group the chromatographic peaks in chromPeaks(xcms_res) to LC-MS features depending on the settings defined with PeakDensityParam. After this correspondence analysis, the definition of the features (i.e. the information which chrom peaks are grouped together) is added to the result object xcms_res and can be extracted with featureDefinitions(xcms_res). Now, a call to featureValues(xcms_res) will extract numerical matrix with the values (abundances) for the features, each row one feature, each column one sample/data file. The matrix of featureValues(xcms_res) is generated on-the-fly based on the information in featureDefinitions(xcms_res). And all features defined in featureDefinitions(xcms_res) will be in featuresValues(xcms_res). So nrow(featureValues(xcms_res)) will always be the same as nrow(featureDefinitions(xcms_res)). Do you see any differences (i.e. different row numbers) between these two tables?

jorainer avatar Feb 11 '25 07:02 jorainer

Thank you @jorainer ! There is no difference between nrow(featureValues(xcms_res)) and nrow(featureDefinitions(xcms_res)).

plyush1993 avatar Feb 11 '25 08:02 plyush1993

Dear @sneumann @jorainer Sorry for bothering you again

I am working with TIMS-TOF Bruker instrument without activating Ion Mobility and in AutoMSMS DDA acquisition mode. I faced with the following issue: I am searching for the untargeted peak integration and already know that one of the interesting peak has an m/z value 459.28 and rt 300 s. I found it in the object after running findChromPeaks in both level MS1 (base peak mz) and 2 (precursor mz).

Image

I can also plot this peak by xcms::chromatogram Image

The issue is that I don't get it in the final peak table (after running featureValues) R version 4.4.2; xcms version 4.4.0 I already tried several ways for converting: after ProteoWizard MSConvert (both mzML and mzXML) I get only MS2 level peaks in the object after running findChromPeaks, so I decided to use dedicated software TIMSCONVERT

Here is the files for testing (obtained by TIMSCONVERT): TIMSCONVERT.zip

Below is the code for reproducing:

library(xcms)
library(data.table)
library(dplyr)
library(stringr)
library(BiocParallel)
library(doParallel)
library(MsExperiment)

wd_1 <- c("C:/...TIMSCONVERT/") 
files <- list.files(wd_1, recursive = F, full.names = T, pattern = ".mzML") 

pd <- data.frame(sample_name = sub(basename(files), pattern = ".mzML",
                                   replacement = "", fixed = TRUE), stringsAsFactors = FALSE) 

rname <- pd
all_id <- as.data.frame(sapply(1:nrow(rname), function(y) unlist(str_split(rname[y,], " "))))
all_id <- as.data.frame(t(all_id))
all_id1 <- as.character(all_id)
n_gr_t <- data.frame(all_id1)

vec_gr <- as.numeric(as.factor(n_gr_t$all_id1))
sample_gr <- unique(as.numeric(as.factor(n_gr_t$all_id1)))
n_gr <- sapply(1:length(sample_gr), function(y) length(vec_gr[vec_gr == y]))
min_frac_man <- 0.01 

raw_data <- MSnbase::readMSData(files = files, pdata = new("NAnnotatedDataFrame", n_gr_t), mode = "onDisk") 

cores = detectCores()-1
register(bpstart(SnowParam(cores)))
BiocParallel::register(BiocParallel::SerialParam())


cwp <- xcms::CentWaveParam(ppm = 10, 
                           peakwidth = c(5, 50), 
                           snthresh = 10, 
                           prefilter = c(3, 300), 
                           noise = 1000) 

feat_det <- xcms::findChromPeaks(raw_data, param = cwp)

BiocParallel::register(BiocParallel::SerialParam())
app <- xcms::ObiwarpParam(center = 2)

ret_cor <- xcms::adjustRtime(feat_det, param = app)

pgp <- xcms::PeakDensityParam(sampleGroups = as.numeric(as.factor(n_gr_t$all_id1)), 
                              bw = 1, 
                              minFraction =  0.01,
                              minSamples = 1) 

pk_gr <- xcms::groupChromPeaks(ret_cor, param = pgp)

ft_tbl <- featureValues(pk_gr, value = "into")
ft_inf <- featureDefinitions(pk_gr)
plotChromPeaks(pk_gr, ylim = c(459.2700, 459.2900), xlim = c(200, 400), file = 1)
mzr_1 <- 459.28 + c(-0.01, 0.01)
chr_1 <- xcms::chromatogram(pk_gr, mz = mzr_1)
plot(chr_1)

features_mz =data.frame(ft_inf@listData[["mzmed"]])

plyush1993 avatar May 20 '25 15:05 plyush1993

Hm, that is unexpected. I would suggest that you check the m/z range of interest after each processing step, i.e.

## The m/z range might need to be adapted... depending on what you really expect...
mzr <- c(459.2728, 459.2852)

eic_0 <- chromatogram(feat_det, mz = mzr)
plot(eic_0)
## Eventually, you can cross-check the chrom peaks: `chromPeaks(eic_0)`

eic_1 <- chromatogram(ret_cor, mz = mzr)
plot(eic_0)

## and finally also after correspondence
eic_2 <- chromatogram(pk_gr, mz = mzr)

## That plot should actually show you if and how a feature was defined for your peaks of interest
plotChromPeakDensity(eic_2, simulate = FALSE)

jorainer avatar May 22 '25 05:05 jorainer

Thanks for the suggestion! After running, after each plot it produces a plot with a target peak. After the last line plotChromPeakDensity(eic_2, simulate = FALSE) it produces an error: Error: No chromatographic peaks present. Please run 'findChromPeaks' first.

plyush1993 avatar May 22 '25 10:05 plyush1993

Oh, this is unexpected - does pk_gr contain chromatographic peaks (i.e. is hasChromPeaks(pk_gr) TRUE)? it would be quite strange, because that means you lost your peaks somewhere along the way...

jorainer avatar May 30 '25 12:05 jorainer

Perhaps it is because the data was acquired in DDA mode that this peak is displayed in findChromPeaks as a peak with MS level = 2. Is there any way to coerce it to MS1 level?

plyush1993 avatar Jun 01 '25 11:06 plyush1993

Wait - now I get confused. xcms runs peak detection by default (only) on MS1 data. The MS2 scans are not considered in that step. So, the chromPeaks() (actually the chromPeakData()) should always list only MS 1 as the MS level of your chrom peaks. What could be is that there are no chromatographic peaks identified in the m/z range you selected? Can you check with chromPeaks(eic_1) or chromPeaks(eic_2) if you get chrom peaks or just an empty matrix?

jorainer avatar Jun 04 '25 09:06 jorainer

Thank you! So here is the output from eic_1:

eic_1 XChromatograms with 1 row and 3 columns 1 2 3 <XChromatogram> <XChromatogram> <XChromatogram> [1,] peaks: 0 peaks: 0 peaks: 0 phenoData with 1 variables featureData with 3 variables

Let me share also a peak table (feat_det@featureData@data) with 459.28 mass below: Image So some peaks is from MS1 (base peak mz), and some from MS2 level (precursor mz).

The point is peak is present in vendoe software, Maven, and MZmine. So I tend to think it is related to data converting and/or DDA mode.

plyush1993 avatar Jun 04 '25 14:06 plyush1993

your eic_1 does not contain any identified chromatographic peaks (that explains the error). the screenshot/table that you show are actually the metadata from the spectra. Can you plot(eic_1) - you could then also use this eic_1 to fine tune the peak detection. my guess is that peak detection fails because you have very few actual data points there (looks like 3-4) to properly detect the chromatographic signal.

jorainer avatar Jun 05 '25 09:06 jorainer

Maybe I didn't understand correctly how it works, but the same part of the table is also in the object of groupChromPeaks. The full adress to the table feat_det@featureData@data or pk_gr@featureData@data. I tried to play with the noise level and bw with no positive result. When I opened this file in ElMaven (see screenshot) and typed the same mass as in the table with the same 10 ppm tolerance, it recognised 7 points per peak. Might it be that ElMaven / mzMine are able to operate with more scans in the MS1 level than xcms? In this case it would explain such a bug... Image

plyush1993 avatar Jun 05 '25 15:06 plyush1993

would it be possible for you to share one of your data files privately with me? I think it would be easier if I could dig into the MS data to see what the problem could be.

jorainer avatar Jun 18 '25 11:06 jorainer

Sure, thank you! Where would it be preferable to send?

plyush1993 avatar Jun 22 '25 12:06 plyush1993

Can I attach it here?

plyush1993 avatar Jul 16 '25 13:07 plyush1993

Chances are good it is too big :-( Limit seems to be 25MB. If you can upload anywhere and share the link, that would work. Yours, Steffen

sneumann avatar Jul 16 '25 19:07 sneumann

Thank you for your help and patience! I uploaded a zip archive with files for the minimal reproducible example in figshare. Short description about files:

plyush1993 avatar Jul 17 '25 11:07 plyush1993

Thanks for sharing the files. I'm starting from scratch again - just to understand where and what the problem might be. So, if I understand you correctly, you want to find a chromatographic peak with an m/z of 458.28 and a retention time of 5 minutes.

I did now load one of the files you provided and simply ran peak detection with your settings on that.

library(MsExperiment)
library(xcms)
a <- readMsExperiment(c("Pd_001srf_2_Y-D6_1_3859.mzML"))
cwp <- CentWaveParam(ppm = 10,
                     peakwidth = c(5, 50), 
                     snthresh = 10, 
                     prefilter = c(3, 300), 
                     noise = 1000) 
b <- findChromPeaks(a, param = cwp)

Now, extracting an ion chromatogram for the region of interest:

mzr <- c(459.2771, 459.2863)
rtr_min <- c(4.13, 6.12)
rtr <- rtr_min * 60
eic_b <- chromatogram(b, mz = mzr, rt = rtr)
plot(eic_b)

I get the same plot as you above but no chromatographic peak was detected (that would be highlighted/shaded in grey color):

Image

Also chromPeaks() on that EIC returns an empty matrix

> chromPeaks(eic_b)
     mz mzmin mzmax rt rtmin rtmax into intb maxo sn sample row column

To me it seems that centWave simply fails to identify your peak of interest. I tried changing the settings for CentWaveParam() and run findChromPeaks() on the eic_b, but without success. I'm afraid there are to few peak data points for centWave. It tries to model some sort of gaussian shaped signal, but in your case there is a single very high value and some lower intensity ones around that.

jorainer avatar Aug 19 '25 08:08 jorainer

I believe Johannes is correct. You can try to use matchedFilter instead. That is often very good at integrating sub-optimal data (at the cost of picking more noise).

stanstrup avatar Aug 19 '25 08:08 stanstrup

Thank you all @jorainer, @stanstrup, @sneumann for your time and help!

plyush1993 avatar Aug 19 '25 14:08 plyush1993