Spectra
Spectra copied to clipboard
pickPeaks fails to centroid if no flanking zeroes are present
Hi there!
While working with raw LC-IM-MS data (related to https://github.com/sneumann/xcms/pull/647), I've noticed that pickPeaks
does not correctly centroid the data if no flanking zero values are present. Concretely, it can:
- Fail to keep valid data points
- Alter very signifcantly the mz values via
refineCentroids
(uses mz points that are far away)
Here's a reproducible example, using directly the internal .peaks_pick
. The input data is a real, rather noisy, TOF scan where, despite being in profile (I've double-checked it), it's very sparse (has no zero values):
pdata <- matrix(
c(57.07044,89.07047,171.15151,186.95607,188.18357,190.06249,191.10797,196.08091,
201.11109,205.08090,210.13168,210.97000,215.13854,218.10776,
2996,2170,269,1607,21,273,288,314,203,197,3102,236,1779,47),
ncol = 2
)
pdata #Optimal result, applying the function shouldn't change anything
Spectra:::.peaks_pick(pdata, spectrumMsLevel = 1) #Only keeps 3 mz peaks
Spectra:::.peaks_pick(pdata, spectrumMsLevel = 1, halfWindowSize = 1L) #Lowering hws helps, but still not good
Spectra:::.peaks_pick(pdata, spectrumMsLevel = 1, halfWindowSize = 1L, k = 2L) #Using the neighbors for refining messes up the mzs
To work around this problem, I'd like to contribute with this solution (still a sketch, probably should refactor to remove dependency on R.utils
), which basically appends an arbitrary amount of zero values in the peak matrix when a mass "gap" larger than a tolerance is found (usually not larger than 0.01Da for HRMS instruments). Also, we keep the mz order of the data by imputing mz values in between (see last examples).
This function would be called first and then the regular peakPicks
would go on as usual:
.add_profile_zeros <- function (x, tol = 0.2, n = 5) {
if (!nrow(x)) return()
pos <- which(diff(x[,1]) > tol) + 1
if (!length(pos)) return(x)
mzs <- R.utils::insert(x[,1], ats = pos, values = lapply(x[pos - 1,1], function(x) rep(x + tol/2, each = n)))
ints <- R.utils::insert(x[,2], ats = pos, values = list(rep(0, times = n)))
matrix(c(mzs, ints), ncol = 2)
}
pickPeaksSparse <- function (object, ...)
{
.local <- function (object, massGap = 0.2, nZeros = 5L, ...)
{
if (!is.numeric(massGap) || length(massGap) != 1L || massGap <= 0L)
stop("Argument 'massGap' has to be an numeric of length 1 ",
"and > 0.")
if (!is.integer(nZeros) || length(nZeros) != 1L || nZeros <= 0L)
stop("Argument 'nZeros' has to be an integer of length 1 that is > 0.")
object <- addProcessing(object, .add_profile_zeros, tol = massGap,
n = nZeros, ...)
object@processing <- Spectra:::.logging(object@processing, "Filled mz gaps larger than ",
massGap, " with ", nZeros, " zeros.")
object <- pickPeaks(object, ...)
}
.local(object, ...)
}
And now it would work:
.add_profile_zeros(pdata)
Spectra:::.peaks_pick(.add_profile_zeros(pdata), spectrumMsLevel = 1, halfWindowSize = 1L, k = 2L)
What do you think? Would this internal .add_profile_zeros
belong here or in MsCoreUtils
?
I'll be happy to contribute, :+1: Roger