xcms
xcms copied to clipboard
[WIP] Add new Ion-mobility peak picking algorithm
This PR is an initiative to add support for ion-mobility (IM) mass spectrometry data processing to the xcms
spectra
branch.
This is still a work in progress and any contribution or suggestion is more than welcome
Up until now, the following elements have been implemented:
Add a new function for ion-mobility peak picking, compatible with MsExperiment
's Spectra
only (TimsTofBackend
), with a corresponding CentWaveParamIM class:
-
Create
do_findChromPeaks_IM_centWave
, the API for the peakpicking itself -
Create a
IMParam
class that inherits fromParam
and that will serve as superclass for all ion-mobility peak picking Params. -
Create a
CentWaveParamIM
that inherits from bothCentWaveParam
andIMParam
-
Add a function dispatch point in .mse_find_chrom_peaks_sample
-
Create documentation for both API and param class
There are still many more things to work on:
- Unit tests,
- Retention-time peak alignment for IM peaks
- IM peak grouping into features
- (...)
Hi Johannes, thanks for the code review! About the algorithms:
- Regarding peak-picking: as I commented above, this sequential 1D + 1D peak-picking (ie. first in the RT domain, and then, peak-wise, in the IM domain) is not the only possible strategy. One could also want to do a direct 2D peak-picking (for instance, here's a python package that does it https://deimos.readthedocs.io/en/latest/user_guide/peak_detection.html) without collapsing the IM data at all. Since both seem to be valid, we should try to be flexible
- Regarding other processing steps (alignment, grouping, etc.): we still have to reach that stage, but I'm sure we'll need to make some adjustments to current functions to keep track of the IM dimension data and avoid mixing up the peaks
For now: I'll make the amendments we discussed above, profile the algorithm performance, and create some unit tests for all new functions
Perfect! Thanks @RogerGinBer for the updates! And yes, we need to keep track of the IM dimension data for the identified peaks - that information could either go directly (as a new column) into the chromPeaks
matrix
or alternatively to the chromPeakData
data.frame
.
Good! I think I'll keep the IM ranges in the chromPeaks
matrix for now, since its something "essential" that defines the peaks
I'll keep you updated by posting new commits and ping you when I need some feedback :+1:
Something else: could you please already start to add unit tests? it's usually better to implement them right away for each function you write.
Yep, you're right, I completely forgot about that! :pray:
Until now I've been running some tests locally, since I didn't know that you could change the backend from the read-only MsBackendTimsTof
to something more portable like MsBackendMemory
. My question is: how do I add the IM data for the tests to the package? Do I generate a very limited Spectra
subset from the file I'm using? How small of an object (kB?) are we talking about?
Also, I found out that the raw IM data I was reading is in profile mode but does not have any flanking zeroes, so the localMaxima
approach of Spectra::pickPeaks
sadly does not work, but I have an idea for a workaround. I'll open an issue over at Spectra
to describe the problem and ping here
Thanks for the update - regarding the test/toy data set: I think a Spectra
with MsBackendMemory
would be ideal. As you mentioned, size should be small, maybe limit to a small m/z range and retention time window for a region where you expect to see some peaks/signals?
Hi there!
I've added a very compact IM dataset that contains a mz-rt region where an intense peak and its M1 isotope can be found. I've added unit tests for do_findChromPeaks_IM_centWave
and its internals for calculating and splitting mobilograms, as well as for the IMCentWaveParam
class.
I partly solved the problem I had in https://github.com/rformassspectrometry/Spectra/issues/257 by extending the m/z range of the detected peaks to compensate for the fact that a) individual datapoints' m/z fluctuate, and b) combineSpectra
groups similar m/z values into a single one. Without this correction, I had a problem where the reported mz range for a peak didn't match with any of the original m/z values in the non-combined Spectra.
I will now be experimenting with the do_findChromPeaks_IM_centWave
function to make sure it's working properly, but at least now we have some proper unit tests set up :+1:
Hi, xcms now has included the Spectra changes, and this PR should now go towards xcms devel branch. Norman and I actually just checked, and there would be no merge conflicts. I don't know if one can change the target branch for a PR easily, otherwise feel free to close and re-open agains devel. Yours, Steffen
Thanks for the heads-up @sneumann! I've updated the branch to devel (https://github.blog/2016-08-15-change-the-base-branch-of-a-pull-request/)