xcms icon indicating copy to clipboard operation
xcms copied to clipboard

[WIP] Add new Ion-mobility peak picking algorithm

Open RogerGinBer opened this issue 1 year ago • 9 comments

This PR is an initiative to add support for ion-mobility (IM) mass spectrometry data processing to the xcms spectra branch. This is still a work in progress and any contribution or suggestion is more than welcome

Up until now, the following elements have been implemented:

Add a new function for ion-mobility peak picking, compatible with MsExperiment's Spectra only (TimsTofBackend), with a corresponding CentWaveParamIM class:

  • Create do_findChromPeaks_IM_centWave, the API for the peakpicking itself

  • Create a IMParamclass that inherits from Param and that will serve as superclass for all ion-mobility peak picking Params.

  • Create a CentWaveParamIM that inherits from both CentWaveParam and IMParam

  • Add a function dispatch point in .mse_find_chrom_peaks_sample

  • Create documentation for both API and param class

There are still many more things to work on:

  • Unit tests,
  • Retention-time peak alignment for IM peaks
  • IM peak grouping into features
  • (...)

RogerGinBer avatar Dec 02 '22 17:12 RogerGinBer

Hi Johannes, thanks for the code review! About the algorithms:

  • Regarding peak-picking: as I commented above, this sequential 1D + 1D peak-picking (ie. first in the RT domain, and then, peak-wise, in the IM domain) is not the only possible strategy. One could also want to do a direct 2D peak-picking (for instance, here's a python package that does it https://deimos.readthedocs.io/en/latest/user_guide/peak_detection.html) without collapsing the IM data at all. Since both seem to be valid, we should try to be flexible
  • Regarding other processing steps (alignment, grouping, etc.): we still have to reach that stage, but I'm sure we'll need to make some adjustments to current functions to keep track of the IM dimension data and avoid mixing up the peaks

For now: I'll make the amendments we discussed above, profile the algorithm performance, and create some unit tests for all new functions

RogerGinBer avatar Dec 07 '22 22:12 RogerGinBer

Perfect! Thanks @RogerGinBer for the updates! And yes, we need to keep track of the IM dimension data for the identified peaks - that information could either go directly (as a new column) into the chromPeaks matrix or alternatively to the chromPeakData data.frame.

jorainer avatar Dec 13 '22 14:12 jorainer

Good! I think I'll keep the IM ranges in the chromPeaks matrix for now, since its something "essential" that defines the peaks I'll keep you updated by posting new commits and ping you when I need some feedback :+1:

RogerGinBer avatar Dec 13 '22 18:12 RogerGinBer

Something else: could you please already start to add unit tests? it's usually better to implement them right away for each function you write.

jorainer avatar Dec 21 '22 09:12 jorainer

Yep, you're right, I completely forgot about that! :pray: Until now I've been running some tests locally, since I didn't know that you could change the backend from the read-only MsBackendTimsTof to something more portable like MsBackendMemory. My question is: how do I add the IM data for the tests to the package? Do I generate a very limited Spectra subset from the file I'm using? How small of an object (kB?) are we talking about?

Also, I found out that the raw IM data I was reading is in profile mode but does not have any flanking zeroes, so the localMaxima approach of Spectra::pickPeaks sadly does not work, but I have an idea for a workaround. I'll open an issue over at Spectra to describe the problem and ping here

RogerGinBer avatar Dec 21 '22 21:12 RogerGinBer

Thanks for the update - regarding the test/toy data set: I think a Spectra with MsBackendMemory would be ideal. As you mentioned, size should be small, maybe limit to a small m/z range and retention time window for a region where you expect to see some peaks/signals?

jorainer avatar Dec 23 '22 10:12 jorainer

Hi there!

I've added a very compact IM dataset that contains a mz-rt region where an intense peak and its M1 isotope can be found. I've added unit tests for do_findChromPeaks_IM_centWave and its internals for calculating and splitting mobilograms, as well as for the IMCentWaveParam class.

I partly solved the problem I had in https://github.com/rformassspectrometry/Spectra/issues/257 by extending the m/z range of the detected peaks to compensate for the fact that a) individual datapoints' m/z fluctuate, and b) combineSpectra groups similar m/z values into a single one. Without this correction, I had a problem where the reported mz range for a peak didn't match with any of the original m/z values in the non-combined Spectra.

I will now be experimenting with the do_findChromPeaks_IM_centWave function to make sure it's working properly, but at least now we have some proper unit tests set up :+1:

RogerGinBer avatar Jan 03 '23 22:01 RogerGinBer

Hi, xcms now has included the Spectra changes, and this PR should now go towards xcms devel branch. Norman and I actually just checked, and there would be no merge conflicts. I don't know if one can change the target branch for a PR easily, otherwise feel free to close and re-open agains devel. Yours, Steffen

sneumann avatar Jul 10 '23 15:07 sneumann

Thanks for the heads-up @sneumann! I've updated the branch to devel (https://github.blog/2016-08-15-change-the-base-branch-of-a-pull-request/)

RogerGinBer avatar Jul 11 '23 08:07 RogerGinBer