Data independent acquisition/SWATH support

Open jorainer opened this issue 5 years ago • 23 comments

Enable analysis of data independent acquisition (including SWATH) data.

  • [x] peak detection in pockets/isolation windows.
  • [x] data structures for identified chromatographic peaks (issue #346).
  • [x] functionality to build MS2 spectra.
  • [x] vignette describing the new functionality.

Concepts are based on https://github.com/michaelwitting/metabolomics2018 from @michaelwitting.

jorainer avatar May 07 '19 07:05 jorainer

Data structures for identified chrom peaks:

  • It is possible to add arbitrary annotations to each chromatographic peak using the chromPeakData DataFrame. Default columns in that DataFrame are ms_level and is_filled. To enable SWATH support there will also be a column isolationWindow that identifies the isolation window in which the peak was detected.

jorainer avatar May 07 '19 07:05 jorainer

Peak detection within isolation windows is possible with the findChromPeaksIsolationWindow function. The isolation window (i.e. the definition of which spectra belong to which isolation window) can be specified with the isolationWindow parameter.
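
As a rough illustration of what "which spectra belong to which isolation window" can mean, the sketch below groups spectra by their isolation window target m/z using base R only. The header table `hdr` and its values are invented for this example, and grouping by target m/z is an assumption about one reasonable definition, not a statement of what xcms does internally.

```r
## Toy spectrum header: two cycles of one MS1 scan followed by three
## MS2 scans, each MS2 scan acquired in a different isolation window.
## All values are invented for illustration.
hdr <- data.frame(
    msLevel = c(1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L),
    isolationWindowTargetMz = c(NA, 150, 250, 350, NA, 150, 250, 350))

## One factor level per isolation window; MS1 scans get NA.
iw <- factor(hdr$isolationWindowTargetMz)

## Indices of the spectra belonging to each isolation window.
## split() drops the NA (MS1) entries.
spectra_by_window <- split(seq_len(nrow(hdr)), iw)
spectra_by_window
```

A factor like `iw` is the kind of object that could be passed as the isolationWindow argument, so that peak detection runs separately on the spectra of each window.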

jorainer avatar May 07 '19 07:05 jorainer

For each MS1 peak we then have to

  • identify fragment candidates (MS2 peaks) within the same rt window.
  • extract chromatograms for all of them
  • align the chromatograms
  • correlate the chromatograms
  • reconstruct the MS2 spectrum from the MS2 peaks with correlation > x

@michaelwitting, is that correct?
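
The correlation and reconstruction steps listed above could look roughly like the following base R sketch. It assumes the chromatograms are already extracted and aligned to a common retention time grid; the simulated traces, the m/z values and the threshold `min_cor` (the "x" above) are all made up for illustration and are not taken from any xcms implementation.

```r
## Simulated intensity traces on a common rt grid (seconds).
rt <- seq(410, 430, by = 1)
gauss <- function(x, mu, sigma, height)
    height * exp(-(x - mu)^2 / (2 * sigma^2))

## Chromatogram of the MS1 peak.
ms1_chrom <- gauss(rt, 420, 3, 1e5)

## Chromatograms of three MS2 peak candidates from the same pocket.
ms2_chroms <- list(
    gauss(rt, 420, 3, 4e4),    # co-eluting fragment
    gauss(rt, 420.5, 3, 1e4),  # co-eluting, apex slightly shifted
    gauss(rt, 427, 3, 2e4))    # elutes later, likely another compound
ms2_mz <- c(153.02, 121.03, 97.03)

## Pearson correlation of each MS2 trace with the MS1 trace.
cors <- vapply(ms2_chroms, function(z) cor(ms1_chrom, z), numeric(1))

## Keep only fragments whose peak shape correlates with the MS1 peak
## and report their maximum intensity as the spectrum intensity.
min_cor <- 0.9  # hypothetical threshold
keep <- cors > min_cor
ms2_spectrum <- data.frame(
    mz = ms2_mz[keep],
    intensity = vapply(ms2_chroms[keep], max, numeric(1)),
    cor = cors[keep])
ms2_spectrum
```

With these toy traces the two co-eluting fragments are retained and the later-eluting one is dropped. On real data the traces would first need to be aligned, because MS1 and MS2 scans are recorded at different time points.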

jorainer avatar May 07 '19 07:05 jorainer

Parts of that were done in a prototype using CAMERA, i.e. groupCorr() in an MS2 pocket gives a spectrum, and then we "only" need to find from which MS1 precursor it might originate. My prototype did not correlate the MS1 and MS2 chromatograms. Would it be interesting to calculate and attach a "TIC" chromatogram for all MS2 peaks in a collected MS2 spectrum, since it will be smoother than the individual ones? Yours, Steffen

sneumann avatar May 07 '19 08:05 sneumann

@jorainer, yes, correct so far. I started from the MS1 peak, checked in which pocket it might fall and got all the MS2 peaks that were within a certain RT range around the MS1 peak, e.g. +/- 0.1 minutes around the RT of the MS1 peak. I have some prototype code here for the alignment and correlation. I will finish it and push it this evening.
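
The candidate selection described here can be sketched in base R as follows. The peak tables, the window boundaries and the helper `fragment_candidates` are hypothetical and only mimic the kind of information available after peak detection; the tolerance of 6 seconds corresponds to the +/- 0.1 minutes mentioned above, assuming retention times are stored in seconds.

```r
## Toy data; all values are invented for illustration.
ms1_peak <- data.frame(mz = 301.14, rt = 420)
ms2_peaks <- data.frame(
    mz = c(153.02, 121.03, 283.13, 97.03),
    rt = c(419.2, 421.5, 433.0, 420.4),
    isolationWindow = c("w3", "w3", "w3", "w5"))

## Hypothetical isolation window ("pocket") definitions.
windows <- data.frame(
    id = c("w3", "w5"),
    lower = c(290, 340),
    upper = c(315, 365))

## Hypothetical helper: return the MS2 peaks detected in the pocket that
## contains the MS1 m/z and whose rt is within rt_tol seconds of the
## MS1 peak's rt.
fragment_candidates <- function(ms1, ms2, windows, rt_tol = 6) {
    pocket <- windows$id[ms1$mz >= windows$lower & ms1$mz <= windows$upper]
    keep <- ms2$isolationWindow %in% pocket &
        abs(ms2$rt - ms1$rt) <= rt_tol
    ms2[keep, , drop = FALSE]
}

cand <- fragment_candidates(ms1_peak, ms2_peaks, windows)
cand
```

Here the third MS2 peak is excluded because it elutes too late and the fourth because it was detected in a different pocket.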

michaelwitting avatar May 07 '19 08:05 michaelwitting

Side note: there is public SWATH data as mzML in https://www.ebi.ac.uk/metabolights/MTBLS297. I could create a package mtbls297, similar to mtbls2, which could be used in a new vignette. The vignette could live in mtbls297, saving us the hassle of having another few dozen raw data files in Suggests for xcms. Yours, Steffen

sneumann avatar May 07 '19 11:05 sneumann

Sounds like a good idea. Since you know the people behind this dataset quite well, there should also be no political problems ;-)

michaelwitting avatar May 07 '19 11:05 michaelwitting

Yes @sneumann! That would be awesome! So far I am using @michaelwitting's toy data set and I was trying to talk him into adding it to the msdata package.

jorainer avatar May 07 '19 11:05 jorainer

Actually, it might still be helpful to add one SWATH mzML file to msdata to have something for the unit tests...

jorainer avatar May 07 '19 12:05 jorainer

No problem. Just take my toy data set. We can have it for the next Bioconductor release.

michaelwitting avatar May 07 '19 12:05 michaelwitting

Get files from mtbls297 package:

library(Risa)
library(xcms)

ISAmtbls297 <- readISAtab(find.package("mtbls297"))
assay <- ISAmtbls297@assay.tabs[[1]]
msfiles <- paste(find.package("mtbls297"), "mzML",
                 assay@assay.file$"Derived Spectral Data File",
                 sep="/")

Works for the above AB Sciex data and Bruker mid-band CID so far. MS1 peak picking:

cwp <- CentWaveParam(ppm = 25, peakwidth = c(10, 20), snthresh = 10,
  prefilter = c(3, 100), mzCenterFun = "wMean", integrate = 1L,
  mzdiff = -0.001, fitgauss = FALSE, noise = 0, verboseColumns = FALSE,
  roiList = list(), firstBaselineCheck = TRUE, roiScales = numeric())

raw_data <- readMSData(msfiles, mode = "onDisk")

## Perform the peak detection using the settings defined above.
mtbls297 <- findChromPeaks(raw_data, param = cwp, BPPARAM = MulticoreParam())

Now get the SWATH data:

x2 <- findChromPeaksIsolationWindow(mtbls297, 
                                    param = cwp, 
                                    BPPARAM = MulticoreParam())
cpd <- chromPeakData(x2)

Although there is no data yet:

> head(cpd)
DataFrame with 6 rows and 6 columns
        ms_level is_filled isolationWindow isolationWindowTargetMZ
       <integer> <logical>        <factor>               <numeric>
CP0001         1     FALSE              NA                      NA
CP0002         1     FALSE              NA                      NA
CP0003         1     FALSE              NA                      NA
CP0004         1     FALSE              NA                      NA
CP0005         1     FALSE              NA                      NA
CP0006         1     FALSE              NA                      NA

sneumann avatar May 07 '19 19:05 sneumann

@sneumann, can you share the mtbls297 package somehow?

jorainer avatar May 08 '19 16:05 jorainer

The code to align Chromatogram objects and to correlate them will be implemented in methods #379 and #380 - these might eventually then go to MSnbase.

jorainer avatar May 10 '19 07:05 jorainer

One thought that came to my mind: we reconstruct the spectra for each ChromPeak, which would also mean for isotopes, adducts etc. Do we want this behavior? It could also be used at a later stage, e.g. different isotopes should have the same reconstructed MS2 spectrum.

michaelwitting avatar May 22 '19 10:05 michaelwitting

Agree, but I would do this in a second step. IMHO it would be easier (and safer) to define the MS2 spectrum for each chromatographic peak (in each file) separately (without taking any other information into account) and then do the refinements later (e.g. with combineSpectra to define the common MS2 spectrum for isotopes).

If we see that this will not work or if we see improvements we can then later implement more sophisticated approaches.
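
To illustrate the idea of such a later refinement step, the sketch below merges two per-peak MS2 spectra into one common spectrum in base R. It is not the algorithm used by combineSpectra; the spectra, the m/z tolerance `mz_tol` and the choice of averaging are assumptions made only for this example.

```r
## Two toy reconstructed MS2 spectra, e.g. from the monoisotopic peak
## and an isotope peak of the same compound. Values are invented.
s1 <- data.frame(mz = c(97.03, 121.03, 153.02),
                 intensity = c(200, 1000, 4000))
s2 <- data.frame(mz = c(121.031, 153.021),
                 intensity = c(800, 3000))

## Pool all peaks and sort them by m/z.
pooled <- rbind(s1, s2)
pooled <- pooled[order(pooled$mz), ]

## Start a new group whenever the gap to the previous m/z exceeds the
## (hypothetical) tolerance.
mz_tol <- 0.005
grp <- cumsum(c(TRUE, diff(pooled$mz) > mz_tol))

## Common spectrum: mean m/z and mean intensity per group.
common <- data.frame(
    mz = as.numeric(tapply(pooled$mz, grp, mean)),
    intensity = as.numeric(tapply(pooled$intensity, grp, mean)))
common
```

Keeping the per-peak spectra separate first, as proposed above, leaves open whether fragments seen in only one spectrum (such as the one at m/z 97.03 here) should be kept or filtered in the refinement step.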

jorainer avatar May 22 '19 10:05 jorainer

Or implement an additional reconstructFeatureSpectra that takes the correlation of the peaks across samples into account.

jorainer avatar May 22 '19 10:05 jorainer

Okay... Let's do it that way. I'm sorry, I'm a bit behind with everything. Mostly with my habilitation, which is due in 4 months. So I have to hurry up a bit...

michaelwitting avatar Jun 02 '19 16:06 michaelwitting

If you are too busy @michaelwitting I can implement the function to reconstruct the MS2 spectrum (#377) and let you check whether it makes sense. Once we have that function we can think about how to improve it (e.g. include correlation across samples or similar, as discussed in #377).

jorainer avatar Jun 04 '19 06:06 jorainer

BTW, if I'm not mistaken you said you were working on the SWATH vignette @michaelwitting. If so, can you push it or make a pull request?

jorainer avatar Jun 04 '19 06:06 jorainer

Yes. Will come soon. Problems with R, everything was lost. You will get a push soon.

michaelwitting avatar Jun 04 '19 07:06 michaelwitting

Added a few things to the vignette and also split out the tomato part (into a vignette of its own). Once we have the reconstruction function I can finish the part on the example substance.

michaelwitting avatar Jun 08 '19 08:06 michaelwitting

For the things that are still missing: I'm also quite busy at present, but we could discuss and implement them in Den Haag at the latest.

jorainer avatar Jun 11 '19 06:06 jorainer

Let's do it in The Hague. I'm also away at another conference from Saturday. See you there!

michaelwitting avatar Jun 11 '19 10:06 michaelwitting