Add function to reconstruct MS2 spectrum

Open jorainer opened this issue 5 years ago • 16 comments

For issue #375 we need a function to reconstruct the MS2 spectrum from the list of Chromatogram objects. Can you add that @michaelwitting ?

jorainer avatar May 09 '19 08:05 jorainer

Can do that. The other way round, can you provide a fast function to correlate Chromatograms (based on BiocParallel?)? My idea would be to take a matrix of correlation values, and everything above the threshold goes into the MS2 spectrum. Which intensities should be used? "into"?
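
A minimal sketch of the thresholding idea above, in plain Python rather than xcms code. It assumes the MS1 and MS2 chromatograms have already been aligned onto the same retention-time grid and are given as equal-length intensity lists; the function names and the default threshold of 0.8 are illustrative assumptions, not part of any existing API.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat trace is treated as uncorrelated
    return cov / (sx * sy)


def correlated_ms2(ms1_trace, ms2_traces, threshold=0.8):
    """Indices of MS2 traces whose correlation with the MS1 trace exceeds threshold.

    Hypothetical helper: assumes all traces share one retention-time grid.
    """
    return [i for i, trace in enumerate(ms2_traces)
            if pearson(ms1_trace, trace) > threshold]


ms1 = [0, 10, 50, 100, 50, 10, 0]
ms2 = [
    [0, 4, 22, 40, 19, 5, 0],       # co-eluting fragment
    [30, 28, 31, 29, 30, 31, 30],   # flat background signal
    [100, 50, 10, 0, 0, 0, 0],      # peak eluting earlier
]
selected = correlated_ms2(ms1, ms2)
```

Only the first MS2 trace follows the MS1 peak shape, so it is the only one that would contribute a peak to the reconstructed spectrum.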

michaelwitting avatar May 09 '19 09:05 michaelwitting

Re correlation of Chromatograms, yes, can do that.

Regarding the reconstruction: I haven't yet thought of the best way. "into" sounds good, but it might be nice if we could also use a different column instead. Could the function take a chromPeak matrix as input and return a Spectrum2 from it, with mz being the "mz" column and the intensity column configurable by a parameter?

jorainer avatar May 09 '19 10:05 jorainer

That was the plan, so: yes. Do we already have the chromPeaks with MS2 levels and pockets? I will think about a function... Maybe not today, but by Sunday, when I'm on the plane to Greece.

michaelwitting avatar May 09 '19 10:05 michaelwitting

After alignment and correlation we will have a chromPeaks matrix containing only the (MS2) peaks for one MS1 chrom peak that pass the correlation criteria. So it should be straightforward, I think.

jorainer avatar May 09 '19 10:05 jorainer

Question @michaelwitting : do we expect that the MS2 signal for an ion is higher than the corresponding MS1 signal? The plot below shows the MS1 chromatogram in black (thick line) and all MS2 chromatograms for that m/z and retention time in light grey/blue if their correlation is > 0.8.

[Image: Rplot]

Should we add an additional criterion to check, apart from the correlation coefficient, that the max intensity of the MS2 is <= the max intensity of the MS1?

jorainer avatar May 10 '19 06:05 jorainer

Yes, fragment intensities can be higher than the original one. I would say we only use the correlation for the moment and return a Spectrum2 object that can then be cleaned by additional means.

michaelwitting avatar May 10 '19 07:05 michaelwitting

OK 👍

jorainer avatar May 10 '19 08:05 jorainer

One idea is to hijack the CAMERA approach: we can check whether peak intensities correlate not only along RT but also across samples. If the precursor has a higher intensity in one sample, the fragments should follow. That could maybe remove some false positives that co-elute.
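
A rough sketch of this across-sample check, in plain Python and not taken from CAMERA or xcms. It assumes one precursor intensity per sample and, for each candidate fragment m/z, one intensity per sample in the same order; the function name, the input layout and the threshold of 0.8 are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant intensities carry no information
    return cov / (sx * sy)


def filter_by_sample_correlation(precursor, fragments, threshold=0.8):
    """Keep fragments whose intensity across samples follows the precursor.

    precursor: list of precursor intensities, one per sample.
    fragments: dict mapping fragment m/z to a list of intensities,
               one per sample, in the same sample order (assumed layout).
    """
    return {mz: ints for mz, ints in fragments.items()
            if pearson(precursor, ints) > threshold}


precursor = [1000, 2000, 4000, 500]
fragments = {
    85.03: [100, 210, 390, 55],      # scales with the precursor
    127.04: [300, 290, 310, 305],    # roughly constant: likely unrelated
}
kept = filter_by_sample_correlation(precursor, fragments)
```

Here only the fragment at m/z 85.03 is kept, because its intensity rises and falls with the precursor across the four samples.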

@sneumann We would adapt functions from CAMERA for that, which would go into XCMS and could then be reused by CAMERA.

What do you think?

michaelwitting avatar May 10 '19 13:05 michaelwitting

No problem recycling CAMERA code. If xcms exports the migrated functions, we can have CAMERA switch to them and check for the dependency xcms >= 3.x.y. Please open CAMERA issues for each function that shall be migrated. Yours, Steffen

sneumann avatar May 12 '19 14:05 sneumann

I see the point @michaelwitting. The only issue is that at present we process the data separately for each file (to enable parallel processing). What if we add this as a postprocessing step to clean reconstructed spectra? It would also require features to be defined, in order to know which chromatographic peak in sample 1 goes along with which chromatographic peak in sample 2.

jorainer avatar May 13 '19 03:05 jorainer

We can do it post chromatogram alignment and correlation. Would be good to have some "annotation" with the reconstructed spectrum. Maybe a matrix with the correlation values? This matrix could be enriched with other values that might be used for filtering.

michaelwitting avatar May 13 '19 07:05 michaelwitting

Yes, that (annotation for reconstructed spectrum) would/should be doable.

If possible I would like to keep the definition of the MS2 spectrum separate from the quality assessment of the reconstructed spectra, also because we could reuse this logic in other settings. So, ideally, I would like to have:

  • reconstructChromPeakSpectra: reconstructs MS2 spectra for each chromatographic peak in a file. Does not need features to be defined and can be called directly after chromatographic peak detection.
  • function that takes a Spectra (with some additional information) and runs quality checks (like correlation of peak across samples). We might even want to re-use this tool then to define the representative MS2 spectrum for a feature for the GNPS stuff, i.e. select the MS2 spectrum for which the intensities of the peaks best follow the intensity of the precursor (chrom peak).
  • reconstructFeatureSpectra: this could use the function above to clean/purge the reconstructed MS2 spectra for each chromatographic peak associated with the spectrum and return a single higher confidence MS2 spectrum. This function can only be called if features are defined.

What do you think @michaelwitting ? Another benefit is that the code will stay cleaner/simpler to maintain I believe.

jorainer avatar May 13 '19 07:05 jorainer

Sounds good! Let's keep it separate!

michaelwitting avatar Jun 11 '19 09:06 michaelwitting

So, for each MS1 chrom peak we can get the MS2 chrom peaks with a correlation higher than a certain threshold. From that we can reconstruct the MS2 Spectrum.

We can use the "mz" values of the chrom peaks as m/z values of the spectrum, but should we use the "maxo" or the "into" for the intensities? I'd go for the "maxo", what do you think @michaelwitting @sneumann ?

jorainer avatar Jun 20 '19 12:06 jorainer

I would also go for maxo, but maybe we should keep the freedom to define what we would like to use by having a parameter in the function?
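
As an illustration of such a parameter, here is a small Python sketch (not xcms code). It assumes the correlated MS2 chrom peaks are available as rows with "mz", "maxo" and "into" values, as named in this thread; the function name and the dict-based row layout are hypothetical.

```python
def reconstruct_spectrum(peaks, intensity="maxo"):
    """Build an (m/z, intensity) peak list from chrom peak rows.

    peaks: list of dicts, one per correlated MS2 chrom peak, each with
           an "mz" entry and the chosen intensity column (assumed layout).
    intensity: name of the column to report as intensity, "maxo" by default.
    """
    for p in peaks:
        if intensity not in p:
            raise ValueError(f"column {intensity!r} missing from chrom peak {p}")
    # Sort by m/z so the result reads like a spectrum.
    return sorted((p["mz"], p[intensity]) for p in peaks)


peaks = [
    {"mz": 127.04, "maxo": 800.0, "into": 5200.0},
    {"mz": 85.03, "maxo": 1500.0, "into": 9100.0},
]
by_maxo = reconstruct_spectrum(peaks)
by_into = reconstruct_spectrum(peaks, intensity="into")
```

Defaulting to "maxo" while accepting any available column keeps the preferred behaviour from the discussion without closing off alternatives.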

michaelwitting avatar Jun 20 '19 17:06 michaelwitting

Summarizing, reconstructChromPeakSpectra is somewhat similar to MS-DIAL's MS2Dec function. reconstructFeatureSpectra could in addition support a reconstruction similar to MS-DIAL's CorrDec function, since we then also have the intensities across samples.

jorainer avatar Jun 27 '19 09:06 jorainer