Plotting MS data
This issue requests input from @sgibb, @jotsetung and any other interested readers on how to harmonise plotting of MS data (and MSnExp, OnDiskMSnExp and Spectrum objects) in MSnbase. currently, we have plot methods for spectra and MSnExp, where the latter plots the spectra in the experiment, which is not ideal. Here's are two examples:


The MSmap data structure also supports plotting:


@jotsetung (see #312) has these really nice visualisations, which I would call extracted ion chomatograms (XIC):

and we can also plot Chromatogram object:

I think it would be useful to rationalise all these and properly document these visualisations in one place (for example to work on a MSnbase-vis vignette).
In terms of rationalisation
- Should we have a single
plotmethod that does a reasonable thing depending on the argument, or different (plotXIC,plotMS, ...). I prefer the former, as long as we plan arguments and defaults wisely. -
plot,Spectrumneeds some work, but the basics are OK -
plot,MSnExpthat plots all spectra should be replaced with a better default, for example the heatmap fromMSmap(although this might be time consuming for a plot) - It would be great to incorporate @jotsetung the XIC visualisation
- ...
Suggestions
-
plot(<Spectrum>)- as currently available (with possible improvements) -
plot(<Spectrum>, <Spectrum>)- as currently available. -
plot(<Spectrum>, <character>)- as currently available. -
plot(<MSmap>)- as currently available, with possible a harmonisation ofMSmapandChromatogram[s]interfaces. Could also be implemented asplot(<MSnExp>, type = "MSmap")with retention time (filterRt) and mz (filterMz) filtering delegated to the user. -
plot(<MSnExp>, type = "XIC") - formerplotMsData(see above), with retention time (filterRt) and mz (filterMz`) filtering delegated to the user. Done in #314. -
plot(<MSnExp>, type = "spectrum")is currently available, but could be removed and instructions would be provided to plot spectra and rearrange them using, for examplepatchwork. -
plot(<Chromatogram>)andplot(<Chromatograms>)- as currently available.
Re harmonising: always a good idea. Question is whether we should also harmonize the use of ggplot or base plotting. Although ggplot is powerful, I'm still a fan of the base R plotting functions.
Re XIC visualisation: what I would call the XIC is the plot,Chromatogram, at least that's how I generate them. This plot I was producing (with the plotMsData) provides some more info. The upper panel, yes, is a XIC (BPC), the lower panel is what I refer to the MS data plot, as it shows the MS data as-is, without binning (such as the MSmap). The plot makes mostly sense for a m/z-rt slice of the data, so yes, that would qualify the plot as extracted ion chromatogram (or extracted ion plot?). I could live with a plotXIC,MSnExp here - or plot,MSnExp with argument type = "XIC" (would favour that).
Question is whether we should also harmonize the use of
ggplotor base plotting. Althoughggplotis powerful, I'm still a fan of the base R plotting functions.
I think we should use the best tool for the job, and give the developer the choice of what they prefer.
I will update the first opening comment to reflect your suggestion using type = "XIC" and summarise the discussion/suggestions we have.
plot(<MSnExp>, type = "XIC", rt, mz)- formerplotMsData(see above)
You think parameters rt and mz are required? We could also drop these leaving the task of subsetting by rt and mz using filterRt and filterMz to the user.
Regarding axes labeling: "Retention time" and "M/Z" is what we're going for? I thought m/z is usually written in lower case - but I'm most likely wrong here.
You think parameters
rtandmzare required? We could also drop these leaving the task of subsetting by rt and mz usingfilterRtandfilterMzto the user.
Yes, I like that - updated above.
What happens of a user doesn't do the filtering (it's easy to verify that arguments aren't missing), and tries to generate such a plot from a full experiment?
well, it will be a quite busy plot. it will take quite a while and I think the lower part will be just black. The upper part will actually be still informative, since it will show the base peak chromatogram from the full experiment. Still, to create such a BPC I would rather use plot(chromatogram(<MSnExp>)).
Indeed. So I was thinking in somehow checking that the MSnExp that is passed has a sensible retention time and mz range (or has been filtered beforehand) and, if not, possibly provide a message that this might take a lot of time and might not be informative, referring them to ?Chromatogram. What do you think?
And regarding M/Z or m/z, I think I was told to use the upper case version, but it looks like the standard abbreviations from two proteomics journals use the latter, so let's go the m/z.
Yes, good point. I'll add a check based on the number of spectra and m/z values.
OK, m/z it will be.
I don't know why I was originally reading this thread, but saw: https://github.com/lgatto/MSnbase/issues/313#issuecomment-366715190 and wanted to mention that ggplot is not so great when plotting lots of data points (ie non-data-reduced spectra), base really excels in that scenario
@chasemc - indeed, we all agree with you. We might revert back to base plotting in some cases where ggplot2 isn't used to its full potential.
In the framing of rationalising MS data plotting, it will also be worth considering unit tests using the vdiffr package.
@jorainer @sgibb - at the last EuroBioc2019 conference, there were some discussions about MS plotting. Is this issue still relevant/useful?
well, yes, but in the Spectra package.
Unfortunately, I can't seem to transfer this issue to Spectra. Will keep it for now and will copy the info into a new one.