ms2rescore
Suggestions for Ion mobility MS data
Hello @RalfG,
First, thank you very much for developing such a wonderful tool; I have really enjoyed working with it!
I have a question about how to apply MS2rescore to Bruker (.d) data or, more broadly, to ion mobility MS data (TIMS), where features eluting at the same RT are further separated in the gas phase.
Practically, the issue I am facing right now is that, when using MS2PIP to generate features, I cannot map the MaxQuant scan IDs to the raw scan IDs in the Bruker raw data (.d). It seems that MaxQuant performs some sort of accumulation along the ion mobility axis to turn the MS/MS data into conventional spectra, which are then submitted to the search engine (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7261821/). Because of that, my impression is that the current implementation of MS2rescore cannot handle this well, since it relies on a one-to-one correspondence between each PSM and its raw MS/MS spectrum.
There are other conceptual challenges, such as the lack of a trained model specifically for timsTOF, and, technically, ion mobility should also be a predictable feature that may help with rescoring. In light of that, I would like to get your thoughts on:
[1] Is my understanding correct that the current implementation of MS2rescore is focused on Thermo data and may not be applicable to other data, such as Bruker TIMS data?
[2] What difficulties do you foresee in adapting the model to ion mobility data?
[3] If I still want to use MS2rescore, can I skip the MS2PIP step and only use DeepLC and other features, possibly including additional features from my own custom function? Do you think that would still increase the identification rate? I ask because the MS2PIP features seem to contribute a lot to the prediction.
Thanks very much in advance, Frank
Hi Frank,
Thank you for your interest in MS²Rescore!
Regarding your issue with MaxQuant, you are indeed right that MS²Rescore requires a one-to-one relation between PSMs and spectra. For aggregated spectra from a TIMSTOF, we therefore need access to the aggregated spectra rather than the original ones.
So far, we have only tried MS²Rescore on TIMSTOF data analyzed with the PEAKS search engine, where PEAKS outputs both PSM files and MGF files with the aggregated spectra. Do you know if MaxQuant can similarly output the aggregated spectra in MGF or mzML formats?
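Before rescoring, the one-to-one requirement can be checked directly. The following is a minimal sketch, assuming the spectrum identifiers have already been extracted from the PSM file and from the spectrum file as plain strings; check_one_to_one is a hypothetical helper, not part of the MS²Rescore API. PSMs that point to spectra MaxQuant accumulated internally would most likely end up in the missing list.

```python
from collections import Counter


def check_one_to_one(psm_spectrum_ids, spectrum_file_ids):
    """Compare the spectrum IDs referenced by PSMs with those in a spectrum file.

    Returns two sorted lists:
    - missing: IDs referenced by a PSM but absent from the spectrum file
    - ambiguous: IDs referenced by a PSM that occur more than once in the file
    """
    counts = Counter(spectrum_file_ids)
    referenced = set(psm_spectrum_ids)
    missing = sorted(sid for sid in referenced if counts[sid] == 0)
    ambiguous = sorted(sid for sid in referenced if counts[sid] > 1)
    return missing, ambiguous


# Example with made-up identifiers
missing, ambiguous = check_one_to_one(
    ["1001", "1002", "1003"],  # spectrum IDs from the PSM file
    ["1001", "1002", "1002"],  # spectrum IDs from the spectrum file
)
print(missing, ambiguous)  # ['1003'] ['1002']
```

An empty missing list and an empty ambiguous list together indicate that every PSM resolves to exactly one spectrum.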
In terms of prediction models, everything should be ready. In the upcoming v4.0 of MS²PIP (included in MS²Rescore v3), we have new specialized models for timsTOF instruments. Both tryptic and non-tryptic peptides, including HLA peptides, are supported. You can configure MS²Rescore to use this model with the ms2pip configuration section:
"ms2rescore": {
    "feature_generators": {
        "ms2pip": {
            "model": "timsTOF"
        }
    }
}
Very recently, we have also added the ionmob ion mobility predictor as a feature generator. Installing MS²Rescore with the optional dependency (pip install --pre ms2rescore[ionmob]) should install everything you need. Then simply add "ionmob": {} to the feature_generators section of the configuration file:
"ms2rescore": {
    "feature_generators": {
        "ionmob": {}
    }
}
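The ms2pip and ionmob sections shown in this reply can also live in a single configuration file. A minimal sketch that builds both feature-generator sections in Python and writes them out as JSON with the standard library; the outer braces make the file valid JSON, and the filename in the usage line is a placeholder. Whether listing feature_generators explicitly keeps or replaces MS²Rescore's default generators (such as DeepLC) depends on how it merges user settings with its defaults, which we have not verified here.

```python
import json

# Settings taken from the two configuration snippets in this reply
config = {
    "ms2rescore": {
        "feature_generators": {
            "ms2pip": {"model": "timsTOF"},  # timsTOF-specific MS²PIP model
            "ionmob": {},  # ionmob with its default settings
        }
    }
}


def write_config(cfg, path):
    """Write the configuration dictionary to `path` as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(cfg, handle, indent=4)
```

Usage: write_config(config, "ms2rescore_config.json") produces a file containing both sections under the same "ms2rescore" key.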
Let us know if you would like to look into the MaxQuant spectrum-matching issue together. We would definitely like to help you out.
Best, Ralf
Hi Ralf,
Thanks very much for getting back to me!
I really appreciate the effort on timsTOF prediction. Just to clarify: is the "timsTOF" model for tryptic or non-tryptic data?
ionmob looks really cool. One question: I assume that, to enable automatic feature generation, we need the CCS value in the msms.txt file, right? Right now, it seems that the CCS value is not in the MaxQuant msms.txt file but in the evidence.txt file, so I guess I first need to transfer the CCS value to the msms.txt file when using ms2rescore, right?
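If the CCS values do have to be carried over by hand, the join can be sketched with Python's csv module. This is a minimal sketch under assumptions not confirmed in this thread: that msms.txt has an "Evidence ID" column pointing to the "id" column of evidence.txt, and that evidence.txt stores the value in a column named "CCS". add_ccs_to_msms is a hypothetical helper, and the sketch does not establish whether MS²Rescore's MaxQuant reader would actually pick up the added column.

```python
import csv
import io


def add_ccs_to_msms(evidence_file, msms_file, out_file,
                    evidence_key="id", msms_key="Evidence ID", ccs_col="CCS"):
    """Copy CCS values from evidence.txt rows onto matching msms.txt rows.

    All three arguments are open text files (or file-like objects).
    The default column names are assumptions about the MaxQuant layout.
    """
    # Lookup table: evidence row ID -> CCS value (kept as text)
    evidence = csv.DictReader(evidence_file, delimiter="\t")
    ccs_by_id = {row[evidence_key]: row[ccs_col] for row in evidence}

    msms = csv.DictReader(msms_file, delimiter="\t")
    fieldnames = list(msms.fieldnames)
    if ccs_col not in fieldnames:
        fieldnames.append(ccs_col)

    writer = csv.DictWriter(out_file, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in msms:
        # Leave the field empty if no matching evidence row exists
        row[ccs_col] = ccs_by_id.get(row[msms_key], "")
        writer.writerow(row)


# Example with small in-memory tables (made-up values)
evidence = io.StringIO("id\tCCS\n0\t412.5\n1\t398.1\n")
msms = io.StringIO("id\tEvidence ID\tScan number\n10\t1\t500\n11\t0\t501\n")
out = io.StringIO()
add_ccs_to_msms(evidence, msms, out)
print(out.getvalue())
```

For real files, open all three with newline="", as the csv module documentation recommends, so line endings are handled consistently.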
For the MaxQuant accumulated spectra, I opened an issue in their Google group (https://groups.google.com/g/maxquant-list/c/mztk0wyUg-w) but have not heard back from them yet. I tried to figure it out myself, but with no luck. It seems that the Bruker raw data are laid out as below (I used ProteoWizard to convert them to mzML):
spectrum1 frame1 scan1
spectrum2 frame1 scan2
...
spectrum_n frame1 scan_n
spectrum_n+1 frame2 scan1
...
...
MaxQuant has accumulatedMsmsscan and pasefMsmsScans in its txt output, but it is not obvious to me how to reproduce their accumulation exactly. I understand this is definitely not part of your job as the ms2rescore developer, but if you happen to have any ideas, or a chance to analyze a public Bruker .d file with MaxQuant, your insights would be really appreciated, and I am sure they would benefit other users who try ms2rescore.
Thanks again, Frank
Hi Frank,
Getting back to this thread: we have now released MS²Rescore v3.1 with full support for DDA-PASEF data. An associated preprint is available on bioRxiv (10.1101/2024.05.29.596400).
To your previous questions:
- The new MS²PIP prediction model for timsTOF supports both tryptic and non-tryptic data.
- ionmob now has a sibling in MS²Rescore: IM2Deep. It's a CCS predictor based on DeepLC, so it also supports any peptide modification.
- If the ion mobility values are not part of the PSM file (as in your case with msms.txt), MS²Rescore v3.1 will search for them in the spectrum files.
- For accumulated spectra: this can be an annoying issue. In essence, MS²Rescore needs access to the spectra exactly as they were used by the search engine and reported in the PSM file. So if a search engine itself combines multiple spectra into one (for instance, through scan summing), MS²Rescore needs those newly combined spectra. A potential workaround is to first convert and combine the spectra with a tool such as MSConvert, then pass these spectra to the search engine with its own accumulation feature disabled.
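To make the idea of scan summing concrete, here is a minimal sketch that merges several MS/MS peak lists into one by summing the intensities of peaks within an absolute m/z tolerance. It is a generic illustration only, not the accumulation algorithm used by MaxQuant or MSConvert; sum_spectra and its 0.01 m/z default tolerance are assumptions. Whichever tool does the combining, the requirement stays the same: the search engine and MS²Rescore must both see the same combined spectrum.

```python
def sum_spectra(spectra, tolerance=0.01):
    """Combine several peak lists into one by summing nearby peaks.

    spectra: iterable of peak lists, each a list of (mz, intensity) tuples.
    Peaks are grouped when each lies within `tolerance` (absolute m/z) of the
    previous peak in the group, so one group can span more than `tolerance`.
    Each merged peak gets the intensity-weighted mean m/z of its group.
    """
    # Pool all peaks with positive intensity and sort them by m/z
    peaks = sorted(
        (mz, intensity)
        for spectrum in spectra
        for mz, intensity in spectrum
        if intensity > 0
    )
    groups = []  # each entry: [sum of mz * intensity, sum of intensity, last mz]
    for mz, intensity in peaks:
        if groups and mz - groups[-1][2] <= tolerance:
            group = groups[-1]
            group[0] += mz * intensity
            group[1] += intensity
            group[2] = mz
        else:
            groups.append([mz * intensity, intensity, mz])
    return [(weighted / total, total) for weighted, total, _ in groups]


# Two made-up scans of the same precursor
scan_a = [(100.000, 10.0), (200.000, 5.0)]
scan_b = [(100.005, 30.0), (300.000, 1.0)]
for mz, intensity in sum_spectra([scan_a, scan_b]):
    print(f"{mz:.4f}\t{intensity:.1f}")
```

Chaining peaks on the gap to the previous peak keeps the sketch short; a real implementation might instead use ppm tolerances or fixed bins.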
Feel free to ask if you have further questions.
Best, Ralf