FragPipe
Phosphopeptide site localization and quantification question
Hello, I am performing phosphoproteome-based closed searches with LFQ and have a question about the PTMProphet output. I appreciate the report format, as it provides indexed and localized sites with quantification data. Based on the "Peptide" column, it appears that multiple modified peptides have been aggregated as long as they contain the "index" site. This index site may be a subset of the sites if the peptide has multiple sites of modification (example below):
AGEEDEGEEDsDSDYEISAK;AGEEDEGEEDsDsDYEISAK
How are the quantitative values summarized when multiple peptides are listed? In the example above, if there is a summarization/aggregation step for quantitative values, what is the rationale for combining quant value for single and multi-site phosphorylations? My preference would be to interpret them separately. Is the PTM quantification strategy you have taken in TMT-Integrator which uses single versus multi-site quantification reports a solution to this?
Related to these analyses, I have timsTOF phosphopeptide data. Since PTMProphet does not support timsTOF data, is there a recommended approach for obtaining PTM site localization from timsTOF data?
Thanks again for your help, Todd
Hmm, in TMT-Integrator we do not aggregate AGEEDEGEEDsDSDYEISAK;AGEEDEGEEDsDsDYEISAK
They would be separate entries in multi-site report, and AGEEDEGEEDsDSDYEISAK will be selected for the DsD site in the single site report.
IonQuant site-level reports were more recently introduced. Fengchao, can you describe what we do in the IonQuant phospho reports?
For PTMProphet on timsTOF data, we plan to write mzML in the future so that PTMProphet will work. For now, you can try converting .d to mzML with ProteoWizard first.
Thanks
Alexey
From: enonimos @.> Sent: Wednesday, June 29, 2022 11:46 PM To: Nesvilab/FragPipe @.> Cc: Subscribed @.***> Subject: [Nesvilab/FragPipe] Phosphopeptide site localization and quantification question (Issue #746)
Yes, we are using the intensities from both AGEEDEGEEDsDSDYEISAK and AGEEDEGEEDsDsDYEISAK when calculating the intensity for the first phospho site. We use the top-N and MaxLFQ algorithms to roll up the intensity. Maybe we should not use AGEEDEGEEDsDsDYEISAK's intensity for the first phospho site.
Best,
Fengchao
The logic we use to generate single-site reports in TMT-I is that if there is a monophosphorylated peptide with that site localized, that is what is used (doubly phosphorylated peptides are discarded). Fengchao, we can discuss; I can share the schema we have for collapsing to the single-site level in TMT-I.
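The single-site rule described above can be sketched as follows. This is only an illustration, not the TMT-Integrator or IonQuant implementation: the input structure, the function names, and the intensity values are hypothetical, and the fallback to multiply phosphorylated peptides when no monophosphorylated peptide carries the site is an assumption that the thread does not confirm.

```python
def count_phospho(peptide: str) -> int:
    """Count lowercase s/t/y residues, the notation this thread uses for phosphosites."""
    return sum(1 for aa in peptide if aa in "sty")


def select_for_site(peptides: dict, site_index: int) -> dict:
    """Pick the peptides whose intensities should contribute to one site.

    peptides: modified sequence -> intensity (hypothetical input structure).
    site_index: 0-based position of the site within the peptide sequence.
    """
    # Keep only peptides that are phosphorylated at the requested position.
    carrying = {p: i for p, i in peptides.items() if p[site_index] in "sty"}
    # Rule from the thread: if monophosphorylated evidence exists, use only that.
    mono = {p: i for p, i in carrying.items() if count_phospho(p) == 1}
    # Assumption of this sketch: otherwise fall back to the multi-site peptides.
    return mono if mono else carrying


# Made-up intensities for the two peptides discussed in this issue.
example = {
    "AGEEDEGEEDsDSDYEISAK": 1.0e6,
    "AGEEDEGEEDsDsDYEISAK": 4.0e5,
}
first_site = select_for_site(example, site_index=10)
second_site = select_for_site(example, site_index=12)
```

With this rule, the first site (index 10) is quantified from the singly phosphorylated peptide alone, while the second site (index 12) is only observed on the doubly phosphorylated peptide.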
Hi Alexey and Fengchao, Thank you for the clarification and comparison to TMT-I. This strategy makes sense to me and I would support its inclusion as a feature in IonQuant.
Regarding conversion of Bruker .d to mzML for PTMProphet, I used ProteoWizard's MSConvert tool. I did a FragPipe test search with one file, but it generated an error message, "Could not allocate arrays during spectra decoding step", during mass recalibration and parameter optimization (see log file attached). If I disable these, then the search completes successfully.
Is RAM an issue (my computer has 64 GB of RAM)? Do you have any recommendations for different ProteoWizard settings to minimize file size, or generally for compatibility with mass calibration and optimization in MSFragger DDA searches? For reference, I used ProteoWizard's built-in preset "PASEF MGF", except that I changed the output format to mzML and used zlib compression. The file size increased from ~6 GB to ~25 GB, which is not ideal for large experiments.
Todd
Hi Todd,
You need to convert it to mzML format because this is what PTMProphet supports. We have a tutorial here (https://fragpipe.nesvilab.org/docs/tutorial_convert.html): see the "Convert Bruker timsTOF .d files" section.
Since you only need it for identification and PTM localization, you can add a threshold peak filter to keep, say, the top 150 peaks, or to remove peaks with intensities less than 1% of the base peak. Please note that the peak-filtered file can't be used in quantification.
Best,
Fengchao
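To make the two filtering options mentioned above concrete, here is a minimal sketch of what a threshold peak filter does to a single spectrum. It is not the ProteoWizard implementation; the function name, the (m/z, intensity) tuple layout, and the toy spectrum are assumptions made for illustration.

```python
def filter_peaks(peaks, top_n=150, min_rel_intensity=None):
    """Reduce one spectrum, given as a list of (mz, intensity) tuples.

    If min_rel_intensity is set (e.g. 0.01 for 1%), drop peaks below that
    fraction of the base peak. Otherwise keep the top_n most intense peaks.
    """
    if not peaks:
        return []
    if min_rel_intensity is not None:
        base_peak = max(intensity for _, intensity in peaks)
        kept = [p for p in peaks if p[1] >= min_rel_intensity * base_peak]
    else:
        kept = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    # Return peaks in ascending m/z order, as spectra are normally stored.
    return sorted(kept)


# Toy spectrum with made-up values.
spectrum = [(100.0, 5.0), (200.0, 1000.0), (300.0, 50.0), (400.0, 9.0)]
top_two = filter_peaks(spectrum, top_n=2)
above_one_percent = filter_peaks(spectrum, min_rel_intensity=0.01)
```

Either option discards low-intensity signal, which is why a file filtered this way is suitable for identification and localization but not for quantification.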
Hi Fengchao, Thank you for the link, I missed that section.
Unfortunately, I am interested in phospho LFQ, so the threshold peak filter will not be a long-term solution for me. I did try the filter, though, and can confirm that it shrinks the mzML file sufficiently that I no longer get a Java Out of Memory error.
I did note that changing the binary encoding precision from 64-bit to 32-bit gives a reasonable reduction in file size. Would there be any issues with FragPipe and 32-bit encoded mzML?
In future releases, for timsTOF data processing, is there a possibility that FragPipe could integrate extraction/conversion steps using recent tools such as OpenTIMS or TIMSCONVERT?
Best, Todd
Hi Todd,
I am glad to hear that it works for you to some extent.
I did note that changing the binary encoding precision from 64 to 32bit has a reasonable reduction in file size. Would there be any issues in FragPipe and 32bit encoded mzml?
I think it should be OK.
In future releases, for timsTOF data processing, is there a possibility that FragPipe could integrate extraction/conversion steps using recent tools such as OpenTIMS or TIMSCONVERT?
MSFragger has the ability to load .d from PASEF. We will add a module to write mzML format in the future.
Best,
Fengchao
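For readers weighing the 32-bit versus 64-bit encoding question above, the following sketch compares the raw size of an m/z array at both precisions and measures the rounding error introduced by 32-bit floats. The m/z values are synthetic and the variable names are illustrative; whether the resulting precision is acceptable for a particular FragPipe workflow is not established by this example.

```python
import struct
import zlib

# Synthetic m/z values spanning a typical peptide fragment range (made up).
mz_values = [400.0 + 0.123457 * k for k in range(10_000)]
n = len(mz_values)

# Pack the same values as little-endian 64-bit and 32-bit floats.
raw64 = struct.pack(f"<{n}d", *mz_values)
raw32 = struct.pack(f"<{n}f", *mz_values)

# Read the 32-bit values back and measure the worst rounding error in ppm.
roundtrip32 = struct.unpack(f"<{n}f", raw32)
max_ppm_error = max(abs(a - b) / a * 1e6 for a, b in zip(mz_values, roundtrip32))

size64, size32 = len(raw64), len(raw32)
# Sizes after zlib compression depend on the data, so they are only printed.
zsize64, zsize32 = len(zlib.compress(raw64)), len(zlib.compress(raw32))
print(size64, size32, zsize64, zsize32, max_ppm_error)
```

The uncompressed 32-bit array is exactly half the size of the 64-bit one, and the rounding error of a 32-bit float is bounded by roughly 0.06 ppm of the stored value.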
Thanks Fengchao, one last question. Do you know yet if this module will produce mzml encoded files with similarly large size as msconvert? And therefore having 64GB+ RAM will be the best solution for this particular use case?
I am afraid so, if you don't filter out any peaks. I suggest you get 64 GB+ of RAM if you want to analyze PASEF data smoothly.
Best,
Fengchao
Since we might need to modify IonQuant regarding the site intensity, and there are also PASEF data related changes to make, I will keep this issue open as a reminder.
Best,
Fengchao
OK, that's great. Also, you have probably considered this, but another future workaround for phospho LFQ with PASEF could be a way to filter peaks during mzML generation while keeping the intensity data accessible to IonQuant.
As an FYI, with a single 16GB ddaPASEF mzml file (only peak picking, no filtering) 110GB of RAM was used for successful completion of mass recalibration and parameter optimization.
Oh, BTW, if you are using the mzml from ddaPASEF, you should turn off mass calibration because the mass calibration does not support ddaPASEF in mzml format.
Best,
Fengchao
Ok I see, so even though I didn't get any errors, I shouldn't trust that the recalibrated data is accurate from mzml?
For PASEF mzml, correct.
Best,
Fengchao
I have followed up on the suggestion to disable mass recalibration for analyzing PASEF mzML, and I used a computer that has 144 GB of RAM. I made it further through the analysis, but I received a Philosopher error during the PhilosopherFilter step. As a reminder, I want to perform phospho LFQ with PTMProphet analysis, so I didn't perform filtering during conversion to mzML, but I used "Combine ion mobility scans" and "Peak picking". Would this have anything to do with the error (see attached log)? log_2022-07-05_22-20-49.txt
As a reminder, I want to perform phosphoLFQ with PTMProphet analysis, so I didn't perform filtering during conversion to mzml, but I used "Combine ion mobility scans" and "Peak picking".
I don't think you should also add scan summing for PASEF data. Please check the tutorial here: https://fragpipe.nesvilab.org/docs/tutorial_convert.html
As to the Philosopher error, Felipe @prvst can you take a look?
Thanks,
Fengchao
Thanks Fengchao, I have reviewed the tutorial but maybe I need clarification on "scan summing". From the tutorial, it says that if the scan summing Filter is added, then I can't perform MS1 quant. I would like to perform this, so I left scan summing off.
Do you think not using "scan summing" but keeping "Combine ion mobility scans" leads to an incorrect format?
Without "scan summing", there will be many many scans with low SNR. But if you want to do MS1 quant, you can't add "scan summing"... OK, as you can see, converting .d to mzML format is not a good idea.
Best,
Fengchao
I don’t think we even tested MS1 quant with mzML from Bruker, did we?
But the crash seems to be something else. We have seen this error before; it somehow keeps coming back in Philosopher.
Yes, agreed... it seems mzML provides compatibility with tools, but creates other issues. Perhaps for future phosphorylation analysis a diaPASEF experiment using DIA-NN for quant would be better.
I think Philosopher is complaining because you are trying to filter a PeptideProphet temporary file, which is partially written. You can see it here:
INFO[22:20:49] Executing Filter v4.4.0
INFO[22:20:49] Processing peptide identification files
INFO[22:20:49] Parsing F:\Todd\timTOF\Demo\20220610_SCR_Bleo_A_60_Slot2-23_1_668\interact-20220610_SCR_Bleo_A_60_Slot2-23_1_668.pep.xml.tmp.a25516
I normally advise cleaning the directory before running the programs again if you find any issues.
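As one way to follow the advice above, the sketch below looks for leftover temporary pep.xml files before a rerun. It is not part of FragPipe or Philosopher: the function name is hypothetical, and the glob pattern is an assumption based only on the file name shown in the log excerpt (`interact-….pep.xml.tmp.a25516`).

```python
from pathlib import Path


def remove_stale_temp_files(workdir, dry_run=True):
    """Find files named like '*.pep.xml.tmp.*' under workdir.

    With dry_run=True (the default) nothing is deleted and the matches are
    only returned, so they can be reviewed first.
    """
    stale = sorted(
        p for p in Path(workdir).rglob("*.pep.xml.tmp.*") if p.is_file()
    )
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

A cautious usage pattern is to call it once with `dry_run=True`, inspect the returned paths, and only then call it again with `dry_run=False`.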
Hi Felipe, can Philosopher detect and ignore those temp files?
They can be from a previous run, or from a run in which PeptideProphet failed for some reason. If you are running FragPipe, it should clean all previous pep.xml files, but we need to double check.
Hi Felipe, thank you for the insights. Looking into the files more closely, the date stamp for the temp xml was after that of the pep.xml. Also, I had a second mzxml for analysis that produced both pep.xml and mod.pep.xml in the output directory, while the one with the temp file did not have the mod.pep.xml. So perhaps the original source of the issue began with PTMProphet?
The two files are replicate injections and were converted to mzml at the same time, so I'm not sure why the mod.pep.xml failed in one file but not the other?