
Phosphopeptide site localization and quantification question

Open enonimos opened this issue 2 years ago • 24 comments

Hello, I am performing phosphoproteome-based closed searches with LFQ and have a question about the PTMProphet output. I appreciate the report format, as it provides indexed and localized sites with quantification data. Based on the "Peptide" column, it appears that multiple modified peptides are aggregated as long as they contain the "index" site. This index site may be only a subset of the modified sites if the peptide carries multiple modifications (example below):

AGEEDEGEEDsDSDYEISAK;AGEEDEGEEDsDsDYEISAK

How are the quantitative values summarized when multiple peptides are listed? In the example above, if there is a summarization/aggregation step for quantitative values, what is the rationale for combining quant values for single- and multi-site phosphorylations? My preference would be to interpret them separately. Is the PTM quantification strategy you have taken in TMT-Integrator, which uses separate single-site and multi-site quantification reports, a solution to this?

Related to these analyses, I have timsTOF phosphopeptide data. Since PTMProphet does not support timsTOF data, is there a recommended approach for obtaining PTM site localization from timsTOF data?

Thanks again for your help, Todd

enonimos avatar Jun 30 '22 03:06 enonimos

Hmm, in TMT-Integrator we do not aggregate AGEEDEGEEDsDSDYEISAK;AGEEDEGEEDsDsDYEISAK

They would be separate entries in the multi-site report, and AGEEDEGEEDsDSDYEISAK would be selected for the DsD site in the single-site report.

IonQuant site-level reports were introduced more recently. Fengchao, can you describe what we do in the IonQuant phospho reports?

As for PTMProphet on timsTOF data, we plan to write mzML in the future so that PTMProphet will work. For now, you can try converting .d to mzML with ProteoWizard first.

Thanks

Alexey

From: enonimos @.> Sent: Wednesday, June 29, 2022 11:46 PM To: Nesvilab/FragPipe @.> Cc: Subscribed @.***> Subject: [Nesvilab/FragPipe] Phosphopeptide site localization and quantification question (Issue #746)


anesvi avatar Jun 30 '22 03:06 anesvi

Yes, we are using the intensities from both AGEEDEGEEDsDSDYEISAK and AGEEDEGEEDsDsDYEISAK when calculating the intensity for the first phospho site. We use the top-N and MaxLFQ algorithms to roll up the intensity. Maybe we should not use AGEEDEGEEDsDsDYEISAK's intensity for the first phospho site.

Best,

Fengchao
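To make the effect of this roll-up concrete, here is a minimal sketch of a top-N summary with and without the doubly phosphorylated peptide. It is an illustration only, not IonQuant's code: the function name `top_n_intensity`, the choice to sum the N largest values, and the intensities are all assumptions made for the example, and MaxLFQ is not shown.

```python
def top_n_intensity(intensities, n=3):
    """Sum the n largest peptide intensities (one common top-N variant; an assumption here)."""
    return sum(sorted(intensities, reverse=True)[:n])


# Made-up intensities for the two peptides discussed in this thread.
peptides = {
    "AGEEDEGEEDsDSDYEISAK": 1.0e6,  # singly phosphorylated
    "AGEEDEGEEDsDsDYEISAK": 4.0e5,  # doubly phosphorylated
}

# Site intensity when both peptides contribute to the first phospho site.
combined = top_n_intensity(peptides.values())

# Site intensity when only the singly phosphorylated peptide contributes.
mono_only = top_n_intensity([peptides["AGEEDEGEEDsDSDYEISAK"]])

print(combined, mono_only)
```

In this toy case the doubly phosphorylated peptide changes the value reported for the first site, which is why the two forms may be better interpreted separately.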

fcyu avatar Jun 30 '22 04:06 fcyu

The logic we have for generating single-site reports in TMT-I is that if there is a monophosphorylated peptide with that site localized, that peptide is what is used (doubly phosphorylated peptides are discarded). Fengchao, we can discuss; I can share the schema we have for collapsing to single-site level in TMT-I.
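The selection rule described above can be sketched roughly as follows. This is not the TMT-Integrator schema itself: the helper names, the lowercase-letter convention for modified residues (taken from the peptide strings in this thread), and the fallback to multiply phosphorylated peptides when no singly phosphorylated one exists are assumptions for illustration.

```python
def phospho_positions(peptide):
    """Return 1-based positions of lowercase (modified) residues."""
    return [i for i, aa in enumerate(peptide, start=1) if aa.islower()]


def select_for_site(intensities, site):
    """Pick the peptides whose intensities feed a single-site entry."""
    carrying = {p: v for p, v in intensities.items() if site in phospho_positions(p)}
    mono = {p: v for p, v in carrying.items() if len(phospho_positions(p)) == 1}
    # Assumption: use multiply modified peptides only when no singly
    # modified peptide carries the site.
    return mono if mono else carrying


example = {
    "AGEEDEGEEDsDSDYEISAK": 1.0e6,  # made-up intensities
    "AGEEDEGEEDsDsDYEISAK": 4.0e5,
}

print(select_for_site(example, site=11))  # first phospho site
print(select_for_site(example, site=13))  # second phospho site
```

With this rule, the doubly phosphorylated peptide no longer contributes to the first site, but it is still the only evidence available for the second site.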


anesvi avatar Jun 30 '22 04:06 anesvi

Hi Alexey and Fengchao, Thank you for the clarification and comparison to TMT-I. This strategy makes sense to me and I would support its inclusion as a feature in IonQuant.

Regarding conversion of Bruker .d files to mzML for PTMProphet, I used ProteoWizard's MSConvert tool. I ran a FragPipe test search with one file, but it generated the error message "Could not allocate arrays during spectra decoding step" during mass recalibration and parameter optimization (see the attached log file). If I disable these, the search completes successfully.

Is RAM an issue (my computer has 64 GB of RAM)? Do you have any recommendations for different ProteoWizard settings to minimize file size, or more generally for compatibility with mass calibration and parameter optimization in MSFragger DDA searches? For reference, I used ProteoWizard's built-in preset "PASEF MGF", except that I changed the output format to mzML and used zlib compression. The file size increased from ~6 GB to ~25 GB, which is not ideal for large experiments.

log_2022-06-30_01-05-57.txt

Todd

enonimos avatar Jul 01 '22 12:07 enonimos

Hi Todd,

You need to convert it to mzML format because that is what PTMProphet supports. See the "Convert Bruker timsTOF .d files" section of our tutorial: https://fragpipe.nesvilab.org/docs/tutorial_convert.html

Since you only need it for identification and PTM localization, you can add a threshold peak filter to keep, say, the top 150 peaks, or to remove peaks with intensities below 1% of the base peak. Please note that the peak-filtered file can't be used for quantification.

Best,

Fengchao
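As a rough illustration of what the two threshold filters mentioned above do to a single spectrum, here is a small sketch. It is not how MSConvert implements its filters; the function names and the representation of a spectrum as a list of (m/z, intensity) tuples are assumptions for the example.

```python
def keep_top_n(peaks, n=150):
    """Keep the n most intense (m/z, intensity) peaks, returned in m/z order."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    return sorted(strongest)


def drop_below_relative(peaks, fraction=0.01):
    """Remove peaks whose intensity is below a fraction of the base peak."""
    if not peaks:
        return []
    base_peak = max(intensity for _, intensity in peaks)
    return [(mz, i) for mz, i in peaks if i >= fraction * base_peak]


# Toy spectrum with made-up values.
spectrum = [(100.0, 5.0), (200.0, 1000.0), (300.0, 50.0), (400.0, 9.0)]

print(keep_top_n(spectrum, n=2))
print(drop_below_relative(spectrum, fraction=0.01))
```

Either filter discards low-intensity signal, which is consistent with the warning that a filtered file is suitable for identification and localization but not for quantification.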

fcyu avatar Jul 01 '22 14:07 fcyu

Hi Fengchao, Thank you for the link, I missed that section.

Unfortunately, I am interested in phospho LFQ, so the threshold peak filter will not be a long-term solution for me. I did try the filter, though, and can confirm that it shrinks the mzML file enough that I no longer get a Java out-of-memory error.

I also noticed that changing the binary encoding precision from 64-bit to 32-bit gives a reasonable reduction in file size. Would there be any issues with 32-bit encoded mzML in FragPipe?

In future releases, for timsTOF data processing, is there a possibility that FragPipe could integrate extraction/conversion steps using recent tools such as OpenTIMS or TIMSCONVERT?

Best, Todd

enonimos avatar Jul 05 '22 01:07 enonimos

Hi Todd,

I am glad to hear that it works for you to some extent.

I did note that changing the binary encoding precision from 64 to 32bit has a reasonable reduction in file size. Would there be any issues in FragPipe and 32bit encoded mzml?

I think it should be OK.
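For a sense of scale, the following sketch compares the raw size of 64-bit and 32-bit encodings and the rounding error that 32-bit storage introduces for a few m/z values. It says nothing about how FragPipe reads such files; the m/z values are arbitrary, the helper names are made up for the example, and zlib compression is ignored.

```python
import struct


def roundtrip_float32(value):
    """Store a value as a 32-bit float and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def ppm_error(value):
    """Rounding error of 32-bit storage, in parts per million."""
    return abs(roundtrip_float32(value) - value) / value * 1e6


mz_values = [400.123456, 1000.654321, 1700.987654]  # arbitrary example values

raw64 = struct.pack(f"<{len(mz_values)}d", *mz_values)
raw32 = struct.pack(f"<{len(mz_values)}f", *mz_values)
print(len(raw64), len(raw32))  # bytes before any compression

for mz in mz_values:
    print(f"{mz:.6f} -> {ppm_error(mz):.4f} ppm")
```

The uncompressed arrays halve in size, and the relative rounding error of a 32-bit float stays below about 0.06 ppm, which is small compared with typical search mass tolerances.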

In future releases, for timsTOF data processing, is there a possibility that FragPipe could integrate extraction/conversion steps using recent tools such as OpenTIMS or TIMSCONVERT?

MSFragger has the ability to load .d from PASEF. We will add a module to write mzML format in the future.

Best,

Fengchao

fcyu avatar Jul 05 '22 01:07 fcyu

Thanks Fengchao, one last question. Do you know yet whether this module will produce mzML files as large as those from MSConvert? If so, would having 64 GB+ of RAM be the best solution for this particular use case?

enonimos avatar Jul 05 '22 01:07 enonimos

I am afraid yes if you don't filter out any peaks. I suggest you get 64 GB+ RAM if you want to analyze PASEF data smoothly.

Best,

Fengchao

fcyu avatar Jul 05 '22 02:07 fcyu

Since we might need to modify IonQuant regarding the site intensity, and there are also PASEF-related changes to make, I will keep this issue open as a reminder.

Best,

Fengchao

fcyu avatar Jul 05 '22 14:07 fcyu

Ok, that's great. You have probably considered this already, but another future workaround for phospho LFQ with PASEF could be a way to filter peaks during mzML generation while keeping the intensity data accessible to IonQuant.

As an FYI, with a single 16GB ddaPASEF mzml file (only peak picking, no filtering) 110GB of RAM was used for successful completion of mass recalibration and parameter optimization.

enonimos avatar Jul 05 '22 14:07 enonimos

Oh, BTW, if you are using the mzml from ddaPASEF, you should turn off mass calibration because the mass calibration does not support ddaPASEF in mzml format.

Best,

Fengchao

fcyu avatar Jul 05 '22 14:07 fcyu

Ok I see, so even though I didn't get any errors, I shouldn't trust that the recalibrated data is accurate from mzml?

enonimos avatar Jul 05 '22 15:07 enonimos

For PASEF mzml, correct.

Best,

Fengchao

fcyu avatar Jul 05 '22 15:07 fcyu

I have followed up on the suggestion to disable mass recalibration when analyzing PASEF mzML, and I used a computer that has 144 GB of RAM. I made it further through the analysis, but I received a Philosopher error during the PhilosopherFilter step. As a reminder, I want to perform phospho LFQ with PTMProphet analysis, so I didn't perform filtering during conversion to mzML, but I did use "Combine ion mobility scans" and "Peak picking". Would this have anything to do with the error (see attached log)? log_2022-07-05_22-20-49.txt

enonimos avatar Jul 06 '22 23:07 enonimos

As a reminder, I want to perform phosphoLFQ with PTMProphet analysis, so I didn't perform filtering during conversion to mzml, but I used "Combine ion mobility scans" and "Peak picking".

I don't think you should also add scan summing for PASEF data. Please check the tutorial here: https://fragpipe.nesvilab.org/docs/tutorial_convert.html

As to the Philosopher error, Felipe @prvst can you take a look?

Thanks,

Fengchao

fcyu avatar Jul 07 '22 00:07 fcyu

Thanks Fengchao, I have reviewed the tutorial, but maybe I need clarification on "scan summing". The tutorial says that if the scan summing filter is added, I can't perform MS1 quant. I would like to perform MS1 quant, so I left scan summing off.
Do you think not using "scan summing" but keeping "Combine ion mobility scans" leads to an incorrect format?

enonimos avatar Jul 07 '22 00:07 enonimos

Without "scan summing", there will be many many scans with low SNR. But if you want to do MS1 quant, you can't add "scan summing"... OK, as you can see, converting .d to mzML format is not a good idea.

Best,

Fengchao

fcyu avatar Jul 07 '22 00:07 fcyu

I don’t think we even tested MS1 quant with mzML from Bruker, did we?

But the crash seems to be something else. We have seen this error before; it somehow keeps coming back in Philosopher.


anesvi avatar Jul 07 '22 00:07 anesvi

Yes, agreed...it seems mzml provides compatibility with tools, but creates other issues. Perhaps for future phosphorylation analysis a diaPASEF experiment using DIA-NN for quant would be better.

enonimos avatar Jul 07 '22 00:07 enonimos

I think Philosopher is complaining because you are trying to filter a PeptideProphet temporary file, which is partially written. You can see it here:

INFO[22:20:49] Executing Filter  v4.4.0                     
INFO[22:20:49] Processing peptide identification files      
INFO[22:20:49] Parsing F:\Todd\timTOF\Demo\20220610_SCR_Bleo_A_60_Slot2-23_1_668\interact-20220610_SCR_Bleo_A_60_Slot2-23_1_668.pep.xml.tmp.a25516 

I normally advise cleaning the directory before running the programs again if you find any issues.
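One way to check a results directory for leftovers like the one in the log above is sketched here. The glob pattern is an assumption modeled on the single file name shown in the log (`interact-....pep.xml.tmp.a25516`), and the function name is made up; the script only lists candidates so they can be inspected before anything is deleted.

```python
from pathlib import Path


def find_stale_temp_files(workdir):
    """List leftover '*.pep.xml.tmp.*' files under workdir (pattern is an assumption)."""
    return sorted(p for p in Path(workdir).rglob("*.pep.xml.tmp.*") if p.is_file())


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_stale_temp_files(target):
        # Inspect the list first; remove files manually or with path.unlink().
        print(path)
```

Running it against the output directory before re-running the workflow shows whether a partially written file from an earlier run is still present.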

prvst avatar Jul 07 '22 14:07 prvst

Hi Felipe, can philosopher detect and ignore those temp files?


anesvi avatar Jul 07 '22 15:07 anesvi

They can be left over from a previous run, or from a run in which PeptideProphet failed for some reason. When running FragPipe, FragPipe should clean all previous pep.xml files, but we need to double-check.


anesvi avatar Jul 07 '22 15:07 anesvi

Hi Felipe, thank you for the insights. Looking into the files more closely, the date stamp on the temp XML file was later than that of the pep.xml. Also, I had a second mzML file in the analysis that produced both pep.xml and mod.pep.xml in the output directory, while the one with the temp file did not have the mod.pep.xml. So perhaps the issue originally began with PTMProphet?

The two files are replicate injections and were converted to mzml at the same time, so I'm not sure why the mod.pep.xml failed in one file but not the other?

enonimos avatar Jul 07 '22 17:07 enonimos