
Mismatch combined_protein.tsv and reprint files

Open magnusarntzen opened this issue 2 years ago • 12 comments

Hi, I don't know if this is by design or not, but I've observed differences between all 3 files, combined_protein.tsv, reprint.int.tsv and reprint.spc.tsv.

Although the vast majority of entries match, some proteins are listed in reprint.int.tsv that are not present in combined_protein.tsv, and reprint.spc.tsv also differs from combined_protein.tsv in places. Looking at the time stamps, I see that reprint.spc.tsv is created first; then, after IonQuant, the other two are generated. According to my log file, it seems that some rearrangement of protein inference occurred at this stage, possibly leading to (some of) the discrepancy.

NB: This was from a large metaproteomics data set using unbinned genomic data with a protein FASTA of 4 million entries. I had to use 16 splits to keep it under 120 GB RAM. I don't know if that was the reason for this. When using a small database, I observe no differences between the 3 files.
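One way to list the differences between the files is a small script along these lines. It is only a sketch: it assumes all three files are tab-separated with the protein identifier in the first column, that combined_protein.tsv has one header row, and that the reprint files have two (column names plus a condition row). The function names and the `id_column` / `skip_rows` parameters are made up for illustration, so adjust them to the actual layout of your files.

```python
import csv


def ids_from_lines(lines, id_column=0, skip_rows=1):
    """Collect the values of one column from tab-separated lines."""
    ids = set()
    for row_number, row in enumerate(csv.reader(lines, delimiter="\t")):
        if row_number < skip_rows or len(row) <= id_column:
            continue  # header row, or a short/empty line
        value = row[id_column].strip()
        if value:
            ids.add(value)
    return ids


def ids_from_file(path, id_column=0, skip_rows=1):
    """Same as ids_from_lines, but reads from a file on disk."""
    with open(path, newline="", encoding="utf-8") as handle:
        return ids_from_lines(handle, id_column, skip_rows)


def compare_ids(reference_ids, other_ids):
    """Report identifiers found on only one side, sorted for stable output."""
    return {
        "only_in_reference": sorted(reference_ids - other_ids),
        "only_in_other": sorted(other_ids - reference_ids),
    }
```

For example, `compare_ids(ids_from_file("combined_protein.tsv"), ids_from_file("reprint.int.tsv", skip_rows=2))["only_in_other"]` would give the proteins that appear in reprint.int.tsv but not in combined_protein.tsv, provided the column assumptions above hold.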

KR, Magnus Arntzen, Norway.

magnusarntzen avatar Aug 31 '21 13:08 magnusarntzen

Sounds like a bug in Philosopher. Felipe @prvst , can you take a look when you have time?

Best,

Fengchao

fcyu avatar Aug 31 '21 14:08 fcyu

Are you using the latest philosopher?

But even with an earlier version, there should not be any proteins in reprint.int that are not present in combined_protein.tsv.

I assume you are running on multiple experiments? Can you send us PSM.tsv and protein.tsv from one experiment, plus combined_protein.tsv, along with an example of a protein you see in reprint.int.tsv that is not in the combined file (but is shown with non-zero intensity for that experiment in reprint.int.tsv)?

Thanks, Alexey


anesvi avatar Aug 31 '21 14:08 anesvi

Hi, this was analyzed using FragPipe v15.0 on Windows with Philosopher 3.4.13. Please see the attached file: 210831 FP mismatch.zip

An example protein present in reprint.int and not in combined_protein.tsv is NBFHGDGN_964942. You can also see that the following lines in reprint.int (964945-47) are also not present in combined_protein.tsv and that the quantification is almost identical to 964942. Of these proteins, only 964947 is present in reprint.spc. Based on the very similar quantification, could it be a protein inference problem where the razor protein is re-selected?

KR, Magnus

magnusarntzen avatar Sep 01 '21 06:09 magnusarntzen

We fixed several issues with the reports in Philosopher 4. I know we fixed some spectral count mismatches; I hope the protein inference issue is fixed too.

I think we need to ask you to rerun with FragPipe 16 and Philosopher 4.


anesvi avatar Sep 01 '21 10:09 anesvi

All right, I'll upgrade for next time, but I won't have time to rerun the data with the new version now: it takes 4.5 days of analysis on a (heavily booked) high-end computer. Can I still trust combined_protein.tsv to contain correct information despite the differences from reprint.int?

magnusarntzen avatar Sep 01 '21 10:09 magnusarntzen

You don’t need to rerun everything. Just the FDR filtering, reporting, and IonQuant (if you are using it).

Most of the entries in the tables are still correct in the old version.

Best,

Fengchao


-- Dr. Fengchao Yu Research Investigator University of Michigan

fcyu avatar Sep 01 '21 12:09 fcyu

Is it possible to run only parts of the pipeline, like in MaxQuant partial run? Please let me know how to do that.

magnusarntzen avatar Sep 01 '21 12:09 magnusarntzen

Yes, you just need to uncheck some of the steps in FragPipe. For example, uncheck MSFragger, PeptideProphet, and ProteinProphet to skip them.

But you need to make sure that no unchecked step comes after a checked step in the running order.
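As a rough illustration of that ordering rule, here is a small check written as a literal reading of the sentence above. The step list is a simplified assumption taken from the steps named in this thread, not FragPipe's actual internal list, and `is_valid_partial_run` is a hypothetical helper rather than anything FragPipe provides.

```python
# Simplified step order, assumed from the steps mentioned in this thread.
PIPELINE_ORDER = [
    "MSFragger",
    "PeptideProphet",
    "ProteinProphet",
    "FDR filter and report",
    "IonQuant",
]


def is_valid_partial_run(checked, order=PIPELINE_ORDER):
    """Return True if no unchecked step comes after a checked step."""
    seen_checked = False
    for step in order:
        if step in checked:
            seen_checked = True
        elif seen_checked:
            # An unchecked step follows a checked one: rule violated.
            return False
    return True
```

With this reading, checking only "FDR filter and report" and "IonQuant" passes, while checking "MSFragger" and "IonQuant" with the steps in between unchecked does not. Pass a shorter `order` if you never use one of the steps.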

Best,

Fengchao


fcyu avatar Sep 01 '21 12:09 fcyu

Hi again, I reran the data as suggested (filtering, reporting and quantification) using FragPipe v16 with Philosopher 4. The output is better, I think, with clear differences from the previous version. Protein NBFHGDGN_964942 is no longer present in the output files, nor are the proteins 964945-47 that showed almost identical quantification (see thread above). So I guess this is good and the protein inference is improved.

However, there are still some proteins present in reprint.int which are not in combined_protein.tsv, for example protein NBFHGDGN_87638. See the attached file: reanalysis FP_v16.zip

Hope this helps narrowing down the problem. KR, Magnus

magnusarntzen avatar Sep 06 '21 09:09 magnusarntzen

Dear Magnus,

We looked into this. I think there is an issue in the Philosopher protein inference step. In short, protein inference should be done once, using all experiments combined, and the same peptide-to-protein razor assignment should be applied to each experiment. However, Philosopher repeats this step when filtering each individual experiment (though using the same combined data). So, in some rare cases, Philosopher breaks ties differently when choosing between two proteins while selecting the razor protein. I guess there is some randomness in the sorting function… So even though the same data is used, a different protein may be assigned as razor in different experiments.

I think if you want to be safe and consistent, just remove all entries that are in reprint.int.tsv but not in combined_protein.tsv. This is the best I can suggest for now. In the long run, I hope Felipe will fix it.
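A minimal sketch of that clean-up, for anyone who wants to script it. It assumes both files are tab-separated with the protein identifier in the first column, that combined_protein.tsv has one header row, and that reprint.int.tsv has two header rows; these layout details and all function names here are assumptions for illustration, so check them against your own files first.

```python
import csv


def load_ids(path, id_column=0, header_rows=1):
    """Read the set of protein identifiers from a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    return {row[id_column] for row in rows[header_rows:] if len(row) > id_column}


def keep_allowed_rows(rows, allowed_ids, id_column=0, header_rows=2):
    """Keep header rows plus data rows whose identifier is in allowed_ids."""
    kept = []
    for row_number, row in enumerate(rows):
        if row_number < header_rows:
            kept.append(row)  # header rows are always kept
        elif len(row) > id_column and row[id_column] in allowed_ids:
            kept.append(row)
    return kept


def filter_reprint(reprint_path, combined_path, output_path):
    """Write a copy of the reprint file limited to proteins in the combined file."""
    allowed = load_ids(combined_path)
    with open(reprint_path, newline="", encoding="utf-8") as src:
        rows = list(csv.reader(src, delimiter="\t"))
    with open(output_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(keep_allowed_rows(rows, allowed))
```

Writing to a new file, for example `filter_reprint("reprint.int.tsv", "combined_protein.tsv", "reprint.int.filtered.tsv")`, leaves the original output untouched so the removed entries can still be inspected later.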

Best Alexey


anesvi avatar Sep 06 '21 18:09 anesvi

Hi Alexey. Philosopher does one round of assignment, even if you have multiple experiments. The fix you mentioned needs to be done in the FragPipe GUI as we discussed last week.

prvst avatar Sep 06 '21 19:09 prvst

Thank you for your help - I will filter as instructed for now!

Best regards!!

magnusarntzen avatar Sep 07 '21 10:09 magnusarntzen