
Mismatched data in MSFragger results files

Open treponeme opened this issue 2 years ago • 18 comments

Hello

I have been analyzing MS data using FragPipe version 16.0/MSFragger version 3.3/Philosopher version 4.0.0 (build 1626989421) and have noticed some mismatched data in the results files, as listed below:

  1. The number of unique peptides reported for several proteins in the protein.tsv files differs from the number of unique peptides listed in the corresponding psm.tsv files. For example, TPANIC_0486_TPANIC_RS02360_hypothetical protein_WP_012460559_05072013 is reported to have 4 unique peptides in the sample 256 protein.tsv file, but only 3 unique peptides are listed in the sample 256 psm.tsv file. Another example is TPANIC_RS00150_TPANIC_0030_chaperonin, which has 11 unique peptides in the sample 256 protein.tsv file but only 8 unique peptides in the sample 256 psm.tsv file. I have also seen the reverse, where fewer peptides are reported in the protein.tsv file than in the corresponding psm.tsv file.

  2. Protein probabilities for each detected protein reported in the combined results file (combined_protein.tsv) are identical to the protein probabilities reported for all corresponding detected proteins in the three individual sample files (samples 256, 257, 258; protein.tsv files), so only one protein probability value exists for each protein in all four files.
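For anyone who wants to reproduce the comparison in item 1, here is a minimal sketch that recounts unique peptides per protein from a psm.tsv and lists the proteins whose count differs from protein.tsv. The column names (`Peptide`, `Protein`, `Is Unique`, `Unique Peptides`) are assumptions about the table headers and should be adjusted to whatever your files contain. The sketch counts distinct peptide sequences only; whether protein.tsv counts peptides the same way is not something this thread establishes, so a mismatch found this way is a starting point rather than proof of a bug.

```python
import csv
from collections import defaultdict

# Assumed column names; edit these to match the headers in your own files.
PSM_PEPTIDE_COL = "Peptide"
PSM_PROTEIN_COL = "Protein"
PSM_UNIQUE_COL = "Is Unique"
PROT_PROTEIN_COL = "Protein"
PROT_UNIQUE_COL = "Unique Peptides"


def unique_peptides_from_psm(psm_lines):
    """Count distinct peptide sequences per protein among PSMs flagged as unique.

    psm_lines is any iterable of text lines, e.g. an open psm.tsv file.
    """
    peptides = defaultdict(set)
    for row in csv.DictReader(psm_lines, delimiter="\t"):
        if row[PSM_UNIQUE_COL].strip().lower() == "true":
            peptides[row[PSM_PROTEIN_COL]].add(row[PSM_PEPTIDE_COL])
    return {protein: len(seqs) for protein, seqs in peptides.items()}


def find_mismatches(psm_lines, protein_lines):
    """Return {protein: (count in protein.tsv, count from psm.tsv)} where they differ."""
    from_psm = unique_peptides_from_psm(psm_lines)
    mismatches = {}
    for row in csv.DictReader(protein_lines, delimiter="\t"):
        protein = row[PROT_PROTEIN_COL]
        reported = int(row[PROT_UNIQUE_COL])
        counted = from_psm.get(protein, 0)
        if reported != counted:
            mismatches[protein] = (reported, counted)
    return mismatches
```

To run it on real files, open both tables with `open(path, newline="")` and pass the two file objects to `find_mismatches`.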

I would greatly appreciate it if you could look into these data mismatches / discrepancies - I have attached the zipped log file for the three samples (samples 256, 257, and 258) and the combined data.

log_2021-07-30_07-16-48.zip

Thank you for your help with this matter

treponeme avatar Oct 26 '21 16:10 treponeme

Felipe @prvst , can you take a look when you have time?

Thanks,

Fengchao

fcyu avatar Oct 26 '21 16:10 fcyu

Hello

I have noticed one other issue with the data in this conversation (same zipped log file as above: https://github.com/Nesvilab/FragPipe/files/7419927/log_2021-07-30_07-16-48.zip):

Many proteins that were detected at high probabilities (>0.95) in the protein.tsv files of each of the individual samples (samples 256, 257, 258) are listed with values equal to zero for "Total Peptides", "Unique Peptides", "Razor peptides", "Total Spectral Count", "Unique Spectral Count" and "Razor Spectral Count". For example, in the protein.tsv file for sample 256, the protein "TPANIC_0453_TPANIC_RS02210_membrane" is assigned a protein probability of 1, but zero total peptides, zero unique peptides, zero razor peptides, zero total spectral counts, zero unique spectral counts, and zero razor spectral counts are reported. Can these proteins still be considered high-confidence detections?
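A rough way to list every affected protein, rather than spotting them by eye, is sketched below. It scans a protein.tsv for rows whose probability is above a threshold while all six count columns named above are zero. The `Protein` and `Protein Probability` column names are assumptions; headers are compared case-insensitively so that "Razor peptides" and "Razor Peptides" are treated alike.

```python
import csv

# Assumed column names, lowercased for case-insensitive matching.
PROTEIN_COL = "protein"
PROB_COL = "protein probability"
COUNT_COLS = [
    "total peptides",
    "unique peptides",
    "razor peptides",
    "total spectral count",
    "unique spectral count",
    "razor spectral count",
]


def high_probability_without_evidence(lines, min_probability=0.95):
    """Return proteins with probability > min_probability and all counts equal to zero.

    lines is any iterable of text lines, e.g. an open protein.tsv file.
    """
    flagged = []
    for raw in csv.DictReader(lines, delimiter="\t"):
        # Normalize header case and whitespace; skip the overflow key csv
        # uses (None) when a row has more fields than the header.
        row = {key.strip().lower(): value
               for key, value in raw.items() if key is not None}
        if float(row[PROB_COL]) <= min_probability:
            continue
        if all(float(row[col]) == 0 for col in COUNT_COLS):
            flagged.append(row[PROTEIN_COL])
    return flagged
```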

Thank you again for your help with these matters.

treponeme avatar Nov 04 '21 18:11 treponeme

Hi @treponeme, I'll take a look at your log files. This is a label-free, multi-experiment analysis, is that correct?

prvst avatar Nov 04 '21 19:11 prvst

Hi @treponeme ,

Can you re-run your analysis with the latest FragPipe (17.0) and the latest Philosopher (4.1.0)? You can skip MSFragger and Percolator by unchecking them.

If the issue is still there, can you send us your fasta file?

Thanks,

Fengchao

fcyu avatar Nov 04 '21 19:11 fcyu

Hi Felipe

Yes, this was a label-free experiment consisting of three separate samples.

Thanks



treponeme avatar Nov 04 '21 19:11 treponeme

Hello Fengchao

The analyses have been re-run using the latest software, but the issues still persist.

I have attached the fasta file database used for searching here.

Thanks.

Tp_Cust_Rab_Crap.zip

treponeme avatar Nov 05 '21 18:11 treponeme

Thanks for your testing and fasta file.

Hi Felipe @prvst ,

Can you take a look at the fasta file? There are proteins from UniProt and non-UniProt. Maybe that's the cause.

>TPANIC_0052_YP_008090927_HemK family methyltransferase_05072013
MYCVSRECNLVQELCTIRQARMYARALFQDAPCLRGQNTPLLDADLILSKLLAKPRAWILAHQQDEIASVAHEFKRLVHLRCRGRALAYLTREKEFFGLRFRVTRATLIPKPDTELLVESVLAHVASQMMKPRSVSVHKDTSALPVLKIFEACTGCGCIAIALMHMLRARGTPPLYVIASDICMRALAVARYNARRLLDVSANSRVRFVHADVRAPIPFFSPSEGTDVVQERGVCVPYDVICANPPYVPSAQARALLQDGRGEPLGALDGGADGLDLVRAFAHHSAAALKEGGCVFCEVGSNHAQRAARIFQAAGFATVKISKDLSGKERLISGILRSQSRAVTAPSG

>TPANIC_0591_YP_008091429_bifunctional Hpr kinase phosphatase_05072013
MLKLDLKERDSLDLRCIAGHHGLANPITISDLNRPGLVLSGFFDLFAYRRIQLFGRGEHAYLLALLEQGRYGAIEKMFTFDLPCCIFSHGITPPEKFLHLAEPSSCPILVTRLTSSELSLRLMRVLSNIFAPTIALHGVLVEVYGVGILISGDSGVGKSETALELIERGHRLVADDLVEISCVNGNSLIGRGVHKSIGHHMEIRGLGIINITQLYGVGSIRERKEIQMVVQLEEWNSSKAYDRLGTQELNTTILDVSVPLIEIPVRPGRNIPIILETAAMNERLKRMGYFSAKEFNQSVLKLMEQNAAHAPYYRPDDTY

>TPANIC_0773_YP_008091601_S1 family peptidase Do_05072013
MRNKVRVLAVVAALAAACAVGFFLGRWFDFSARSSVLEAADSLSVSSSEAASFSTVVAEGDPYTVDERQNIAVYRSANEAVVNITTEMVGVNWFLEPVPLEGGSGSGAIIDARGYVLTNTHVIEGASKIYLSLHDGSQYKATVVGVDRENDLAVLKFVSPPGARLTVIRFGSSRNLDVGQKVLAIGNPFGLARTLTVGVVSALARPIQNKGSIIRNMIQTDAAINPGNSGGPLLDTQGRMIGINTVIYSTSGSSSGVGFAVPVDTAKRIVSELIRYGRVRRGKIDAELVQVNASIAHYAQLTVGKGLLVSQVKRGSPAAQAGLRGGTTAVRYGLGRRAAVIYLGGDVITAIDNQPVANLSDYYSVLEDKKPDDEVRVTVLRGRRQHVVAVRLTERSDE

>TPANIC_0841_YP_008091666_S1 family peptidase Do_05072013
MPSADTIARRVAGDSGNAGGRTLLPVGVSRESVQLLERLQNANRQVTAEVLPSVVTLDVVETRKVRVRDPFGGFPWFFFRGPEGPGAGPGGGSGNKGEAEEREYKTEGLGSGVIVKKTGKTHYVLTNYHVAGKANEIEIKLHDGRIVKGKLVGGDQRKDIALVSFEDADPNIRVAVLGDSDAVRVGDIVFAVGSPLGYTSTVTQGIISALGRFGGPGNNINDFIQTDAAINQGNSGGPMVNIYGEVIGINAWIASSSGGSQGIGFSIPINNVKSDIESFIQYGQVKYGWLGVQLVATDADTVASLGIAKGTKGVLAAEIFLGSPAHKGGLKPGDYCVKLNGKEVKDVNQFVRDVGALRIGQTAVFDLIRGGVPMTLSVRITERDEKIVNDYSKLWPGFIPLPLTEAVRKRLDLKASVRGVLVSNAQSKSPAALMGLKSADIVVAVNDQRVSSVREFYAVLARQTREVWFDVLRDGQTLSTVRFRF
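To get a quick sense of how mixed the database is, a small sketch like the following tallies FASTA headers by style. It treats a header as UniProt-style when it starts with `>sp|` or `>tr|` followed by an accession and an entry name, and counts everything else (such as the custom TPANIC headers above, or prefixed decoy entries) as "other". This only describes the header mix; whether that mix has anything to do with the reported discrepancies is an open question at this point in the thread.

```python
import re

# UniProt FASTA headers look like ">sp|ACCESSION|ENTRY_NAME description"
# (or ">tr|..." for unreviewed entries).
UNIPROT_HEADER = re.compile(r"^>(sp|tr)\|[^|\s]+\|\S+")


def count_header_styles(lines):
    """Tally FASTA header lines as 'uniprot' or 'other'.

    lines is any iterable of text lines, e.g. an open FASTA file.
    """
    counts = {"uniprot": 0, "other": 0}
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        style = "uniprot" if UNIPROT_HEADER.match(line) else "other"
        counts[style] += 1
    return counts
```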

Thanks,

Fengchao

fcyu avatar Nov 05 '21 18:11 fcyu

The database seems OK. @treponeme can you share your output tables?

prvst avatar Nov 08 '21 20:11 prvst

Hi

Here are the output tables. I will send the psm files separately.

Thanks


Simon Houston Ph.D. Research Associate Department of Biochemistry and Microbiology University of Victoria British Columbia



treponeme avatar Nov 08 '21 23:11 treponeme

Hi

Here are the output tables.

Thanks

Output_tables samples 256 257 258 and Combined.zip

treponeme avatar Nov 08 '21 23:11 treponeme

Hi Felipe @prvst

Any luck with the output files?

Thanks.

treponeme avatar Nov 25 '21 17:11 treponeme

Hi @treponeme, for TPANIC_0486_TPANIC_RS02360_hypothetical protein_WP_012460559_05072013 I see 4 PSMs and 4 peptides, and those numbers match the sample 256 protein table. I thought this was resolved; did I miss something?

prvst avatar Nov 25 '21 21:11 prvst

Hi Felipe @prvst

Unfortunately, the three issues described in my first two comments above have not been resolved, namely:

  1. There are many cases where the number of unique peptides reported for a protein in the protein.tsv files differs from the number of unique peptides listed in the corresponding psm.tsv files. This problem still exists for many proteins.
  2. Protein probabilities for each detected protein in the combined results file and in the three individual sample files (samples 256, 257, 258; protein.tsv files; see attached files in my first comment above) are all identical, so only one protein probability value exists for each protein in all four sample files. Protein probabilities for the same proteins detected in the three different samples should be somewhat different between the three different samples.
  3. Many proteins that were detected at high probabilities (>0.95) in the protein.tsv files (see attached files in my first comment above) of each of the individual samples (samples 256, 257, 258) are listed with values equal to zero for "Total Peptides", "Unique Peptides", "Razor peptides", "Total Spectral Count", "Unique Spectral Count" and "Razor Spectral Count". My question is, how can these proteins be considered high probability if no peptides or spectra were found/reported?
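Point 2 can be checked systematically with a sketch along these lines, which reads the protein probability from several protein tables and separates proteins whose values agree everywhere from those that differ. The `Protein` and `Protein Probability` column names are assumptions, and the labels used for the tables are arbitrary.

```python
import csv

# Assumed column names; adjust to match your tables.
PROTEIN_COL = "Protein"
PROB_COL = "Protein Probability"


def read_probabilities(lines):
    """Map protein -> probability for one table (any iterable of text lines)."""
    return {
        row[PROTEIN_COL]: float(row[PROB_COL])
        for row in csv.DictReader(lines, delimiter="\t")
    }


def split_by_agreement(tables):
    """Compare probabilities across tables.

    tables maps a label (e.g. "256") to an iterable of lines. Returns two
    sorted lists, (identical, differing), covering proteins that appear in
    at least two tables.
    """
    per_protein = {}
    for label, lines in tables.items():
        for protein, prob in read_probabilities(lines).items():
            per_protein.setdefault(protein, {})[label] = prob

    identical, differing = [], []
    for protein, probs in sorted(per_protein.items()):
        if len(probs) < 2:
            continue  # seen in one table only, nothing to compare
        if len(set(probs.values())) == 1:
            identical.append(protein)
        else:
            differing.append(protein)
    return identical, differing
```

If `differing` comes back empty across samples 256, 257, 258 and the combined table, that confirms the observation for the whole dataset rather than for a handful of hand-picked proteins.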

Thank you.

treponeme avatar Nov 25 '21 21:11 treponeme

Are you using the latest releases?

prvst avatar Nov 25 '21 21:11 prvst

To confirm, what are the latest releases? I believe I may have used the version before the latest.

treponeme avatar Nov 25 '21 21:11 treponeme

Philosopher 4.1.1 and FragPipe 17.1

prvst avatar Nov 25 '21 21:11 prvst

No. I used Philosopher 4.1 and FragPipe 17. Do these new versions eliminate all three problems I listed above today?

treponeme avatar Nov 25 '21 21:11 treponeme

It's hard to say. Your issue seems to be specific to your data, and some of the changes might affect the final reporting. Either way, we need to debug against the latest version, since that is our current codebase. I suggest you run the pipeline again, making sure all files are in place, no temporary files are present, and your workspace is clean. If the problem persists, you can send me some of your data and I'll debug the processing myself on Monday.

prvst avatar Nov 25 '21 22:11 prvst