Peptides Mapped to Wrong proteins

Open Skourtis opened this issue 2 years ago • 10 comments

The peptides reported in ion_label_quant.tsv (in the modified sequence column) are mapped to the FASTA file purely by sequence matching. Often, though, the peptide could not have arisen from the matched protein, because that specific protein has no cleavage site (e.g. R|K for trypsin) immediately before the peptide.

For example, the peptide AELLAGR maps to all of these isoforms in ion_label_quant.tsv (and probably downstream), but as their FASTA sequences show, each of them has a W, which is non-tryptic, before the sequence. In this case I suspect the peptide arises from a rev_ sequence in the FASTA file and then accidentally matches real proteins.

This run used fully enzymatic trypsin cleavage (not, for example, semi_n_term cleavage), so the R or K before the peptide should be respected.

This leads to peptides being assigned to more proteins than they should be, which changes the razor protein assignment. image image
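The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FragPipe's or PeptideProphet's code: the function name `tryptic_sites` is hypothetical, cleavage is assumed to happen after every K or R (no proline rule), and the sequences are toy strings rather than real FASTA entries.

```python
def tryptic_sites(peptide, protein, cleave_after="KR"):
    """Return start indices where `peptide` occurs in `protein`
    with an enzymatic terminus on both sides (hypothetical helper)."""
    sites = []
    start = protein.find(peptide)
    while start != -1:
        end = start + len(peptide)
        # N-terminal side: protein start, or the preceding residue is K/R.
        n_ok = start == 0 or protein[start - 1] in cleave_after
        # C-terminal side: protein end, or the peptide itself ends in K/R.
        c_ok = end == len(protein) or peptide[-1] in cleave_after
        if n_ok and c_ok:
            sites.append(start)
        start = protein.find(peptide, start + 1)
    return sites


# Toy sequences (assumptions for illustration only).
print(tryptic_sites("AELLAGR", "MKRAELLAGRC"))  # preceded by R -> [3]
print(tryptic_sites("AELLAGR", "MKWAELLAGRV"))  # preceded by W -> []
```

A plain substring match would accept both toy proteins; only the flanking-residue test separates them.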

Skourtis avatar Jun 14 '22 10:06 Skourtis

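This also happens with post-translational modifications. A sequence such as n[42.0106]NGTSM[15.9949]ISLIIPPK could only arise from a protein that starts with this sequence, since the n[42.0106] modification was only allowed at the protein N-terminus ([^).
Again, instead of mapping only to P62495_nterm_5204, where the peptide is the N-terminus of the protein, it maps to many proteins, even though the other sequences don't have an R|K before the peptide. [image] https://user-images.githubusercontent.com/51754041/173559134-44fb8478-0981-4a24-8c4b-51bfdd7f1922.png

[image] https://user-images.githubusercontent.com/51754041/173558863-7d12dc38-dd19-4651-8580-19831d6e0ab7.png [image] https://user-images.githubusercontent.com/51754041/173559016-fb352dcd-3361-4823-a047-7f293b52246f.png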
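The N-terminal case can be sketched the same way. The helper names `strip_mods` and `valid_start` are hypothetical, and two things are assumed: the `n[...]` prefix always denotes a protein N-terminal modification (as in this search), and an occurrence at position 1 after an initiator methionine also counts as the protein N-terminus.

```python
import re

# Matches bracketed mass annotations such as [42.0106].
MOD = re.compile(r"\[[^\]]*\]")


def strip_mods(modified):
    """Reduce a modified sequence to its bare amino acid sequence."""
    bare = MOD.sub("", modified)
    # Drop the leading 'n' that marks an N-terminal modification.
    return bare[1:] if modified.startswith("n[") else bare


def valid_start(modified, protein, start, clip_met=True):
    """Check whether an occurrence at `start` respects the N-terminal rule."""
    if modified.startswith("n["):
        # Assumed protein N-term mod: residue 0, or residue 1 after Met removal.
        return start == 0 or (clip_met and start == 1 and protein[0] == "M")
    # Unmodified N-terminus: protein start or preceded by K/R.
    return start == 0 or protein[start - 1] in "KR"


mod_seq = "n[42.0106]NGTSM[15.9949]ISLIIPPK"
print(strip_mods(mod_seq))  # NGTSMISLIIPPK
# Toy proteins (not real entries): one starting with the peptide, one internal.
print(valid_start(mod_seq, "MNGTSMISLIIPPKAAA", 1))  # True
print(valid_start(mod_seq, "AAKNGTSMISLIIPPK", 3))   # False
```

In the second toy protein the peptide is preceded by K, so it is tryptic, but the N-terminal modification still rules the mapping out.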

Skourtis avatar Jun 14 '22 10:06 Skourtis

Thanks for your report. Are you using FragPipe 18.0 with Percolator enabled?

Thanks,

Fengchao

fcyu avatar Jun 14 '22 13:06 fcyu

I'm using FragPipe v17.1, with PeptideProphet for PSM validation in a closed search.

Skourtis avatar Jun 14 '22 14:06 Skourtis

Then, the peptide-protein mapping is from PeptideProphet. I am not sure if it is something that we can change/fix.

Best,

Fengchao

fcyu avatar Jun 14 '22 15:06 fcyu

Can you send me your .pepXML and pep.xml files?

prvst avatar Jun 14 '22 15:06 prvst

Thanks

prvst avatar Jun 14 '22 16:06 prvst

The RefreshParser tool of PeptideProphet maps based on the sequence. I do not think it considers cleavage rules.

In MSFragger 3.5 we remap peptides ourselves, so if it is used with Percolator you should get more accurate mapping. Fengchao?

anesvi avatar Jun 14 '22 16:06 anesvi

From your first example, we can see that MSFragger mapped the peptide AELLAGR to the decoy protein Q15772:

<search_result>
<search_hit peptide="AELLAGR" massdiff="0.00103759765625" calc_neutral_pep_mass="738.4263" peptide_next_aa="C" num_missed_cleavages="0" num_tol_term="2" protein_descr="Striated muscle preferentially expressed protein kinase OS=Homo sapiens OX=9606 GN=SPEG PE=1 SV=4" num_tot_proteins="1" tot_num_ions="12" hit_rank="1" num_matched_ions="6" protein="rev_sp|Q15772|SPEG_HUMAN" peptide_prev_aa="R" is_rejected="0">
<modification_info modified_peptide="AELLAGR[166]">
<mod_aminoacid_mass mass="166.1094" position="7"/>
</modification_info>
<search_score name="hyperscore" value="13.407"/>
<search_score name="nextscore" value="11.721"/>
<search_score name="expect" value="1.057631e+00"/>
</search_hit>
</search_result>

PeptideProphet scored the mapping and added the real protein O15360 as an alternative to the assignment:

<spectrum_query start_scan="44773" uncalibrated_precursor_neutral_mass="738.42645" assumed_charge="2" spectrum="2020LD002_MAGU_002_03_50pto.44773.44773.2" end_scan="44773" index="7009" precursor_neutral_mass="738.4273" retention_time_sec="1375.9601211547852">
<search_result>
<search_hit peptide="AELLAGR" massdiff="0.00103759765625" calc_neutral_pep_mass="738.4263" peptide_next_aa="C" num_missed_cleavages="0" num_tol_term="2" protein_descr="Striated muscle preferentially expressed protein kinase OS=Homo sapiens OX=9606 GN=SPEG PE=1 SV=4" num_tot_proteins="2" tot_num_ions="12" hit_rank="1" num_matched_ions="6" protein="rev_sp|Q15772|SPEG_HUMAN" peptide_prev_aa="R" is_rejected="0">
<alternative_protein protein="sp|O15360|FANCA_HUMAN" protein_descr="Fanconi anemia group A protein OS=Homo sapiens OX=9606 GN=FANCA PE=1 SV=2" num_tol_term="1" peptide_prev_aa="W" peptide_next_aa="V"/>
<modification_info modified_peptide="AELLAGR[166]">
<mod_aminoacid_mass mass="166.1094" position="7"/>
</modification_info>
<search_score name="hyperscore" value="13.407"/>
<search_score name="nextscore" value="11.721"/>
<search_score name="expect" value="1.057631e+00"/>
<analysis_result analysis="peptideprophet">
<peptideprophet_result probability="0.4301" all_ntt_prob="(0.0000,0.0000,0.4301)">
<search_score_summary>
<parameter name="fval" value="0.9580"/>
<parameter name="ntt" value="2"/>
<parameter name="nmc" value="0"/>
<parameter name="massd" value="1.405"/>
<parameter name="isomassd" value="0"/>
</search_score_summary>
</peptideprophet_result>
</analysis_result>
</search_hit>
</search_result>
</spectrum_query>

The switch happened during the filtering step, where we perform what we call 'protein promotion': if a PSM maps to a decoy and has a target entry as an alternative, we flip them, and the assignment becomes a "target" one. That is why you see O15360 in the report.
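As a rough illustration of the promotion step described above (a sketch, not the actual filtering code), the flip can be reproduced on a trimmed copy of the search hit. The function name `promote`, the `SWAPPED` tuple, and the reduced attribute set are assumptions made for the example; the `rev_` decoy prefix and the attribute values come from the pepXML shown above.

```python
import xml.etree.ElementTree as ET

DECOY_PREFIX = "rev_"
# Attributes that describe the peptide-to-protein mapping.
SWAPPED = ("protein", "num_tol_term", "peptide_prev_aa", "peptide_next_aa")

# Trimmed copy of the search hit above; most attributes are omitted.
HIT = """<search_hit peptide="AELLAGR" protein="rev_sp|Q15772|SPEG_HUMAN"
    num_tol_term="2" peptide_prev_aa="R" peptide_next_aa="C">
  <alternative_protein protein="sp|O15360|FANCA_HUMAN"
    num_tol_term="1" peptide_prev_aa="W" peptide_next_aa="V"/>
</search_hit>"""


def promote(hit):
    """Swap a decoy primary protein with the first target alternative, if any."""
    if not hit.get("protein", "").startswith(DECOY_PREFIX):
        return hit.get("protein")  # already a target, nothing to do
    for alt in hit.findall("alternative_protein"):
        if not alt.get("protein", "").startswith(DECOY_PREFIX):
            for key in SWAPPED:
                primary, alternative = hit.get(key), alt.get(key)
                if primary is not None and alternative is not None:
                    hit.set(key, alternative)
                    alt.set(key, primary)
            break
    return hit.get("protein")


hit = ET.fromstring(HIT)
print(promote(hit))             # sp|O15360|FANCA_HUMAN
print(hit.get("num_tol_term"))  # 1
```

After the swap the reported protein is a target, but its num_tol_term is 1 and its preceding residue is W, which is the non-tryptic mapping reported in this issue.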

prvst avatar Jun 14 '22 16:06 prvst

We also don't consider the cleavage rules in MSFragger 3.5. I think we should add it in the next version.

Best,

Fengchao

fcyu avatar Jun 14 '22 17:06 fcyu

Hi everyone!

Thank you for your quick and clear responses! I've implemented a peptide-to-protein remapping in my scripts while I wait for the next version. Some peptides no longer map to any real protein (which means they were actually decoys), but from what I have seen this is extremely rare (10 peptide evidences out of 20,000), so it shouldn't affect the FDR.

Thanks!

Skourtis avatar Jun 16 '22 12:06 Skourtis

Fixed. Will be available in the next release.

Best,

Fengchao

fcyu avatar Sep 12 '22 19:09 fcyu