FragPipe icon indicating copy to clipboard operation
FragPipe copied to clipboard

Wrong peptide start and end positions

Open hollenstein opened this issue 1 year ago • 1 comments

Dear FragPipe Team,

First of all, thank you very much for providing these fantastic tools. I am working at a proteomics MS facility and we are planning to switch to FragPipe for the bulk of our data analysis in the near future. I am currently looking into using FragPipe for several typical sample types we get in the facility, and how to further process the outputs generated by FragPipe. So I will probably have several questions in the future and maybe also some feature requests.

I’ve noticed a rare problem of the reported peptide Start and End positions: When looking up the positions of a peptide in the protein sequence from the FASTA file, in very few cases the Start and End positions reported by FragPipe don’t match the actual positions. It appears to be a problem caused by treating Isoleucine and Leucine as indistinguishable (at least in the cases I’ve observed).

For example, the reported peptide “GDSLDSVEALIK” matches exactly once to the protein sequence of “Q13813”, at position 1474 to 1485. However, FragPipe reports 496 to 507. When replacing all “L” with “I” in the peptide and the protein sequence, then the peptide matches to two positions within the protein: 496 to 507 (GDSLDSVEALLK), and 1474 to 1485 (GDSLDSVEALIK). I assume that when matching the peptide to the protein (treating “L” and “I” as indistinguishable), the first occurance of the peptide is reported by FragPipe. I think this behavior is fair enough, however, the actual sequence of the peptide in the ion.tsv and other tables is not updated and thus does not fit to the reported Start and End positions.

In the few cases I’ve seen, only the “original” peptide sequences that were reported are tryptic peptides, meaning the preceding amino acid were K/R, whereas the peptides from the reported start/end positions were only semi-tryptic peptides.

I’ve seen that you are planning to consider the cleavage rules when mapping peptides to proteins in an upcoming version (https://github.com/Nesvilab/FragPipe/issues/718), so this might already solve the problem in most if not all the cases. I still wanted to let you know, as this seems to be an unintentional behavior.

Best, David

Here is the protein sequence for the mentioned example

sp|Q13813|SPTN1_HUMAN Spectrin alpha chain, non-erythrocytic 1 OS=Homo sapiens OX=9606 GN=SPTAN1 PE=1 SV=3 MDPSGVKVLETAEDIQERRQQVLDRYHRFKELSTLRRQKLEDSYRFQFFQRDAEELEKWIQEKLQIASDENYKDPTNLQGKLQKHQAFEAEVQANSGAIVKLDETGNLMISEGHFASETIRTRLMELHRQWELLLEKMREKGIKLLQAQKLVQYLRECEDVMDWINDKEAIVTSEELGQDLEHVEVLQKKFEEFQTDMAAHEERVNEVNQFAAKLIQEQHPEEELIKTKQDEVNAAWQRLKGLALQRQGKLFGAAEVQRFNRDVDETISWIKEKEQLMASDDFGRDLASVQALLRKHEGLERDLAALEDKVKALCAEADRLQQSHPLSATQIQVKREELITNWEQIRTLAAERHARLNDSYRLQRFLADFRDLTSWVTEMKALINADELASDVAGAEALLDRHQEHKGEIDAHEDSFKSADESGQALLAAGHYASDEVREKLTVLSEERAALLELWELRRQQYEQCMDLQLFYRDTEQVDNWMSKQEAFLLNEDLGDSLDSVEALLKKHEDFEKSLSAQEEKITALDEFATKLIQNNHYAMEDVATRRDALLSRRNALHERAMRRRAQLADSFHLQQFFRDSDELKSWVNEKMKTATDEAYKDPSNLQGKVQKHQAFEAELSANQSRIDALEKAGQKLIDVNHYAKDEVAARMNEVISLWKKLLEATELKGIKLREANQQQQFNRNVEDIELWLYEVEGHLASDDYGKDLTNVQNLQKKHALLEADVAAHQDRIDGITIQARQFQDAGHFDAENIKKKQEALVARYEALKEPMVARKQKLADSLRLQQLFRDVEDEETWIREKEPIAASTNRGKDLIGVQNLLKKHQALQAEIAGHEPRIKAVTQKGNAMVEEGHFAAEDVKAKLHELNQKWEALKAKASQRRQDLEDSLQAQQYFADANEAESWMREKEPIVGSTDYGKDEDSAEALLKKHEALMSDLSAYGSSIQALREQAQSCRQQVAPTDDETGKELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLDPAQSASRENLLEEQGSIALRQEQIDNQTRITKEAGSVSLRMKQVEELYHSLLELGEKRKGMLEKSCKKFMLFREANELQQWINEKEAALTSEEVGADLEQVEVLQKKFDDFQKDLKANESRLKDINKVAEDLESEGLMAEEVQAVQQQEVYGMMPRDETDSKTASPWKSARLMVHTVATFNSIKELNERWRSLQQLAEERSQLLGSAHEVQRFHRDADETKEWIEEKNQALNTDNYGHDLASVQALQRKHEGFERDLAALGDKVNSLGETAERLIQSHPESAEDLQEKCTELNQAWSSLGKRADQRKAKLGDSHDLQRFLSDFRDLMSWINGIRGLVSSDELAKDVTGAEALLERHQEHRTEIDARAGTFQAFEQFGQQLLAHGHYASPEIKQKLDILDQERADLEKAWVQRRMMLDQCLELQLFHRDCEQAENWMAAREAFLNTEDKGDSLDSVEALIKKHEDFDKAINVQEEKIAALQAFADQLIAAGHYAKGDISSRRNEVLDRWRRLKAQMIEKRSKLGESQTLQQFSRDVDEIEAWISEKLQTASDESYKDPTNIQSKHQKHQAFEAELHANADRIRGVIDMGNSLIERGACAGSEDAVKARLAALADQWQFLVQKSAEKSQKLKEANKQQNFNTGIKDFDFWLSEVEALLASEDYGKDLASVNNLLKKHQLLEADISAHEDRLKDLNSQADSLMTSSAFDTSQVKDKRDTINGRFQKIKSMAASRRAKLNESHRLHQFFRDMDDEESWIKEKKLLVGSEDYGRDLTGVQNLRKKHKRLEAELAAHEPAIQGVLDTGKKLSDDNTIGKEEIQQRLAQFVEHWKELKQLAAARGQRLEESLEYQQFVANVEEEEAWINEKMTLVASEDYGDTLAAIQGLLKKHEAFETDFTVHKDRVNDVCTNGQDLIKKNNHHEENISSKMKGLNGKVSDLEKAAAQRKAKLDENSAFLQFNWKADVVESWIGEKENSLKTDDYGRDLSSVQTLLTKQETFDAGLQAFQQEGIANITALKDQLLAAKHVQSKAIEARHASLMKRWSQLLANSAARKKKLLEAQSHFRKVEDLFLTFAKKASAFNSWFENAEEDLTDPVRCNSLEEIKALREAHDAFRSSLSSAQADFNQLAELDRQIKSFRVASNPYTWFTMEALEETWRNLQKIIKERELELQKEQRRQEENDKLRQEFAQHANAFHQWIQETRTYLLDGSCMVEESGTLESQLEATKRKHQEIRAMRSQLKKIEDLGAAMEEALILDNKYTEHSTVGLAQQWDQLDQLGMRMQHNLEQQIQARNTTGVTEEALKEFSMMFKHFDKDKSGRLNHQEFKSCLRSLGYDLPMVEEGEPDPEFEAILDTVDPNRDGHVSLQEYMAFMISRETENVKSSEEIESAFRALSSEGKPYVTKEELYQNLTREQADYCVSHMKPYVDGKGRELPTAFDYVEFTRSLFVN

hollenstein avatar Jul 22 '22 14:07 hollenstein

Hi David,

Thanks for your feedback. I think it is also related to the issue https://github.com/Nesvilab/FragPipe/issues/500 . We will see what we can do.

Best,

Fengchao

fcyu avatar Jul 22 '22 14:07 fcyu