SequenceWindow, Ile over-representation on the n-term side
Hi,
I was inspecting the files generated in the tmt-report folder and noticed a suspicious over-representation of Ile in the SequenceWindow on the N-terminal side, i.e. before the detected peptide sequence.
Can you comment on how that is calculated?
It might be real, but I was expecting a Leu to be over-represented...
It just uses the sequence of the assigned protein in the fasta file.
Best,
Fengchao
Since it was very strange, I checked the sequences manually. This is what I see:
| ProteinID | SequenceWindow | Start | SequenceWindowFromFasta | Fasta |
|---|---|---|---|---|
| A0A8I5KX85 | TAPVQAPPAP | 148 | TAPVQAPPAP | xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx |
| A0A8I5KX85 | AIKIQLDNQY | 239 | ALKLQLDNQY | xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx |
| A0A8I5KX85 | NQAIKLQLDN | 237 | NQALKLQLDN | xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx |
| A0A8I5KX85 | FPSIQSTAKH | 199 | FPSLQSTAKH | xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx |
The first one is correct, the others are not.
FYI, Start refers to the starting position excluding the xxxxx padding.
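To make the comparison in the table explicit, here is a minimal Python sketch that lists where each reported SequenceWindow differs from the window read directly from the FASTA sequence. The window pairs are copied from the table; the helper name `window_mismatches` is made up for illustration and is not part of philosopher or FragPipe.

```python
# (reported SequenceWindow, SequenceWindowFromFasta), copied from the table above
ROWS = [
    ("TAPVQAPPAP", "TAPVQAPPAP"),
    ("AIKIQLDNQY", "ALKLQLDNQY"),
    ("NQAIKLQLDN", "NQALKLQLDN"),
    ("FPSIQSTAKH", "FPSLQSTAKH"),
]


def window_mismatches(reported, expected):
    """Return (0-based index, FASTA residue, reported residue) for each difference."""
    if len(reported) != len(expected):
        raise ValueError("windows must have the same length")
    return [
        (i, exp, rep)
        for i, (rep, exp) in enumerate(zip(reported, expected))
        if rep != exp
    ]


for reported, expected in ROWS:
    print(reported, expected, window_mismatches(reported, expected))
```

For these four rows, every difference is an L in the FASTA reported as I, and each one falls within the first five positions of the window.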
The second one is also correct because there is a peptide ALKLQLDNQY in the protein. We don't distinguish I and L when mapping peptides to proteins because they have identical masses.
Best,
Fengchao
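The I/L-insensitive mapping described in the reply above can be sketched in Python as follows. This is not the philosopher implementation. It only illustrates matching on an I-to-L normalized copy while still reading the window from the unmodified FASTA sequence. The function names, the query peptide, and the window convention (five residues before Start plus five from Start on, padded with `x`) are assumptions for illustration. The fragment is an excerpt of the protein in the table, so positions are relative to the fragment rather than to the full protein.

```python
FLANK = 5  # residues on each side of Start (assumed window convention)


def find_il_insensitive(protein, peptide):
    """Return 1-based start positions where peptide matches, treating I and L as equal."""
    hay = protein.replace("I", "L")      # normalized copy, used for matching only
    needle = peptide.replace("I", "L")
    starts = []
    pos = hay.find(needle)
    while pos != -1:
        starts.append(pos + 1)
        pos = hay.find(needle, pos + 1)
    return starts


def sequence_window(protein, start, flank=FLANK):
    """Window from start-flank to start+flank-1 (1-based), read from the original sequence."""
    padded = "x" * flank + protein + "x" * flank
    return padded[start - 1 : start - 1 + 2 * flank]


# Excerpt of the protein from the table (hypothetical fragment boundaries)
fragment = "SKNQALKLQLDNQYAVLENQK"

# Hypothetical query peptide with I written where the FASTA has L
starts = find_il_insensitive(fragment, "IDNQYAVIENQK")
for start in starts:
    print(start, sequence_window(fragment, start))
```

Because the normalized copy is used only to locate the peptide, the printed window keeps the L residues exactly as they appear in the FASTA.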
Indeed, ALKLQLDNQY would be right, but FragPipe reports AIKIQLDNQY. To me it seems that all L are converted to I in the fasta file used to create the SequenceWindow.
Case closed :)
Yes, this is a known bug: https://github.com/Nesvilab/philosopher/issues/430
Fixed in https://github.com/Nesvilab/philosopher/pull/517
Huge thanks to @hollenstein
Best,
Fengchao