SequenceWindow: Ile over-representation on the N-terminal side

Open pisistrato opened this issue 1 year ago • 5 comments

Hi,

I was inspecting the files generated in the tmt-report folder and noticed a suspicious over-representation of Ile in the SequenceWindow on the N-terminal side, i.e. before the detected peptide sequence. Can you comment on how the SequenceWindow is calculated? It might be real, but I was expecting Leu to be over-represented...

pisistrato avatar Jul 18 '24 12:07 pisistrato

It just uses the sequence of the assigned protein in the fasta file.

Best,

Fengchao

fcyu avatar Jul 18 '24 12:07 fcyu

Since it was very strange, I checked the sequences manually. This is what I see:

ProteinID SequenceWindow Start SequenceWindowFromFasta Fasta
A0A8I5KX85 TAPVQAPPAP 148 TAPVQAPPAP xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx
A0A8I5KX85 AIKIQLDNQY 239 ALKLQLDNQY xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx
A0A8I5KX85 NQAIKLQLDN 237 NQALKLQLDN xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx
A0A8I5KX85 FPSIQSTAKH 199 FPSLQSTAKH xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx

The first one is correct; the others are not. FYI, Start refers to the starting position excluding the xxxxx.

pisistrato avatar Jul 18 '24 17:07 pisistrato
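
The manual check above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_window` and its return labels are hypothetical, and the pairs are the SequenceWindow and SequenceWindowFromFasta values copied from the table in this comment. It separates windows that match exactly from windows that differ only by an I/L swap.

```python
def classify_window(reported: str, from_fasta: str) -> str:
    """Compare a reported window with the window cut from the FASTA sequence.

    Hypothetical helper for illustration; not part of Philosopher or FragPipe.
    """
    if len(reported) != len(from_fasta):
        return "different"
    if reported == from_fasta:
        return "identical"
    # Collect the residue pairs at positions where the two windows disagree.
    diffs = [(a, b) for a, b in zip(reported, from_fasta) if a != b]
    # If every disagreement is an I paired with an L, only I/L was swapped.
    if all({a, b} == {"I", "L"} for a, b in diffs):
        return "I/L only"
    return "different"


# (SequenceWindow, SequenceWindowFromFasta) pairs from the table above.
rows = [
    ("TAPVQAPPAP", "TAPVQAPPAP"),
    ("AIKIQLDNQY", "ALKLQLDNQY"),
    ("NQAIKLQLDN", "NQALKLQLDN"),
    ("FPSIQSTAKH", "FPSLQSTAKH"),
]

for reported, from_fasta in rows:
    print(reported, from_fasta, classify_window(reported, from_fasta))
```

For these four rows the first pair is classified as identical and the other three as I/L-only differences.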

The second one is also correct because there is a peptide ALKLQLDNQY in the protein. We don't distinguish I and L when mapping peptides to proteins because they have identical mass.

Best,

Fengchao

fcyu avatar Jul 18 '24 19:07 fcyu
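
As a rough illustration of I/L-agnostic mapping as described in the reply above, here is a minimal Python sketch. It is an assumption about one way to do it, not Philosopher's actual implementation; `normalize_il` and `find_peptide` are hypothetical names, and the protein fragment is a short excerpt of the FASTA sequence quoted earlier in this thread.

```python
from typing import Optional, Tuple


def normalize_il(seq: str) -> str:
    """Collapse Leu onto Ile so the two isobaric residues compare as equal."""
    return seq.replace("L", "I")


def find_peptide(peptide: str, protein: str) -> Optional[Tuple[int, str]]:
    """Locate a peptide in a protein while ignoring I/L differences.

    Returns the 0-based start index and the matched residues taken from the
    ORIGINAL protein sequence, or None if the peptide is not found.
    """
    # Normalization is used for matching only.
    idx = normalize_il(protein).find(normalize_il(peptide))
    if idx == -1:
        return None
    # Slice the untouched protein so the reported text keeps its real I/L.
    return idx, protein[idx:idx + len(peptide)]


# Excerpt of the protein sequence shown earlier in the thread.
fragment = "SKNQALKLQLDNQYAVLENQK"
print(find_peptide("AIKIQLDNQY", fragment))  # (4, 'ALKLQLDNQY')
```

The design point in this sketch is that the normalized string is only ever used to find the match position. Anything reported back is sliced from the original sequence, so the output keeps the residues as written in the FASTA file.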

Indeed, ALKLQLDNQY would be right, but FragPipe reports AIKIQLDNQY. To me it seems that all L are converted to I in the fasta file used to create the SequenceWindow.

Case closed :)

pisistrato avatar Jul 18 '24 21:07 pisistrato

Yes, this is a known bug: https://github.com/Nesvilab/philosopher/issues/430

fcyu avatar Jul 18 '24 22:07 fcyu

Fixed in https://github.com/Nesvilab/philosopher/pull/517

Huge thanks to @hollenstein

Best,

Fengchao

fcyu avatar Dec 19 '24 01:12 fcyu