FragPipe
High-confidence protein did not make it into the library
Hi,
I am using the DIA_SpecLib pipeline in FragPipe to generate a library from 150 DDA files. Previously, running both the DDA_LFQ and the library generation pipelines on 10 of these DDA files showed that one of the highly abundant proteins (TGM4) is present in all 10 files, with at least 50 peptides mapped to it. However, when I generated the library from the entire cohort, TGM4 was not included in the output files, and all of its peptides were missing. Searching through the intermediate files, I found this protein in the combine.prot.xml file with a probability of 1.00.
The log file is attached, along with the subset of the combine.prot.xml file where TGM4 was recorded. I have checked the protein.fas, psm.tsv, ion.tsv, peptide.tsv, and protein.tsv files, and none of them contain TGM4. Please take a look and let me know whether it's something with my library parameters or whether something else is going on.
Thank you! Annie
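For anyone repeating the check described above, here is a minimal Python sketch that counts the lines mentioning a protein in every report file under a results folder. The function names, the default file patterns, and the plain substring match are assumptions made for illustration; this is not part of FragPipe or Philosopher.

```python
from pathlib import Path


def count_hits(path, needle):
    """Count the lines of a text file that contain `needle`."""
    hits = 0
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if needle in line:
                hits += 1
    return hits


def scan_outputs(result_dir, needle, patterns=("*.tsv", "*.fas")):
    """Map each matching file under `result_dir` to its number of hit lines.

    `patterns` is an illustrative default; adjust it to the files you expect.
    """
    root = Path(result_dir)
    report = {}
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            report[path.relative_to(root).as_posix()] = count_hits(path, needle)
    return report
```

Calling `scan_outputs("path/to/results", "TGM4")` returns a dictionary in which a file with zero hits is reported explicitly, which makes it easier to see at which step the protein disappears.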
That is very strange … Do you see any of the peptides from that protein in PSM.tsv?
Quoted message from Annie Ha, sent Wednesday, October 12, 2022 1:44 PM, to Nesvilab/FragPipe. Subject: High confident protein did not make to the library (Issue #854)
Attachments:
log_2022-10-11_21-09-52.txt (https://github.com/Nesvilab/FragPipe/files/9767419/log_2022-10-11_21-09-52.txt)
tgm4.txt (https://github.com/Nesvilab/FragPipe/files/9767452/tgm4.txt)
None of the peptides from that protein were present in the psm.tsv. It almost looked like this protein was just completely filtered out...
Since this is Philosopher-related, we will need Felipe's help to investigate.
Perhaps you can share the pep.xml files and the sequence database with us so he can investigate. We will keep them confidential, of course.
Thanks, Alexey
Hi @klannieha. Please send me your interact*.pep.xml, combined.prot.xml, and your database files.
I have sent the data to your email with a google drive link!
Can you explain how you generate your decoys?
The decoys were generated using the TPP tool decoyFastaGenerator.pl. The same FASTA file was used to analyze the DDA files, and the highly abundant proteins were there. I also just re-ran the pipeline with 60 files from the cohort, and TGM4 appeared in the results.
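Since the question is about how the decoys were made, one quick sanity check is to count how many FASTA headers carry the decoy prefix the pipeline is configured to look for. The sketch below is only an illustration: the function name is made up, and the default prefix `rev_` is an assumption that should be replaced with whatever tag is set in your own workflow. Nothing in this thread establishes that a prefix mismatch is the cause of the missing protein.

```python
def count_targets_and_decoys(fasta_lines, decoy_prefix="rev_"):
    """Return (targets, decoys) based on the prefix of each FASTA header.

    `decoy_prefix` is an assumed default; use the tag configured in your run.
    """
    targets = 0
    decoys = 0
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        if line[1:].startswith(decoy_prefix):
            decoys += 1
        else:
            targets += 1
    return targets, decoys
```

With a real database this could be called as `count_targets_and_decoys(open("database.fas"))`, where the file name is a placeholder. A decoy count of zero, or one far from the target count, would suggest that the headers do not use the prefix the tools expect.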
I suggest you use our database generation method, if possible. So, the first trial you made was with a subset of the files?
I will give it a try. The first trial was with the entire cohort. I then took a subset of 10 files from the cohort to perform the DDA search and the library generation, and tried again with a subset of 60 files. For both subsets, the protein was present.
Well, that could be it then. Probably, the files you selected did not have PSM evidence supporting the protein.
Felipe, can you download her files and run them? Maybe the size of the data exceeded some internal Philosopher threshold? I cannot think of what else it could be.
Hi @anesvi, I believe @klannieha got it solved now by using all files instead of a small sample.
I think what she meant is that with all files, the high confident protein is gone. With a subset, it is there.
Best,
Fengchao
I might have misunderstood this " For both subsets, the protein was present."
Sorry for the confusion! I meant that the results from the analyses of the subsets had the protein, but the results from the entire cohort did not.
I get it now. Possibly, the threshold cut the PSMs out, I'll test on my side.
I took your interact files and ran a simple filtering using the sequential option. TGM4 is in the report files.
The issue is that I see the protein in the intermediate proteins, but it's not in the result files: psm.tsv, protein.tsv, protein.fas, and library.tsv. I can upload these files as well if that would help.
What do you mean by intermediate proteins? I see evidence supporting TGM4 in all report files.
Sorry, I meant intermediate files, bad typo... I am using FragPipe v18.0 with the DIA_SpecLib pipeline, but this seems to be a consistent problem on my end :\
Maybe I will re-install the latest FragPipe version and the dependencies and run it again.
Try starting up fresh. Clean your folders, remove temporary files and logs, clean the fragpipe cache and try again. Perhaps you got one setting wrong, it can happen sometimes when we have many settings to check.
Felipe, were you checking with the released version of philosopher or the pre-release of the new one? You could check with the released version
I tested with v4.4.0
Hi Annie, any update? Are you able to run it on the whole dataset? Thanks, Alexey
I cleared the FragPipe cache and am currently re-running the whole dataset from the beginning. I will update once the pipeline is finished!
Unfortunately, it seems like the protein is still missing from the library.tsv and other output files. It is still present in the combine.prot.xml, but it seems to be also absent from all the run_peaks.tsv files.
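To confirm the "present in the prot.xml, absent from the reports" observation in a reproducible way, the protein entries can be pulled out of the prot.xml file with a short script. This is a rough sketch: it assumes the usual protXML layout, in which `<protein>` elements inside `<protein_group>` carry `protein_name` and `probability` attributes, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_proteins(prot_xml, needle):
    """Yield (protein_name, probability) for <protein> entries matching `needle`.

    `prot_xml` may be a file path or a binary file object.
    """
    for _event, elem in ET.iterparse(prot_xml, events=("end",)):
        # Strip any XML namespace so the check works with or without one.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "protein":
            name = elem.get("protein_name", "")
            if needle in name:
                yield name, float(elem.get("probability", "nan"))
        elif tag == "protein_group":
            # Free memory once every protein in the group has been seen.
            elem.clear()
```

For example, `list(find_proteins("combined.prot.xml", "TGM4"))` would list the matching entries and their probabilities. Comparing that list with a search of psm.tsv and protein.tsv shows whether the protein is lost at the filtering step.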
So it looks like Philosopher filters it out, then.
Felipe, can you take a look, please?
Can you send me your log?