High confident protein did not make to the library

Open klannieha opened this issue 1 year ago • 31 comments

Hi,

I am using the DIA_SpecLib pipeline in FragPipe to generate a library from 150 DDA files. Previously, I ran both the DDA_LFQ and the library generation pipelines on 10 of the DDA files, and one of the highly abundant proteins (TGM4) was present in all 10 files, with at least 50 peptides mapped to it. However, when I generated the library using the entire cohort, TGM4 was not included in the output files, and all of its peptides were missing. Searching through the intermediate files, I found that this protein is present in the combine.prot.xml file with a probability of 1.00.

The log file is attached, along with the subset of the combine.prot.xml file where TGM4 was recorded. I have checked the protein.fas, psm.tsv, ion.tsv, peptide.tsv, and protein.tsv files, and none of them contain TGM4. Please take a look and let me know whether it's something with my library parameters or whether something else is going on.

Thank you! Annie

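log_2022-10-11_21-09-52.txt (https://github.com/Nesvilab/FragPipe/files/9767419/log_2022-10-11_21-09-52.txt) tgm4.txt (https://github.com/Nesvilab/FragPipe/files/9767452/tgm4.txt)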

klannieha avatar Oct 12 '22 17:10 klannieha
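The manual check described in this post (looking for TGM4 in each report file) can be sketched as a small script. This is only an illustration, not part of FragPipe or Philosopher: the function names are hypothetical, and it assumes the reports are tab-separated text with a single header row. It searches every column for the query string, so it does not depend on any particular column name.

```python
import csv
import sys
from pathlib import Path


def count_rows_mentioning(path, query):
    """Return the number of data rows in a TSV file where any field contains `query`."""
    hits = 0
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            if any(query in field for field in row):
                hits += 1
    return hits


def scan_results(result_dir, query,
                 names=("psm.tsv", "ion.tsv", "peptide.tsv", "protein.tsv")):
    """Map every report file found under `result_dir` to its number of matching rows."""
    report = {}
    for name in names:
        for path in sorted(Path(result_dir).rglob(name)):
            report[str(path)] = count_rows_mentioning(path, query)
    return report


# Hypothetical usage: python scan_reports.py <result_dir> TGM4
if __name__ == "__main__" and len(sys.argv) == 3:
    for path, hits in scan_results(sys.argv[1], sys.argv[2]).items():
        print(f"{hits:8d}  {path}")
```

A count of zero in every file would match what is reported here; a nonzero count in some experiment folders but not others would point to a per-run difference instead.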

That is very strange … Do you see any of the peptides from that protein in PSM.tsv?

From: Annie Ha @.> Sent: Wednesday, October 12, 2022 1:44 PM To: Nesvilab/FragPipe @.> Cc: Subscribed @.***> Subject: [Nesvilab/FragPipe] High confident protein did not make to the library (Issue #854)

Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues

anesvi avatar Oct 12 '22 17:10 anesvi

None of the peptides from that protein were present in the psm.tsv. It almost looked like this protein was just completely filtered out...

klannieha avatar Oct 12 '22 17:10 klannieha

Since this is Philosopher-related, we will need Felipe’s help to investigate.

Perhaps you can share the pep.xml files and the sequence database with us so he can investigate. We will keep them confidential, of course.

Thanks Alexey

anesvi avatar Oct 12 '22 18:10 anesvi

Hi @klannieha. Please send me your interact*.pep.xml, combined.prot.xml, and your database files.

prvst avatar Oct 12 '22 20:10 prvst

I have sent the data to your email as a Google Drive link!

klannieha avatar Oct 14 '22 00:10 klannieha

Can you explain how you generate your decoys?

prvst avatar Oct 14 '22 13:10 prvst

The decoys were generated using the TPP tool decoyFastaGenerator.pl. The same FASTA file was used to analyze the DDA files, and the highly abundant proteins were there. I also just re-ran the pipeline with 60 files from the cohort, and TGM4 appeared in the results.

klannieha avatar Oct 14 '22 15:10 klannieha

I suggest you use our database generation method, if possible. So, the first trial you made was with a subset of the files?

prvst avatar Oct 14 '22 15:10 prvst
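Since the question above concerns how the decoys were generated, a quick tally of target and decoy entries in the FASTA file may be a useful sanity check. The sketch below is illustrative only: the function name is hypothetical, and the `rev_` prefix is an assumed example value. The prefix actually written by decoyFastaGenerator.pl depends on how it was invoked, so it should be adjusted to match the database.

```python
from collections import Counter


def tally_fasta_entries(fasta_path, decoy_prefix="rev_"):
    """Count target and decoy entries in a FASTA file by header prefix.

    `decoy_prefix` is an assumption; set it to whatever tag the database uses.
    """
    counts = Counter()
    with open(fasta_path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith(">"):
                kind = "decoy" if line[1:].startswith(decoy_prefix) else "target"
                counts[kind] += 1
    return counts
```

If the decoy count comes out as zero, or far from the target count, the prefix configured for the search and filtering steps may not match the one in the database.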

I will give it a try. The first trial was with the entire cohort. I then took a subset of 10 files from the cohort to run the DDA search and the library generation, and tried again with a subset of 60 files. For both subsets, the protein was present.

klannieha avatar Oct 14 '22 15:10 klannieha

Well, that could be it then. Probably the files you selected did not have PSM evidence supporting the protein.

prvst avatar Oct 14 '22 16:10 prvst

Felipe, can you download her files and run them? Maybe the size of the data exceeded some internal Philosopher threshold? I cannot think of what else it could be.

anesvi avatar Oct 14 '22 16:10 anesvi

Hi @anesvi, I believe @klannieha got it solved now by using all files instead of a small sample.

prvst avatar Oct 14 '22 16:10 prvst

I think what she meant is that with all files, the high-confidence protein is gone. With a subset, it is there.

Best,

Fengchao

fcyu avatar Oct 14 '22 16:10 fcyu

I might have misunderstood this: "For both subsets, the protein was present."

prvst avatar Oct 14 '22 16:10 prvst

Sorry for the confusion! I meant that the results from the analyses of the subsets had the protein, but the results from the entire cohort did not.

klannieha avatar Oct 14 '22 16:10 klannieha

I get it now. Possibly the threshold cut the PSMs out; I'll test on my side.

prvst avatar Oct 14 '22 16:10 prvst

I took your interact files and ran a simple filtering using the sequential option. TGM4 is in the report files:

image

prvst avatar Oct 14 '22 18:10 prvst

The issue is that I see the protein in the intermediate proteins, but it's not in the result files: psm.tsv, protein.tsv, protein.fas, and library.tsv. I can upload these files as well if that would help.

klannieha avatar Oct 14 '22 18:10 klannieha

What do you mean by intermediate proteins? I see evidence supporting TGM4 in all report files.

prvst avatar Oct 14 '22 18:10 prvst

Sorry, I meant intermediate files, bad typo... I am using FragPipe v18.0 with the DIA_SpecLib pipeline, but this seems to be a consistent problem on my end :\

klannieha avatar Oct 14 '22 18:10 klannieha

Maybe I will re-install the latest FragPipe version and the dependencies and run it again.

klannieha avatar Oct 14 '22 18:10 klannieha

Try starting fresh. Clean your folders, remove temporary files and logs, clear the FragPipe cache, and try again. Perhaps you got one setting wrong; it can happen when there are many settings to check.

prvst avatar Oct 14 '22 18:10 prvst

Felipe, were you checking with the released version of Philosopher or the pre-release of the new one? You could check with the released version.

anesvi avatar Oct 14 '22 18:10 anesvi

I tested with v4.4.0

prvst avatar Oct 14 '22 18:10 prvst

Hi Annie, any update? Are you able to run it on the whole dataset? Thanks, Alexey

anesvi avatar Oct 18 '22 21:10 anesvi

I cleared the FragPipe cache and am currently re-running the whole dataset from the beginning. I will update once the pipeline has finished!

klannieha avatar Oct 18 '22 21:10 klannieha

Unfortunately, the protein still seems to be missing from library.tsv and the other output files. It is still present in combine.prot.xml, but it also seems to be absent from all the run_peaks.tsv files.

klannieha avatar Oct 19 '22 17:10 klannieha
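The cross-check reported above (the protein is present in the prot.xml file but absent from the reports) can be reproduced with a short script that lists matching entries in the ProteinProphet output. This is a minimal sketch, not Philosopher's own logic: the function names are hypothetical, and it assumes the usual protXML layout in which each `<protein>` element carries `protein_name` and `probability` attributes.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{uri}protein' down to 'protein'."""
    return tag.rsplit("}", 1)[-1]


def find_protein_entries(prot_xml_path, query):
    """Yield (protein_name, probability) for <protein> elements whose name contains `query`."""
    for _event, elem in ET.iterparse(prot_xml_path, events=("end",)):
        name = local_name(elem.tag)
        if name == "protein":
            protein_name = elem.get("protein_name", "")
            if query in protein_name:
                yield protein_name, elem.get("probability")
        elif name == "protein_group":
            elem.clear()  # free memory once a whole group has been processed
```

Calling `list(find_protein_entries("combined.prot.xml", "TGM4"))` (the file name is an example) returns the matching entries with their probabilities as strings, which can then be compared against the rows found in psm.tsv and protein.tsv.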

So it looks like Philosopher filters it out, then.

Felipe, can you take a look, please?

anesvi avatar Oct 19 '22 17:10 anesvi

Can you send me your log?

prvst avatar Oct 19 '22 17:10 prvst