
Philosopher writes incorrect protein info to protein.tsv

Open jd690764 opened this issue 1 year ago • 10 comments

  • Describe the issue or question: Hi,

I am importing a dataset into Skyline using the pep.xml files, following the tutorial, and I don't understand why peptides that are not in the combined_modified_peptides.tsv file are imported into the spectral library in Skyline.

During import, I use a probability cutoff of 0.95 (among the runs in this dataset, the highest probability threshold corresponding to 1% peptide FDR is 0.9513, and all the others are much lower; I attached the log file). Of the ~27K peptides in the spectral library, ~25K are in the combined_modified_peptides.tsv file. Some of the extra peptides are peptides assigned to decoy proteins (~700), and a handful are peptides with 0.95 < probability < 0.9513, but that still leaves >1000 peptides that I cannot explain.
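One way to narrow down the unexplained peptides is to bucket the set difference programmatically, using the same categories as above. This Python sketch is illustrative only: it assumes the library peptides, the combined_modified_peptides.tsv peptides, a peptide-to-protein map and per-peptide probabilities have already been extracted, and `classify_extra_peptides`, its parameters and the `rev_` decoy prefix are hypothetical choices, not FragPipe or Skyline APIs. Using a single global 0.9513 threshold is a simplification, since the FDR-derived threshold differs per run.

```python
from typing import Iterable


def classify_extra_peptides(
    library_peptides: Iterable[str],
    tsv_peptides: Iterable[str],
    peptide_proteins: dict[str, list[str]],
    peptide_prob: dict[str, float],
    lib_cutoff: float = 0.95,      # cutoff used during the Skyline import
    fdr_cutoff: float = 0.9513,    # highest per-run threshold quoted from the log
    decoy_prefix: str = "rev_",    # assumed decoy tag (hypothetical default)
) -> dict[str, set[str]]:
    """Bucket peptides that are in the library but not in the TSV (hypothetical helper)."""
    extra = set(library_peptides) - set(tsv_peptides)
    buckets: dict[str, set[str]] = {"decoy": set(), "between_cutoffs": set(), "unexplained": set()}
    for pep in extra:
        proteins = peptide_proteins.get(pep, [])
        prob = peptide_prob.get(pep, 0.0)
        if any(p.startswith(decoy_prefix) for p in proteins):
            # Mapped to at least one decoy protein.
            buckets["decoy"].add(pep)
        elif lib_cutoff <= prob < fdr_cutoff:
            # Passes the Skyline cutoff but not the FDR-derived threshold.
            buckets["between_cutoffs"].add(pep)
        else:
            buckets["unexplained"].add(pep)
    return buckets


if __name__ == "__main__":
    buckets = classify_extra_peptides(
        library_peptides=["PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDED"],
        tsv_peptides=["PEPTIDEA"],
        peptide_proteins={"PEPTIDEB": ["rev_NP_000001"], "PEPTIDEC": ["NP_000002"], "PEPTIDED": ["NP_000003"]},
        peptide_prob={"PEPTIDEB": 0.99, "PEPTIDEC": 0.9505, "PEPTIDED": 0.99},
    )
    for name, peps in buckets.items():
        print(name, sorted(peps))
```

Whatever lands in the `unexplained` bucket is the set worth inspecting by hand.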

A related question: would it be possible to output a filtered version of the pep.xml files that is consistent with the other result files?
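A rough approximation of a filtered pep.xml is to drop every spectrum_query whose rank-1 PeptideProphet probability falls below the run's threshold. This is a hypothetical Python sketch using the standard-library ElementTree; it assumes the pepXML namespace `http://regis-web.systemsbiology.net/pepXML` and a `peptideprophet_result` element under each `search_hit`. It does not reproduce Philosopher's FDR filtering or protein inference, so the output is not guaranteed to match the .tsv reports.

```python
import sys
import xml.etree.ElementTree as ET

# pepXML namespace (assumed to be the one used in the FragPipe output).
NS = "http://regis-web.systemsbiology.net/pepXML"
Q = {"p": NS}


def best_probability(query: ET.Element) -> float:
    """PeptideProphet probability of the rank-1 hit, or 0.0 if none is found."""
    for hit in query.iterfind("p:search_result/p:search_hit", Q):
        if hit.get("hit_rank") != "1":
            continue
        result = hit.find(".//p:peptideprophet_result", Q)
        if result is not None:
            return float(result.get("probability", "0"))
    return 0.0


def filter_pepxml(tree: ET.ElementTree, min_prob: float) -> int:
    """Remove spectrum_query elements below min_prob in place; return how many remain."""
    kept = 0
    for summary in tree.getroot().iterfind("p:msms_run_summary", Q):
        # Materialize the list first so removal does not disturb iteration.
        for query in list(summary.iterfind("p:spectrum_query", Q)):
            if best_probability(query) >= min_prob:
                kept += 1
            else:
                summary.remove(query)
    return kept


if __name__ == "__main__":
    # Usage: python filter_pepxml.py IN.pep.xml MIN_PROB OUT.pep.xml
    ET.register_namespace("", NS)  # write the default namespace without an ns0: prefix
    tree = ET.parse(sys.argv[1])
    n = filter_pepxml(tree, float(sys.argv[2]))
    tree.write(sys.argv[3], xml_declaration=True, encoding="UTF-8")
    print(f"kept {n} spectrum queries")
```

MIN_PROB would be the per-run probability threshold reported in the log; files that carry only iProphet results would need a different lookup in `best_probability`.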

Thank you, Janos

  • Upload your log file: log_2022-08-05_01-05-53.txt (If a log file hasn't been generated, go to the 'Run' tab in FragPipe, click 'Export Log', zip the resulting "log_[date_time].txt" file to avoid truncation, then attach the zipped file by drag & drop here.)

jd690764 · Aug 13 '22

Hi Janos,

Your log file shows that the job crashed due to an error. Can you double-check that you sent the correct log file?

Best,

Fengchao

fcyu · Aug 13 '22

Oops, sorry about that. Here is the correct one: log_2022-08-10_20-00-02.txt

Janos

jd690764 · Aug 13 '22

Thanks for your log file.

First of all, please upgrade Philosopher to the latest version in case this is caused by a bug that has already been fixed.

Second, IonQuant warned that some of your proteins have inconsistent entries:

2022-08-10 19:59:48 [WARNING] - Protein NP_001019845 has two inconsistent entries in protein.tsv files: NP_001019845,NP_001019845,,125,Homo sapiens,,protein lin-52 homolog,1.000000,0.998500, vs NP_001019845,NP_001019845,,112,Homo sapiens,,protein lin-52 homolog,1.000000,0.999000,NP_001358934.
2022-08-10 19:59:48 [WARNING] - The protein NP_001019845 has different lengths (125 vs 112) in different protein.tsv files
2022-08-10 19:59:48 [WARNING] - Protein NP_001092264 has two inconsistent entries in protein.tsv files: NP_001092264,NP_001092264,,375,Homo sapiens,,FHF complex subunit HOOK interacting protein 1B isoform 2,1.000000,0.999000, vs NP_001092264,NP_001092264,,972,Homo sapiens,,FHF complex subunit HOOK interacting protein 1B isoform 2,1.000000,0.999000,NP_115503.
2022-08-10 19:59:48 [WARNING] - The protein NP_001092264 has different lengths (375 vs 972) in different protein.tsv files
2022-08-10 19:59:48 [WARNING] - Protein NP_001284602 has two inconsistent entries in protein.tsv files: NP_001284602,NP_001284602,,237,Homo sapiens,,regulation of nuclear pre-mRNA domain-containing protein 2 isoform,1.000000,0.999000, vs NP_001284602,NP_001284602,,934,Homo sapiens,,regulation of nuclear pre-mRNA domain-containing protein 2 isoform,1.000000,0.999000,NP_001284603, NP_001374049, NP_001374051, NP_001374052, NP_056018.
2022-08-10 19:59:48 [WARNING] - The protein NP_001284602 has different lengths (237 vs 934) in different protein.tsv files
2022-08-10 19:59:48 [WARNING] - Protein NP_001019845 has two inconsistent entries in protein.tsv files: NP_001019845,NP_001019845,,125,Homo sapiens,,protein lin-52 homolog,1.000000,0.998500, vs NP_001019845,NP_001019845,,112,Homo sapiens,,protein lin-52 homolog,1.000000,0.999000,NP_001358934.
2022-08-10 19:59:48 [WARNING] - The protein NP_001019845 has different lengths (125 vs 112) in different protein.tsv files
2022-08-10 19:59:48 [WARNING] - Protein NP_001092876 has two inconsistent entries in protein.tsv files: NP_001092876,NP_001092876,,433,Homo sapiens,,zinc transporter ZIP6 isoform 2,1.000000,0.999000,NP_036451 vs NP_001092876,NP_001092876,,375,Homo sapiens,,zinc transporter ZIP6 isoform 2,1.000000,0.999000,.
2022-08-10 19:59:48 [WARNING] - The protein NP_001092876 has different lengths (433 vs 375) in different protein.tsv files
2022-08-10 19:59:48 [WARNING] - Protein NP_001092264 has two inconsistent entries in protein.tsv files: NP_001092264,NP_001092264,,375,Homo sapiens,,FHF complex subunit HOOK interacting protein 1B isoform 2,1.000000,0.999000, vs NP_001092264,NP_001092264,,972,Homo sapiens,,FHF complex subunit HOOK interacting protein 1B isoform 2,1.000000,0.999000,NP_115503.
2022-08-10 19:59:48 [WARNING] - The protein NP_001092264 has different lengths (375 vs 972) in different protein.tsv files
2022-08-10 19:59:48 [WARNING] - Protein NP_001284602 has two inconsistent entries in protein.tsv files: NP_001284602,NP_001284602,,237,Homo sapiens,,regulation of nuclear pre-mRNA domain-containing protein 2 isoform,1.000000,0.999000, vs NP_001284602,NP_001284602,,934,Homo sapiens,,regulation of nuclear pre-mRNA domain-containing protein 2 isoform,1.000000,0.999000,NP_001284603, NP_001374049, NP_001374051, NP_001374052, NP_056018.
2022-08-10 19:59:48 [WARNING] - The protein NP_001284602 has different lengths (237 vs 934) in different protein.tsv files
2022-08-10 19:59:49 [WARNING] - Protein NP_001012768 has two inconsistent entries in protein.tsv files: NP_001012768,NP_001012768,,481,Homo sapiens,,abl interactor 1 isoform b,0.999900,0.999000,NP_001012769, NP_001012770, NP_001334959 vs NP_001012768,NP_001012768,,135,Homo sapiens,,abl interactor 1 isoform b,1.000000,0.999000,.
2022-08-10 19:59:49 [WARNING] - The protein NP_001012768 has different lengths (481 vs 135) in different protein.tsv files
2022-08-10 19:59:49 [WARNING] - Protein NP_001037850 has two inconsistent entries in protein.tsv files: NP_001037850,NP_001037850,,408,Homo sapiens,,transmembrane protein 237 isoform a,1.000000,0.999000,NP_689601 vs NP_001037850,NP_001037850,,1212,Homo sapiens,,transmembrane protein 237 isoform a,1.000000,0.998600,NP_001243390.
2022-08-10 19:59:49 [WARNING] - The protein NP_001037850 has different lengths (408 vs 1212) in different protein.tsv files
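The warnings above boil down to the same protein ID being reported with different lengths in different protein.tsv files. A minimal Python sketch of that cross-file comparison follows; it assumes tab-separated protein.tsv files with `Protein ID` and `Length` columns (the column names are an assumption), and the helper names are hypothetical. It is not IonQuant's implementation.

```python
import csv
from collections import defaultdict
from pathlib import Path


def lengths_by_protein(paths, id_col="Protein ID", len_col="Length"):
    """Map protein ID -> {reported length -> [files reporting that length]}."""
    seen = defaultdict(lambda: defaultdict(list))
    for path in paths:
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh, delimiter="\t"):
                seen[row[id_col]][int(row[len_col])].append(str(path))
    return seen


def inconsistent_proteins(seen):
    """Keep only proteins reported with more than one distinct length."""
    return {pid: dict(lens) for pid, lens in seen.items() if len(lens) > 1}


if __name__ == "__main__":
    import sys
    # Usage: python check_lengths.py exp1/protein.tsv exp2/protein.tsv ...
    bad = inconsistent_proteins(lengths_by_protein([Path(p) for p in sys.argv[1:]]))
    for pid, lens in sorted(bad.items()):
        print(pid, lens)
```

Listing which files report which length makes it easy to see whether the wrong values cluster in a few experiments.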

Did you reuse old files from a previous run? Does your FASTA file contain proteins with the same ID but different sequences? If so, please correct the settings and rerun the whole workflow from scratch.
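Whether the FASTA file contains the same ID with different sequences can be checked with a short script. This is a Python sketch with hypothetical helper names, assuming the ID is the first whitespace-delimited token of each header; Philosopher's own header parsing may derive IDs differently, so a clean result is a hint rather than proof.

```python
from collections import defaultdict
from typing import Iterable, Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (ID, sequence) pairs; the ID is the first whitespace-delimited header token."""
    pid, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if pid is not None:
                yield pid, "".join(chunks)
            pid, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if pid is not None:
        yield pid, "".join(chunks)


def conflicting_ids(records: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Return IDs that occur with more than one distinct sequence."""
    seqs: dict[str, set[str]] = defaultdict(set)
    for pid, seq in records:
        seqs[pid].add(seq)
    return {pid: s for pid, s in seqs.items() if len(s) > 1}


if __name__ == "__main__":
    import sys
    # Usage: python fasta_conflicts.py database.fasta
    with open(sys.argv[1]) as fh:
        for pid, variants in sorted(conflicting_ids(read_fasta(fh)).items()):
            print(pid, sorted(len(s) for s in variants))
```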

Best,

Fengchao

fcyu · Aug 13 '22

Hi Fengchao,

I am using an older version of Philosopher because of this issue: https://github.com/Nesvilab/FragPipe/issues/787.

I saw those messages, but they don't make sense to me; I have been using the same FASTA library. I'll rerun it one more time.

Thanks, Janos

jd690764 · Aug 15 '22

Hi Fengchao,

I reran the analysis and got the same messages about some proteins having more than one length. I spot-checked the FASTA library and that is not the case, so I don't understand why IonQuant reports those messages. I attached the new log file.

Thanks, Janos log_2022-08-16_10-45-39.txt

jd690764 · Aug 16 '22

Hi Janos,

Do you mean that those warnings are not correct? Can you send me all of your protein.tsv files?

Thanks,

Fengchao

fcyu · Aug 16 '22

What I meant was that the FASTA library contains only one entry for the protein I checked. The protein.tsv files are mixed: 2 out of 18 are incorrect and report the wrong protein length.
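The spot check described above can be extended to every protein by comparing the length reported in each protein.tsv with the sequence length in the FASTA file. Again a hypothetical Python sketch, assuming first-token IDs, unique IDs in the FASTA file, and `Protein ID`/`Length` columns in protein.tsv (column names are an assumption).

```python
import csv
from typing import Optional, TextIO


def fasta_lengths(handle: TextIO) -> dict[str, int]:
    """Map the first header token of each FASTA entry to its sequence length."""
    lengths: dict[str, int] = {}
    current: Optional[str] = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            current = (line[1:].split() or [""])[0]
            lengths[current] = 0
        elif line and current is not None:
            lengths[current] += len(line)
    return lengths


def length_mismatches(
    protein_tsv: TextIO,
    lengths: dict[str, int],
    id_col: str = "Protein ID",   # assumed column name
    len_col: str = "Length",      # assumed column name
) -> list[tuple[str, int, Optional[int]]]:
    """List (ID, length in protein.tsv, length in FASTA or None) for every disagreement."""
    bad = []
    for row in csv.DictReader(protein_tsv, delimiter="\t"):
        pid, reported = row[id_col], int(row[len_col])
        expected = lengths.get(pid)
        if expected != reported:
            bad.append((pid, reported, expected))
    return bad


if __name__ == "__main__":
    import sys
    # Usage: python compare_lengths.py database.fasta exp1/protein.tsv
    with open(sys.argv[1]) as fa, open(sys.argv[2], newline="") as tsv:
        for pid, reported, expected in length_mismatches(tsv, fasta_lengths(fa)):
            print(f"{pid}: protein.tsv={reported} fasta={expected}")
```

Running it once per protein.tsv would show whether the wrong lengths are confined to the two suspect files.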

Janos

jd690764 · Aug 16 '22

Hi Janos,

Thanks for your clarification. It sounds like a bug in Philosopher.

Hi Felipe (@prvst), can you take a look? Basically, Philosopher wrote incorrect entries to protein.tsv. For example:

Protein NP_001014796 has two inconsistent entries in protein.tsv files: 
NP_001014796,NP_001014796,,119,Homo sapiens,,discoidin domain-containing receptor 2 precursor,1.000000,0.998500,NP_001139699
NP_001014796,NP_001014796,,855,Homo sapiens,,discoidin domain-containing receptor 2 precursor,0.997400,0.998600,.

Thanks,

Fengchao

fcyu · Aug 16 '22

> What I meant was that in the fasta library there is only one entry for the protein I checked. The protein.tsv files are mixed, 2 out 18 are incorrect, reporting an incorrect protein length.
>
> Janos

Sorry, I don't understand what you're describing. Could you elaborate?

prvst · Aug 16 '22

@jd690764, any update on this issue? Do you still have the problem with the Philosopher output?

anesvi · Sep 06 '22