Enzyme Free Digestion results in "0 digested proteins"

Open jonathan-krieger-bruker opened this issue 1 year ago • 4 comments

Hi Will, thanks for your great tool! I was trying to use mokapot with Sage outputs to look at enzyme-free peptidomics samples. Unfortunately, when I run it on a simple subset with a human canonical FASTA file (no isoforms), I always get this error:

(ms2rescore) jrkrieger@JKWORKSTATION:~$ mokapot --proteins /mnt/d/FASTA/Human_Sprot_20220318.fasta --enzyme "." /mnt/d/PyCharm/Fasta_Split/semi_3/results.sage.pin
[INFO] mokapot version 0.10.0
[INFO] Written by William E. Fondrie ([email protected]) in the
[INFO] Department of Genome Sciences at the University of Washington.
[INFO] Command issued:
[INFO] /home/jrkrieger/miniconda3/envs/ms2rescore/bin/mokapot --proteins /mnt/d/FASTA/Human_Sprot_20220318.fasta --enzyme . /mnt/d/PyCharm/Fasta_Split/semi_3/results.sage.pin
[INFO]
[INFO] Starting Analysis
[INFO] =================
[INFO] Parsing PSMs...
[INFO] Reading /mnt/d/PyCharm/Fasta_Split/semi_3/results.sage.pin...
[INFO] Using 26 features:
[INFO]   (1)    retentiontime
[INFO]   (2)    rank
[INFO]   (3)    z=2
[INFO]   (4)    z=3
[INFO]   (5)    z=4
[INFO]   (6)    z=5
[INFO]   (7)    z=6
[INFO]   (8)    z=other
[INFO]   (9)    peptide_len
[INFO]   (10)   missed_cleavages
[INFO]   (11)   isotope_error
[INFO]   (12)   ln(precursor_ppm)
[INFO]   (13)   fragment_ppm
[INFO]   (14)   ln(hyperscore)
[INFO]   (15)   ln(delta_next)
[INFO]   (16)   ln(delta_best)
[INFO]   (17)   aligned_rt
[INFO]   (18)   predicted_rt
[INFO]   (19)   sqrt(delta_rt_model)
[INFO]   (20)   matched_peaks
[INFO]   (21)   longest_b
[INFO]   (22)   longest_y
[INFO]   (23)   longest_y_pct
[INFO]   (24)   ln(matched_intensity_pct)
[INFO]   (25)   scored_candidates
[INFO]   (26)   ln(-poisson)
[INFO] Found 285397 PSMs.
[INFO]   - 146328 target PSMs and 139069 decoy PSMs detected.
[INFO] Protein-level confidence estimates enabled.
[INFO] Parsing FASTA files and digesting proteins...
[INFO]   - Parsed and digested 20377 proteins.
[INFO]   - 20377 had no peptides.
[INFO]   - Retained 0 proteins.
[INFO] Matching target to decoy proteins...
Traceback (most recent call last):
  File "/home/jrkrieger/miniconda3/envs/ms2rescore/bin/mokapot", line 8, in <module>
    sys.exit(main())
  File "/home/jrkrieger/miniconda3/envs/ms2rescore/lib/python3.10/site-packages/mokapot/mokapot.py", line 81, in main
    proteins = read_fasta(
  File "/home/jrkrieger/miniconda3/envs/ms2rescore/lib/python3.10/site-packages/mokapot/parsers/fasta.py", line 130, in read_fasta
    raise ValueError("Only decoy proteins were found in the FASTA file.")
ValueError: Only decoy proteins were found in the FASTA file.

If I change the enzyme to something like trypsin, chymotrypsin, or elastase, everything processes but produces the warning "Fewer than 90% of all peptides could be matched to proteins. Please verify that your digest settings are correct.", which I would expect in this case!

I can add that if I remove the --proteins flag and the FASTA file, everything works as expected.

Any help or suggestions would be much appreciated!

Thanks, Jon

jonathan-krieger-bruker, Dec 18 '23 23:12

Hi @jonathan-krieger-bruker,

Thanks for reaching out, and sorry it's taken so long to get back to you!

In general, I would not recommend using protein inference in mokapot for non-enzymatic data. Mokapot uses the picked-protein group method for protein inference under the hood. In the first step of the algorithm, we digest all of the proteins into their theoretical peptides. We then group any proteins where one protein's theoretical peptides are a proper subset of another's (in the case of non-enzymatic data, every protein would be unique). For each group, we then retain only the peptides that are unique to that group, and for non-enzymatic data these may very well be nonexistent!
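The grouping and filtering steps described above can be sketched in a few lines of Python. This is only a toy illustration, not mokapot's actual implementation: the function names, the three short sequences, and the 2 to 4 residue length window are all made up for the example.

```python
from collections import Counter


def nonspecific_digest(sequence, min_length, max_length):
    """Enzyme-free digest: every substring whose length is in the window."""
    return {
        sequence[start:start + length]
        for length in range(min_length, max_length + 1)
        for start in range(len(sequence) - length + 1)
    }


def group_proteins(peptides_by_protein):
    """Fold a protein into a group when its peptides are a proper subset
    of the group leader's peptides (simplified, hypothetical grouping)."""
    groups = {}  # leader protein -> list of member proteins
    # Visit larger peptide sets first so potential supersets become leaders.
    ordered = sorted(
        peptides_by_protein,
        key=lambda name: (-len(peptides_by_protein[name]), name),
    )
    for protein in ordered:
        peptides = peptides_by_protein[protein]
        for leader in groups:
            if peptides < peptides_by_protein[leader]:  # proper subset
                groups[leader].append(protein)
                break
        else:
            groups[protein] = [protein]
    return groups


def unique_peptides(groups, peptides_by_protein):
    """Keep only the peptides that occur in exactly one group."""
    counts = Counter(
        pep for leader in groups for pep in peptides_by_protein[leader]
    )
    return {
        leader: {p for p in peptides_by_protein[leader] if counts[p] == 1}
        for leader in groups
    }


# Hypothetical toy sequences; P2 is a fragment of P1.
proteins = {"P1": "ACDEFG", "P2": "CDEF", "P3": "ACDKLM"}
digested = {name: nonspecific_digest(seq, 2, 4) for name, seq in proteins.items()}
groups = group_proteins(digested)
unique = unique_peptides(groups, digested)
```

In this toy case P2 is folded into P1's group, and the peptides shared between P1 and P3 (`AC`, `CD`, `ACD`) are discarded. With a real enzyme-free digest the number of candidate peptides per protein is far larger, which is why this step is a poor fit for non-enzymatic data.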

There's also the practical limitation of memory when the database gets too big.
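To give a rough sense of the memory issue, here is a back-of-the-envelope count of the candidate peptides an enzyme-free digest generates. The function name, the 6 to 50 residue length window, and the 500-residue protein are illustrative assumptions, not values read from mokapot or from this dataset; only the protein count comes from the log above.

```python
def count_nonspecific_peptides(protein_length, min_length=6, max_length=50):
    """Count substrings of a protein whose length falls in the window.

    This counts start positions, so it is an upper bound on the number
    of distinct peptide sequences.
    """
    longest = min(max_length, protein_length)
    return sum(protein_length - k + 1 for k in range(min_length, longest + 1))


# Assumed "typical" protein of 500 residues.
per_protein = count_nonspecific_peptides(500)

# 20377 is the number of proteins reported in the log above.
database_estimate = 20377 * per_protein
```

Under these assumptions, each protein contributes about 21,000 candidate peptides and the whole database contributes roughly 4 × 10^8, compared with a few dozen peptides per protein for a tryptic digest.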

All that being said, I would expect at least one protein to survive this process. I'll take a look and see what I can find. Thanks!

wfondrie, Jan 23 '24 17:01

Hello, @wfondrie. How do I turn off protein inference from the command line interface?

prvst, Sep 06 '24 03:09

Hi @prvst 👋 - you can disable protein inference by not providing a FASTA file (the --proteins option).

wfondrie, Sep 06 '24 21:09

Thanks!

prvst, Sep 06 '24 22:09