Enzyme Free Digestion results in "0 digested proteins"
Hi Will, thanks for your great tool! I was trying to use mokapot in combination with Sage outputs to look at enzyme-free peptidomics samples. Unfortunately, when I run it on a simple subset with a FASTA file (human canonical, no isoforms), I always get this error:
(ms2rescore) jrkrieger@JKWORKSTATION:~$ mokapot --proteins /mnt/d/FASTA/Human_Sprot_20220318.fasta --enzyme "." /mnt/d/PyCharm/Fasta_Split/semi_3/results.sage.pin
[INFO] mokapot version 0.10.0
[INFO] Written by William E. Fondrie ([email protected]) in the
[INFO] Department of Genome Sciences at the University of Washington.
[INFO] Command issued:
[INFO] /home/jrkrieger/miniconda3/envs/ms2rescore/bin/mokapot --proteins /mnt/d/FASTA/Human_Sprot_20220318.fasta --enzyme . /mnt/d/PyCharm/Fasta_Split/semi_3/results.sage.pin
[INFO]
[INFO] Starting Analysis
[INFO] =================
[INFO] Parsing PSMs...
[INFO] Reading /mnt/d/PyCharm/Fasta_Split/semi_3/results.sage.pin...
[INFO] Using 26 features:
[INFO] (1) retentiontime
[INFO] (2) rank
[INFO] (3) z=2
[INFO] (4) z=3
[INFO] (5) z=4
[INFO] (6) z=5
[INFO] (7) z=6
[INFO] (8) z=other
[INFO] (9) peptide_len
[INFO] (10) missed_cleavages
[INFO] (11) isotope_error
[INFO] (12) ln(precursor_ppm)
[INFO] (13) fragment_ppm
[INFO] (14) ln(hyperscore)
[INFO] (15) ln(delta_next)
[INFO] (16) ln(delta_best)
[INFO] (17) aligned_rt
[INFO] (18) predicted_rt
[INFO] (19) sqrt(delta_rt_model)
[INFO] (20) matched_peaks
[INFO] (21) longest_b
[INFO] (22) longest_y
[INFO] (23) longest_y_pct
[INFO] (24) ln(matched_intensity_pct)
[INFO] (25) scored_candidates
[INFO] (26) ln(-poisson)
[INFO] Found 285397 PSMs.
[INFO] - 146328 target PSMs and 139069 decoy PSMs detected.
[INFO] Protein-level confidence estimates enabled.
[INFO] Parsing FASTA files and digesting proteins...
[INFO] - Parsed and digested 20377 proteins.
[INFO] - 20377 had no peptides.
[INFO] - Retained 0 proteins.
[INFO] Matching target to decoy proteins...
Traceback (most recent call last):
File "/home/jrkrieger/miniconda3/envs/ms2rescore/bin/mokapot", line 8, in <module>
sys.exit(main())
File "/home/jrkrieger/miniconda3/envs/ms2rescore/lib/python3.10/site-packages/mokapot/mokapot.py", line 81, in main
proteins = read_fasta(
File "/home/jrkrieger/miniconda3/envs/ms2rescore/lib/python3.10/site-packages/mokapot/parsers/fasta.py", line 130, in read_fasta
raise ValueError("Only decoy proteins were found in the FASTA file.")
ValueError: Only decoy proteins were found in the FASTA file.
If I change the enzyme to something like trypsin, chymotrypsin, or elastase, everything processes but produces the warning "Fewer than 90% of all peptides could be matched to proteins. Please verify that your digest settings are correct.", which I would expect in this case!
I should add that if I remove the --proteins flag and the FASTA file, everything works as expected.
Any help or suggestions would be much appreciated!
Thanks, Jon
Hi @jonathan-krieger-bruker,
Thanks for reaching out, and sorry it's taken so long to get back to you!
In general, I would not recommend using protein inference in mokapot for non-enzymatic data. Mokapot uses the picked-protein group method for protein inference under the hood. In the first step of the algorithm, we digest all of the proteins into their theoretical peptides. We then group any proteins where one protein's theoretical peptides are a proper subset of another's (in the case of non-enzymatic data, every protein would be unique). For each group, we then retain only the peptides unique to that group, and for non-enzymatic data these may very well be nonexistent!
There's also the practical limitation of memory when the database gets too big.
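The three steps described above (digest, group by proper subset, keep group-unique peptides) can be sketched roughly as follows. This is a toy illustration only, not mokapot's actual code: the function names `digest`, `group_proteins`, and `unique_peptides`, the `[KR]` cleavage regex, the length limits, and the three example sequences are all assumptions made up for the example, and missed cleavages and decoys are ignored.

```python
import re
from collections import defaultdict


def digest(sequence, enzyme_regex=r"[KR]", min_len=3, max_len=50):
    """Toy digest: cleave after every regex match, no missed cleavages."""
    sites = {0, len(sequence)}
    sites.update(m.end() for m in re.finditer(enzyme_regex, sequence))
    sites = sorted(sites)
    peptides = {sequence[a:b] for a, b in zip(sites, sites[1:])}
    # Drop peptides outside the allowed length range.
    return {p for p in peptides if min_len <= len(p) <= max_len}


def group_proteins(peptides_by_protein):
    """Fold each protein whose peptides are a proper subset of another
    protein's peptides into the group of its largest superset protein."""
    # Proteins without any peptides are dropped before grouping.
    nonempty = {n: p for n, p in peptides_by_protein.items() if p}
    members = defaultdict(list)
    for name, peps in nonempty.items():
        supersets = [o for o, op in nonempty.items() if peps < op]
        if supersets:
            rep = max(supersets, key=lambda o: (len(nonempty[o]), o))
        else:
            rep = name
        members[rep].append(name)
    return {rep: (sorted(m), nonempty[rep]) for rep, m in members.items()}


def unique_peptides(groups):
    """Keep only the peptides that occur in exactly one group."""
    counts = defaultdict(int)
    for _, peps in groups.values():
        for pep in peps:
            counts[pep] += 1
    return {
        rep: {pep for pep in peps if counts[pep] == 1}
        for rep, (_, peps) in groups.items()
    }


# Hypothetical sequences, chosen so that P2 is a subset of P1.
proteins = {
    "P1": "AAAKCCCKDDDR",
    "P2": "AAAKCCCK",
    "P3": "CCCKEEEK",
}
digested = {name: digest(seq) for name, seq in proteins.items()}
groups = group_proteins(digested)
unique = unique_peptides(groups)
```

Here P2 is folded into P1's group, and the peptide shared between the two remaining groups is discarded. Note that in this toy version a regex such as "." cleaves after every residue, so every fragment is shorter than `min_len` and the protein ends up with no peptides at all; whether that is what happens inside mokapot with `--enzyme "."` is not established here.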
All that being said, I would expect at least one protein to survive this process. I'll take a look and see what I can find. Thanks!
Hello, @wfondrie . How do I turn the protein inference off using the command line interface?
Hi @prvst 👋 - you can disable protein inference by not providing a FASTA file (that is, by omitting the --proteins option).
Thanks!