Decoy Prefix results in many unmapped peptides

Open jonathan-krieger-bruker opened this issue 1 year ago • 1 comments

Hi Will, Thanks for your time in advance:

I am trying to use Mokapot on sage results of timsTOF DDA data - nothing special. The issue I encounter is the following:

Mokapot on the PIN file without any protein inference - I get expected results.
Mokapot on the PIN file with a FASTA file containing no decoys - works fine.
Mokapot on the PIN file with a FASTA file containing decoys - works fine.

BUT, when specifying the decoy prepended tag,

$ mokapot results.sage.pin -w 30 --proteins /mnt/d/FASTA/Human_Sprot_20220318_decoys.fasta --decoy_prefix Reverse_

I always encounter an error similar to the following:

Traceback (most recent call last):
  File "/home/jrkrieger/miniconda3/envs/Mokapot/bin/mokapot", line 8, in <module>
    sys.exit(main())
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/mokapot.py", line 136, in main
    psms, models = brew(
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/brew.py", line 183, in brew
    res = [
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/brew.py", line 184, in <listcomp>
    p.assign_confidence(s, eval_fdr=test_fdr, desc=d)
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/dataset.py", line 586, in assign_confidence
    return LinearConfidence(self, scores, eval_fdr=eval_fdr, desc=desc)
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/confidence.py", line 367, in __init__
    self._assign_confidence(desc=desc)
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/confidence.py", line 418, in _assign_confidence
    proteins = picked_protein(
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/picked_protein.py", line 99, in picked_protein
    raise ValueError(
ValueError: Fewer than 90% of all peptides could be matched to proteins. Please verify that your digest settings are correct.

Changing the decoy prefix in the command to something nonsensical:

$ mokapot results.sage.pin -w 30 --proteins /mnt/d/FASTA/Human_Sprot_20220318_decoys.fasta --decoy_prefix blahblah

gives the same results as if not including the --decoy_prefix flag (which makes sense).

Any suggestions as to what might be going on here would be much appreciated. Thanks, Jon

jonathan-krieger-bruker, Jun 19 '23 03:06

Hi @jonathan-krieger-bruker 👋

Sorry for the slow response - can you elaborate more on how the decoy sequences in your FASTA file were generated?

Also, a small example from the file would be helpful.

wfondrie, Sep 11 '23 17:09
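
One way to pull together the small example requested above is to summarize the FASTA headers and see where the prefix actually appears. The following is a minimal sketch, not part of mokapot: the function name `summarize_fasta_headers` and the sample headers are made up for illustration, and it assumes the protein ID is the first whitespace-delimited token of each header line. It only reports what is in the file; it makes no claim about how mokapot itself interprets the prefix.

```python
from collections import Counter, defaultdict


def summarize_fasta_headers(lines, decoy_prefix="Reverse_", n_examples=3):
    """Classify FASTA headers by where decoy_prefix appears in the protein ID.

    Hypothetical helper for inspecting a FASTA file; not a mokapot function.
    """
    counts = Counter()
    examples = defaultdict(list)
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].strip()
        # Assumption: the first whitespace-delimited token is the protein ID.
        fields = header.split(maxsplit=1)
        protein_id = fields[0] if fields else ""
        if protein_id.startswith(decoy_prefix):
            kind = "decoy"
        elif decoy_prefix in protein_id:
            # Prefix is present but not at the start, e.g. after "sp|".
            kind = "prefix_inside"
        else:
            kind = "target"
        counts[kind] += 1
        if len(examples[kind]) < n_examples:
            examples[kind].append(header)
    return counts, examples


# Made-up headers, for illustration only.
sample = [
    ">sp|P00001|EXAMPLE_HUMAN Example protein",
    "MKVLAAGIVGL",
    ">Reverse_sp|P00001|EXAMPLE_HUMAN Example protein",
    "LGVIGAALVKM",
]
counts, examples = summarize_fasta_headers(sample, decoy_prefix="Reverse_")
for kind in sorted(counts):
    print(kind, counts[kind], examples[kind])
```

To run it on a real file, pass an open file handle instead of `sample`, for example `with open(path) as fasta: summarize_fasta_headers(fasta, "Reverse_")`, where `path` points at the FASTA file used with `--proteins`. The per-category counts and the first few headers of each kind would make a compact excerpt to post in the thread.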