Multiple search engines fail to produce proteins at 1% FDR
Description of the bug
I have been discussing with @jpfeuffer and @timosachsenberg the following issue:
I have been rerunning quantms and found an issue that only happens with old datasets: when I use multiple search engines (no matter which combination), proteomicsLFQ never reports any proteins, meaning all proteins are always filtered out at 1% FDR. If I use only one search engine, proteomicsLFQ quantifies proteins nicely. I talked to @jpfeuffer, and the cause could be that, when the data reaches ConsensusID in the current version, q-values may be computed from q-values instead of from PEPs, which is wrong. In the past we used id_switcher throughout to make sure that PEP was always used.
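For context on why the input score matters: one common way to estimate q-values from PEPs is to sort PSMs by PEP (ascending) and take the running mean of the PEPs as the FDR at each cutoff. With sorted input the running mean never decreases, so it already equals the q-value. The sketch below shows this with `sort` and `awk` on a hypothetical two-column input (`psm_id pep`); it illustrates the general technique only and is not claimed to be what ConsensusID or proteomicsLFQ does internally. If q-values were fed in as if they were PEPs, the output would no longer be a valid FDR estimate.

```bash
#!/usr/bin/env bash
# pep_to_qvalue: read "psm_id pep" lines on stdin, print "psm_id qvalue".
# Hypothetical helper for illustration; the input format is an assumption.
pep_to_qvalue() {
  # Sort by PEP ascending (general numeric sort also handles values like 1e-5),
  # then print the running mean of the PEPs, i.e. the estimated FDR at each cutoff.
  # For sorted PEPs this running mean is nondecreasing, so it is also the q-value.
  sort -k2,2g | awk '{ sum += $2; n += 1; printf "%s %.6g\n", $1, sum / n }'
}

# count_at_fdr: number of PSMs accepted at a q-value threshold (default 0.01 = 1% FDR).
count_at_fdr() {
  local threshold="${1:-0.01}"
  pep_to_qvalue | awk -v t="$threshold" '$2 <= t { c += 1 } END { print c + 0 }'
}
```

For example, `printf 'a 0.001\nb 0.01\nc 0.02\nd 0.5\n' | count_at_fdr 0.01` keeps two PSMs, because the running mean of the PEPs rises above 0.01 at the third one.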
Even for newer datasets, I think the results are wrong even when the run does not fail, so we have to fix this. @julianus raised the following points:
- [x] I thought we got rid of the PSM feature extractor? https://github.com/bigbio/quantms/blob/fed997b3148d1efce5a2c7b6e2c1b6b5d2aaf4fb/subworkflows/local/psmrescoring.nf#L37
- [x] And what is PSM clean?
- [x] You are calculating features twice now.
- [x] And please remove ALL occurrences of score switcher. You can specify the score type to filter for in IDFilter now.
- [x] You might need it for Luciphor, but ideally the adapter does the score switching internally.
I (@ypriverol) think we need to clean up the workflow and debug this issue. We can:
- [x] Make sure that id_switcher is only used when it is needed (I guess only for Luciphor), and in that case, call it within that process. Then no separate id_switcher process is needed.
- [x] Can you make sure that we don't run the feature extractor multiple times?
- [ ] Can you make sure that all the scoring algorithms use the proper scores, PEP or q-value, when needed?
-
Indeed, with the latest version of OpenMS we no longer need this step. I will fix it in an upcoming PR.
-
PSM clean is used to clean up erroneous MS2 spectra. See https://github.com/bigbio/quantms/issues/471
-
So I will remove the extra features module for the latest version of OpenMS.
-
I checked ConsensusID: for now, both its input and its output are PEP scores. The IDScoreSwitcher step will be removed for single-search-engine runs in an upcoming PR. How do we specify the score type for IDFilter at the PSM level?
The type_peptide parameter?
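To confirm which score each step actually writes (PEP vs. q-value), one option is to list the distinct `score_type` attributes of the `PeptideIdentification` elements in each idXML, for example on the input and output of ConsensusID. This is a minimal sketch that assumes each `<PeptideIdentification ...>` start tag sits on a single line; it is plain text matching, not an XML parser, and `score_types` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# score_types: for each idXML file given, print the distinct score_type values
# found on PeptideIdentification start tags (ProteinIdentification tags are ignored).
# Hypothetical helper; assumes one start tag per line.
score_types() {
  local f
  for f in "$@"; do
    grep -o '<PeptideIdentification [^>]*' "$f" \
      | grep -o 'score_type="[^"]*"' \
      | sort -u \
      | awk -v file="$f" '{ print file ": " $0 }'
  done
}
```

A file that reports more than one `score_type`, or a q-value where PEP is expected, points at the step that needs a closer look.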
Hey @daichengxin !
Regarding PSM clean: I think this is called in the "no rescoring" if-case, so I am a bit confused. It is only called for Sage IDs.
Regarding ConsensusID: Hmm.. PEP should be correct. Still, maybe it's the double features? I think it would be nice to trace a spectrum with its IDs all the way back from after Percolator to the raw search engine to see how it changes.
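One way to do that tracing by hand is to pull out, for a single spectrum, the score type and the first listed hit from each intermediate idXML. The sketch below again assumes one XML start tag per line, matches `spectrum_reference` exactly (including the closing quote, so `scan=1` does not match `scan=10`), and reports only the first `PeptideHit` under that identification. `trace_spectrum` and the file names in the usage line are hypothetical.

```bash
#!/usr/bin/env bash
# trace_spectrum: for one spectrum_reference, print the score type and the first
# listed PeptideHit (sequence and score) found in each idXML file given.
# Hypothetical helper; text matching under a one-tag-per-line assumption.
trace_spectrum() {
  local ref="$1"; shift
  local f
  for f in "$@"; do
    awk -v ref="$ref" -v file="$f" '
      # Start of the identification for the requested spectrum.
      index($0, "<PeptideIdentification ") && index($0, "spectrum_reference=\"" ref "\"") {
        want = 1
        match($0, /score_type="[^"]*"/)
        stype = substr($0, RSTART + 12, RLENGTH - 13)
        next
      }
      # Stop looking if that identification closes without a hit.
      index($0, "</PeptideIdentification>") { want = 0 }
      # First hit inside the wanted identification: report sequence and score.
      want && index($0, "<PeptideHit ") {
        match($0, /score="[^"]*"/)
        score = substr($0, RSTART + 7, RLENGTH - 8)
        match($0, /sequence="[^"]*"/)
        seq = substr($0, RSTART + 10, RLENGTH - 11)
        print file ": " seq " " stype "=" score
        want = 0
      }
    ' "$f"
  done
}
```

Usage would look like `trace_spectrum "<spectrum_reference>" raw_search.idXML rescored.idXML consensus.idXML`, with the files listed in pipeline order so that a change in score type or top hit is easy to spot.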
Regarding IDFilter: Yes, we added a new param called score:type_peptide. Choose q-value there.
So is this related to the following error I see for this step?
```
Error executing process > 'BIGBIO_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:PSMCLEAN (LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03)'

Caused by:
  At least one value of specified task container settings is invalid

Command executed:

rescoring psm_feature_clean \
    --idxml LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03_comet_feat.idXML \
    --mzml LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML \
    --output LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03_comet_feat_clean.idXML \
    \
    2>&1 | tee LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03_comet_feat_clean.log

cat <<-END_VERSIONS > versions.yml
"BIGBIO_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:PSMCLEAN":
    quantms-rescoring: $(rescoring --version 2>&1 | grep -Eo '[0-9].[0-9].[0-9]')
END_VERSIONS
```
So the files do not seem to be copied to the working directory. The mzML file is definitely produced, so is it the comet_feat idXML that could not be created?
So should I skip this step (if I can)?
Ah. The files are copied to out in the results folder: results/mzmlindexing/out. Maybe this was changed by mistake?
Can you go into the work folder of that process execution and check whether both file links are present?
```
--idxml LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03_comet_feat.idXML
--mzml LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML
```
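That check can be scripted. The sketch below relies on Nextflow's default behaviour of staging task inputs as symbolic links in the task work directory, and reports each expected input as present, as a dangling link, or as never staged. `check_staged_inputs` is a hypothetical helper, and the work-directory path in the usage line is a placeholder.

```bash
#!/usr/bin/env bash
# check_staged_inputs: $1 is a task work directory, the remaining arguments are the
# expected input file names. Prints one status line per file and returns 1 if any
# input is missing or is a link whose target does not exist.
# Hypothetical helper for illustration.
check_staged_inputs() {
  local workdir="$1"; shift
  local status=0 f
  for f in "$@"; do
    if [ -e "$workdir/$f" ]; then
      # -e follows symlinks, so this means the file (or the link target) exists.
      echo "ok: $f"
    elif [ -L "$workdir/$f" ]; then
      # The link was staged but points at a file that does not exist.
      echo "broken link: $f"
      status=1
    else
      echo "missing: $f"
      status=1
    fi
  done
  return "$status"
}
```

For example: `check_staged_inputs work/<hash> LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03_comet_feat.idXML LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML`.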
They are not. But I found results/searchenginecomet/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03_comet.idXML and results/mzmlindexing/out/LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML
Fixed by #554. SageAdapter outputs an inf value for the SAGE:ln(-poisson) score, and Percolator then raised this error because of the invalid feature value (https://github.com/OpenMS/OpenMS/issues/7976). That is why we added the PSM clean step. It looks like @timosachsenberg has fixed it, but I am not sure whether the current OpenMS container includes this fix @jpfeuffer @timosachsenberg. If it does, we can now remove this step.
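Until a container with the fix is confirmed, a cheap check is to count non-finite values of the affected feature before Percolator runs. This sketch assumes each `UserParam` is written on one line and that infinity appears as `inf`/`-inf` (case-insensitive) in the `value` attribute; both are assumptions about the idXML output, and `count_nonfinite_params` is a hypothetical helper, not part of quantms or OpenMS.

```bash
#!/usr/bin/env bash
# count_nonfinite_params: $1 is an idXML file, $2 an optional UserParam name
# (default SAGE:ln(-poisson)). Prints how many of those params hold inf/-inf/nan.
# Hypothetical helper; the textual form of infinity in idXML is an assumption.
count_nonfinite_params() {
  local f="$1" name="${2:-SAGE:ln(-poisson)}"
  awk -v name="$name" '
    # Only look at UserParam lines whose name matches exactly.
    index($0, "<UserParam ") && index($0, "name=\"" name "\"") {
      match($0, /value="[^"]*"/)
      v = tolower(substr($0, RSTART + 7, RLENGTH - 8))
      if (v == "inf" || v == "-inf" || v == "+inf" || v == "nan") n += 1
    }
    END { print n + 0 }
  ' "$f"
}
```

A non-zero count on the Sage output would suggest the clean step (or an upstream fix) is still needed for that run.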
Another thing, @enryH: I haven't seen this error before:
```
Caused by:
  At least one value of specified task container settings is invalid
```
It is only fixed on the current OpenMS develop branch.
I think this issue can be closed.