SAGE error in Big dataset TMT
Description of the bug
ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMRESCORING:PERCOLATOR (g00594_Prot_37_11)'
Caused by:
Process `NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMRESCORING:PERCOLATOR (g00594_Prot_37_11)` terminated with an error exit status (9)
Command executed:
OMP_NUM_THREADS=48 PercolatorAdapter \
-in g00594_Prot_37_11_sage.idXML \
-out g00594_Prot_37_11_sage_perc.idXML \
-threads 48 \
-subset_max_train 300000 \
-decoy_pattern DECOY_ \
-post_processing_tdc \
-score_type pep \
-debug 0 \
2>&1 | tee g00594_Prot_37_11_sage_percolator.log
cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMRESCORING:PERCOLATOR":
PercolatorAdapter: $(PercolatorAdapter 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1)
percolator: $(percolator -h 2>&1 | grep -E '^Percolator version(.*)' | sed 's/Percolator version //g')
END_VERSIONS
Command exit status:
9
Command output:
Loading input file: g00594_Prot_37_11_sage.idXML
Merging peptide ids.
Merging protein ids.
Prepared percolator input.
Standard output: Running: /usr/local/bin/percolator -U -m /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_target_pout_psms.tab -M /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_decoy_pout_psms.tab --num-threads 48 -N 300000 -Y /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_pin.tab
Standard error: Percolator version 3.05.0, Build Date Aug 31 2020 19:03:04
Copyright (c) 2006-9 University of Washington. All rights reserved.
Written by Lukas Käll ([email protected]) in the
Department of Genome Sciences at the University of Washington.
Issued command:
/usr/local/bin/percolator -U -m /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_target_pout_psms.tab -M /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_decoy_pout_psms.tab --num-threads 48 -N 300000 -Y /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_pin.tab
Started Sat Dec 9 15:05:43 2023
Hyperparameters: selectionFdr=0.01, Cpos=0, Cneg=0, maxNiter=10
Reading tab-delimited input from datafile /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_pin.tab
Features:
mass peplen charge2 charge3 charge4 charge5 enzN enzC enzInt dm absdm score SAGE:ln(-poisson) SAGE:ln(delta_best) SAGE:ln(delta_next) SAGE:ln(matched_intensity_pct) SAGE:longest_b SAGE:longest_y SAGE:longest_y_pct SAGE:matched_peaks SAGE:scored_candidates
Found 33916 PSMs
Concatenated search input detected and --post-processing-tdc flag set. Applying target-decoy competition on Percolator scores.
Train/test set contains 16926 positives and 16990 negatives, size ratio=0.996233 and pi0=1
Selecting Cpos by cross-validation.
Selecting Cneg by cross-validation.
Split 1: Exception caught: Error in the input data: cannot find an initial direction with positive training examples. Consider setting/raising the initial training FDR threshold (--train-initial-fdr).
Terminating.
Process '/usr/local/bin/percolator' did not finish successfully (exit code: ). Please check the log.
PercolatorAdapter took 2.54 s (wall), 2.28 s (CPU), 0.08 s (system), 2.20 s (user); Peak Memory Usage: 168 MB.
Work dir:
/hps/nobackup/juan/pride/reanalysis/absolute-expression/cell-lines/MSV000085836/work/9a/8f8f857d333f8a6b2224af6bac7059
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
Command used and terminal output
No response
Relevant files
No response
System information
No response
That's mostly a Percolator error though. Is it Sage-only or multiple engines with ConsensusID? Did you try the fix suggested in the error? As a last resort I would plot the score distributions and check what is wrong. Lastly, someone could try to implement support for using the q-values directly out of Sage. It also has an LDA for rescoring.
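For the score-distribution check, a minimal sketch along these lines could be a starting point. It assumes the usual tab-delimited pin layout (a header row with `SpecId`, `Label`, `ScanNr`, the feature columns, then `Peptide` and `Proteins`, with `Label` set to 1 for targets and -1 for decoys). The function name `summarize_feature` and the example rows are made up for illustration; the numeric summary stands in for a plot.

```python
import csv
import io
import statistics


def summarize_feature(pin_lines, feature):
    """Group one feature column of a pin file by target/decoy label."""
    # Extra protein columns (one per additional protein) are collected under restkey.
    reader = csv.DictReader(pin_lines, delimiter="\t", restkey="extra_proteins")
    values = {"target": [], "decoy": []}
    for row in reader:
        # Some pin files carry an optional DefaultDirection row; skip it.
        if row["SpecId"] == "DefaultDirection":
            continue
        kind = "target" if row["Label"].strip() == "1" else "decoy"
        values[kind].append(float(row[feature]))
    return {
        kind: {
            "n": len(vals),
            "mean": statistics.fmean(vals),
            "median": statistics.median(vals),
        }
        for kind, vals in values.items()
        if vals
    }


# Tiny made-up example; a real check would pass an open file handle instead.
EXAMPLE_PIN = (
    "SpecId\tLabel\tScanNr\tscore\tPeptide\tProteins\n"
    "s1\t1\t1\t30.0\tK.AAA.R\tP1\n"
    "s2\t1\t2\t10.0\tK.BBB.R\tP2\tP3\n"
    "s3\t-1\t3\t12.0\tK.CCC.R\tDECOY_P4\n"
    "s4\t-1\t4\t8.0\tK.DDD.R\tDECOY_P5\n"
)

summary = summarize_feature(io.StringIO(EXAMPLE_PIN), "score")
for kind, stats in summary.items():
    print(kind, stats)
```

If targets and decoys show nearly identical summaries for every feature, that would be consistent with Percolator not finding an initial direction; the collected values could equally be passed to a histogram.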
I know that @timosachsenberg gets this error a lot in cross-linking experiments.
That's mostly a Percolator error though. Is it Sage-only or multiple engines with ConsensusID? Did you try the fix suggested in the error?
I'm running the experiment now without Sage. It is a multiple search engine run.
As a last resort I would plot the score distributions and check what is wrong. Lastly, someone could try to implement support for using the q-values directly out of Sage. It also has an LDA for rescoring.
Or run Sage only, to see whether Sage itself or the combination is the reason.
The interesting thing is that the issue is within Sage, not with the other search engines.
Could also be another engine that is just very different from Sage, such that the combination does not work well (that's why I would try it alone). Also, combinations kind of defeat the speed advantage of Sage.
Ah, I see what you mean now. Percolator fails on the Sage-only output before ConsensusID. Yes, that is a problem.
Yes, it is inside the Sage adapter. Really nice issue for @timosachsenberg's Christmas.
Maybe it is, maybe not. We mostly just pass whatever was in the pin file.
I would love to skip all the idXML conversion back and forth, but we currently depend on it because a) we depend on the information about search settings and b) we had problems with the scan_ids in the pin file (they were sometimes just a number, so a lookup in the mzML was necessary).
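To illustrate the scan_id problem in b), here is a rough sketch of such a lookup. It assumes Thermo-style native IDs of the form `controllerType=0 controllerNumber=1 scan=N` in the mzML; the helper names `build_scan_lookup` and `resolve_scan_id` are hypothetical and are not the adapter's actual code.

```python
import re

# Matches the scan number inside a native ID such as
# "controllerType=0 controllerNumber=1 scan=1234".
SCAN_RE = re.compile(r"scan=(\d+)")


def build_scan_lookup(native_ids):
    """Map bare scan numbers to the full native ID strings found in an mzML."""
    lookup = {}
    for native_id in native_ids:
        match = SCAN_RE.search(native_id)
        if match:
            lookup[int(match.group(1))] = native_id
    return lookup


def resolve_scan_id(scan_id, lookup):
    """Return a full native ID, looking it up when scan_id is only a number."""
    scan_id = scan_id.strip()
    if scan_id.isdigit():
        # Bare number: needs the mzML-derived lookup (None if unknown).
        return lookup.get(int(scan_id))
    # Already a full native ID: pass it through unchanged.
    return scan_id


# Made-up native IDs standing in for the spectra of one mzML file.
lookup = build_scan_lookup([
    "controllerType=0 controllerNumber=1 scan=1",
    "controllerType=0 controllerNumber=1 scan=2",
])
print(resolve_scan_id("2", lookup))
```

In this sketch a bare number that is missing from the mzML resolves to `None`, so the caller can decide whether to skip the PSM or fail loudly.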
The interesting thing is that the issue is within Sage, not with the other search engines.
This might indicate that Sage is not finding enough true targets. Maybe some configuration is wrong? How many hits are left if the output of the other search engines is filtered by q-value? Many, or just a few hundred? (A few hundred could mean that those engines are a bit more sensitive and had just enough true targets to find a score and direction.)
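The "how many hits are left" question can be checked with a simple target-decoy count. This is only a sketch under stated assumptions: FDR is estimated as decoys/targets above each score, higher scores are taken to be better, and the function name `count_targets_at_fdr` is made up. Percolator's own estimate may differ in detail.

```python
def count_targets_at_fdr(scored_psms, fdr=0.01):
    """Count target PSMs whose q-value is <= fdr.

    scored_psms: iterable of (score, is_target) pairs, higher score = better.
    """
    ranked = sorted(scored_psms, key=lambda psm: psm[0], reverse=True)

    # FDR estimate at each rank: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _, is_target in ranked:
        if is_target:
            targets += 1
        else:
            decoys += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any lower-scoring rank.
    qvalues = []
    running_min = float("inf")
    for value in reversed(fdrs):
        running_min = min(running_min, value)
        qvalues.append(running_min)
    qvalues.reverse()

    return sum(
        1
        for (_, is_target), q in zip(ranked, qvalues)
        if is_target and q <= fdr
    )


# Made-up scores: well separated vs. interleaved targets and decoys.
separated = [(10, True), (9, True), (8, True), (7, True),
             (3, False), (2, False), (1, True)]
interleaved = [(10, False), (9, True), (8, False), (7, True)]

print(count_targets_at_fdr(separated))    # targets passing 1% FDR
print(count_targets_at_fdr(interleaved))  # no target passes
```

A count of zero for every feature is the situation the "cannot find an initial direction with positive training examples" message describes, which is why the error suggests raising `--train-initial-fdr`.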
I need the other search engines to finish; I will let you know what happens for that particular file.