The min/max precursor charge and missed cleavages settings are sometimes not applied
Description of the bug
@jpfeuffer @timosachsenberg @daichengxin I found a dataset that we searched with MS-GF+. Here is the command:
#!/bin/bash -euo pipefail
MSGFPlusAdapter \
-protocol automatic \
-in 01086_C01_P010738_S00_N03_R1.mzML \
-out 01086_C01_P010738_S00_N03_R1_msgf.idXML \
-executable $(find /usr/local/share/msgf_plus-*/MSGFPlus.jar -maxdepth 0) \
-threads 6 \
-java_memory 30720 \
-database "GRCh38r110_GCA97s_coding_proteins_19Jul23-decoy.fa" \
-instrument high_res \
-matches_per_spec 1 \
-min_precursor_charge 2 \
-max_precursor_charge 4 \
-min_peptide_length 6 \
-max_peptide_length 40 \
-max_missed_cleavages 2 \
-isotope_error_range 0,1 \
-enzyme "Trypsin/P" \
-tryptic fully \
-precursor_mass_tolerance 40.0 \
-precursor_error_units ppm \
-fixed_modifications 'Carbamidomethyl (C)' \
-variable_modifications 'Acetyl (Protein N-term)' 'Deamidated (N)' 'Deamidated (Q)' 'Oxidation (M)' \
-max_mods 3 \
-PeptideIndexing:IL_equivalent \
-PeptideIndexing:unmatched_action warn \
-debug 0 \
\
2>&1 | tee 01086_C01_P010738_S00_N03_R1_msgf.log
However, in the output file I found the following identification, which has charge 6 even though the maximum precursor charge was set to 4:
<PeptideIdentification score_type="SpecEValue" higher_score_better="false" significance_threshold="0.0" MZ="664.68194580078125" RT="3378.397500000000036" spectrum_reference="controllerType=0 controllerNumber=1 scan=24975" >
<PeptideHit score="1.4043417e-21" sequence="INNAHTIGC(Carbamidomethyl)NAVSWAPAVVPGSLIDHPSGQKPNYIKR" charge="6" aa_before="K K K K K K K K K K K K K K K" aa_after="F F F F F F F F F F F F F F F" start="130 147 144 130 190 130 147 144 130 190 130 147 144 130 190" end="166 183 180 166 226 166 183 180 166 226 166 183 180 166 226" protein_refs="PH_14293 PH_14294 PH_14295 PH_14296 PH_14297 PH_44721 PH_44722 PH_44723 PH_44724 PH_44725 PH_112619 PH_112620 PH_112621 PH_112622 PH_112623" >
<UserParam type="float" name="MS:1002049" value="103.0"/>
<UserParam type="float" name="MS:1002050" value="165.0"/>
<UserParam type="float" name="MS:1002052" value="1.4043417e-21"/>
<UserParam type="float" name="MS:1002053" value="6.614773000000001e-14"/>
<UserParam type="string" name="AssumedDissociationMethod" value="HCD"/>
<UserParam type="string" name="CTermIonCurrentRatio" value="0.3437819"/>
<UserParam type="string" name="ExplainedIonCurrentRatio" value="0.39947474"/>
<UserParam type="string" name="MS2IonCurrent" value="2429519.8"/>
<UserParam type="string" name="MeanErrorAll" value="4.888304"/>
<UserParam type="string" name="MeanErrorTop7" value="2.5796666"/>
<UserParam type="string" name="MeanRelErrorAll" value="-0.8928608"/>
<UserParam type="string" name="MeanRelErrorTop7" value="2.5497687"/>
<UserParam type="string" name="NTermIonCurrentRatio" value="0.055692848"/>
<UserParam type="string" name="NumMatchedMainIons" value="23"/>
<UserParam type="string" name="StdevErrorAll" value="4.698519"/>
<UserParam type="string" name="StdevErrorTop7" value="1.8443376"/>
<UserParam type="string" name="StdevRelErrorAll" value="6.7211905"/>
<UserParam type="string" name="StdevRelErrorTop7" value="1.885455"/>
<UserParam type="float" name="calcMZ" value="664.51446533203125"/>
<UserParam type="int" name="pass_threshold" value="1"/>
<UserParam type="int" name="start" value="191"/>
<UserParam type="int" name="end" value="227"/>
<UserParam type="string" name="target_decoy" value="target"/>
<UserParam type="string" name="isotope_error" value="1"/>
<UserParam type="string" name="protein_references" value="non-unique"/>
</PeptideHit>
<UserParam type="string" name="MS:1001115" value="24975"/>
</PeptideIdentification>
What could be the problem? This also happens with Comet.
Command used and terminal output
No response
Relevant files
No response
System information
No response
https://github.com/OpenMS/OpenMS/blob/079143800f7ed036a7c68ea6e124fe4f5cfc9569/src/topp/MSGFPlusAdapter.cpp#L166 According to this comment in our adapter, the precursor charge range is only used if no charge is annotated in the mzML.
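To make that behaviour concrete, here is a toy model in Python of the rule the comment describes. It is only a sketch, not the adapter's or MS-GF+'s actual code. The function name `candidate_charges` is hypothetical, and treating a charge of 0 or `None` as "not annotated" is an assumption.

```python
from typing import List, Optional


def candidate_charges(annotated_charge: Optional[int],
                      min_charge: int,
                      max_charge: int) -> List[int]:
    """Toy model (assumption): charge states considered for one spectrum."""
    # If the mzML already carries a precursor charge, that charge is used
    # and the min/max range is ignored.
    if annotated_charge:
        return [annotated_charge]
    # Only spectra without an annotated charge fall back to the range.
    return list(range(min_charge, max_charge + 1))


# A spectrum annotated with charge 6 is searched at charge 6,
# even with -min_precursor_charge 2 and -max_precursor_charge 4.
print(candidate_charges(6, 2, 4))
# A spectrum without a charge annotation uses the configured range.
print(candidate_charges(None, 2, 4))
```

Under this model, the charge 6 hit reported above would be expected whenever the precursor in the mzML is already annotated with charge 6.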
@jpfeuffer @timosachsenberg Would it make sense to add a parameter that filters the PSMs to that charge range?
Good question. I think these high-charge peptides are potentially interesting, so one could argue that one wants them to be reported. On the other hand, you get more defined / consistent results without filtering. I would probably keep them by default, but I could add an optional filter if we decide that we want to filter them.
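Until such an option exists, a post-hoc filter can be sketched with the Python standard library alone. This is a minimal illustration and not an OpenMS API: the function name `drop_hits_outside_charge_range` is hypothetical, and the only idXML details it relies on are the `PeptideIdentification` and `PeptideHit` elements and the `charge` attribute seen in the excerpt above. The inline sample is a heavily reduced, made-up document rather than real output.

```python
import xml.etree.ElementTree as ET


def drop_hits_outside_charge_range(root, min_charge, max_charge):
    """Remove PeptideHit elements whose charge lies outside
    [min_charge, max_charge]. Returns the number of hits removed."""
    removed = 0
    # Materialise the iterator first so the tree can be modified safely.
    for pep_id in list(root.iter("PeptideIdentification")):
        for hit in list(pep_id.findall("PeptideHit")):
            charge = int(hit.get("charge", "0"))
            if not (min_charge <= charge <= max_charge):
                pep_id.remove(hit)
                removed += 1
    return removed


# Reduced, made-up sample: one hit inside the range, one outside.
sample = """<IdXML><IdentificationRun>
<PeptideIdentification><PeptideHit sequence="AAK" charge="2"/></PeptideIdentification>
<PeptideIdentification><PeptideHit sequence="INNAHTIGCNAVSWAPAVVPGSLIDHPSGQKPNYIKR" charge="6"/></PeptideIdentification>
</IdentificationRun></IdXML>"""

root = ET.fromstring(sample)
n_removed = drop_hits_outside_charge_range(root, 2, 4)
remaining = [hit.get("charge") for hit in root.iter("PeptideHit")]
print(n_removed, remaining)
```

Note that this sketch leaves a `PeptideIdentification` element in place even when all of its hits were removed. Whether such empty identifications should also be dropped is a design decision the optional filter would have to make.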