OpenMS
Storing and exporting non-Unimod modifications (e.g. from MSFragger)
Dear all, I tried to run MzTabExporter after IDFilter, which resulted in the error message below:
14:18:26 NOTICE: MzTabExporter of node #12 started. Processing ...
exporting identifications: "C:/Users/MATTHI~1/AppData/Local/Temp/6/2017-07-18_141154_ukmz1005immz_18988_1/TOPPAS_tmp/msfragger/007_IDFilter/out/MFA174.idXML" to mzTab:
exporting identifications: "C:/Users/MATTHI~1/AppData/Local/Temp/6/2017-07-18_141154_ukmz1005immz_18988_1/TOPPAS_tmp/msfragger/007_IDFilter/out/MFA175.idXML" to mzTab:
exporting identifications: "C:/Users/MATTHI~1/AppData/Local/Temp/6/2017-07-18_141154_ukmz1005immz_18988_1/TOPPAS_tmp/msfragger/007_IDFilter/out/MFA173.idXML" to mzTab:
Error: Unexpected internal error (Modification or Substitution identifier MUST NOT be null or empty in MzTabModification)
==============================================================================
14:18:30 ERROR: MzTabExporter failed!
We currently use OpenMS 2.1, and @lars20070 told me that this might have been fixed in the latest version (OpenMS 2.2). If so, we would update our installation; otherwise, it would be helpful if this issue could be fixed.
Furthermore, Lars suggested that the problem lies within this code: https://github.com/OpenMS/OpenMS/blob/develop/src/openms/source/FORMAT/MzTab.cpp#L275
@timosachsenberg Does this exception look familiar? Any idea why the mod_identifier_ could be null? I will check tomorrow on develop.
@Matthias313 Here is the probable reason for the crash.
lars@SchillingLinux:~/Desktop/test$ IDFileConverter -in MFA175.pepXML -out MFA175_NEW.idXML
Progress of 'Loading...':
Non-fatal error while loading 'MFA175.pepXML': No modification description given. Trying to define by modification mass.
Non-fatal error while loading 'MFA175.pepXML': Modification '147.0354' is not uniquely defined by the given data. Using 'Oxidation (M)' to represent any of 'Oxidation (M), oxidation to L-methionine sulfoxide (M), oxidation to L-methionine (R)-sulfoxide (M), oxidation to L-methionine (S)-sulfoxide (M)'.
<Non-fatal error while loading 'MFA175.pepXML': No modification description given. Trying to define by modification mass.> occurred 2 times
Non-fatal error while loading 'MFA175.pepXML': Cannot find modification '129.079' of residue I at position 1 in 'IPPADSLLK'
Non-fatal error while loading 'MFA175.pepXML': Cannot find modification '202.0413' of residue C at position 1 in 'CDFTEDQTAEFK'
Non-fatal error while loading 'MFA175.pepXML': Cannot find modification '145.0375' of residue E at position 1 in 'EDSTSPKQEKENQEELGETR'
<Non-fatal error while loading 'MFA175.pepXML': Cannot find modification '202.0413' of residue C at position 1 in 'CDFTEDQTAEFK'> occurred 3 times
Non-fatal error while loading 'MFA175.pepXML': Cannot find modification '145.0375' of residue E at position 1 in 'ESGQPARRIAMAPLLEYER'
-- done [took 1.16 s (CPU), 1.18 s (Wall)] --
Progress of 'Storing...':
-- done [took 0.22 s (CPU), 0.21 s (Wall)] --
IDFileConverter took 1.40 s (wall), 1.38 s (CPU), 0.00 s (system), 1.38 s (user).
lars@SchillingLinux:~/Desktop/test$ MzTabExporter -in MFA175_NEW.idXML -out MFA175_NEW.mzTab
exporting identifications: "MFA175_NEW.idXML" to mzTab:
Error: Unexpected internal error (Modification or Substitution identifier MUST NOT be null or empty in MzTabModification)
The MSFragger pepXML output includes a number of unknown mods such as 202.0413. IDFileConverter puts null into these places, which subsequently leads to a crash in MzTabExporter.
Yes, it seems that special handling of unknown modifications would need to be added for MSFragger support.
MSFragger output MFA175.pepXML.
Nothing close to 202.0413 is in Unimod. We need to ask the MSFragger developers what this is.
Only then can we make full use of the MSFragger output.
I think this is a feature of open mass searches - looking for unknown modifications
Unknown mods are not allowed in mzTab, right?
So we cannot report full MSFragger results in mzTab.
see the specification document: CHEMMOD:+123.4567 defines an unnamed modification with delta mass of 123.4567
I recently worked on unknown masses to get it to work at least inside OpenMS, so this should be supported from the OpenMS side at least: #2530
V(MOD:00756)KLGC(Carbamidomethyl)SFSGKP is the culprit.
The sequence corresponds to this section in pepXML.
<search_hit peptide="VKLGCSFSGKP" massdiff="0.0012" calc_neutral_pep_mass="1194.6066" peptide_next_aa="G" num_missed_cleavages="1" num_tol_term="1" num_tot_proteins="1" tot_num_ions="20" hit_rank="1" num_matched_ions="7" protein="sp|Q9NYX4|CALY_HUMAN Neuron-specific vesicular protein calcyon OS=Homo sapiens GN=CALY PE=1 SV=1" peptide_prev_aa="-" is_rejected="0">
<modification_info>
<mod_aminoacid_mass mass="115.0633" position="1"/>
<mod_aminoacid_mass mass="160.0307" position="5"/>
</modification_info>
<search_score name="hyperscore" value="13.487"/>
<search_score name="nextscore" value="11.096"/>
<search_score name="expect" value="5.035e-02"/>
</search_hit>
During pepXML to idXML conversion, 115.0633 turns to MOD:00756.
Why?
Guess it did match a Unimod mass and residue
The only close mass is 115.0667 of Unimod #866.
But in this case I would expect ICPL:13C(6)2H(4) and not MOD:00756.
@hendrikweisser @cbielow Any comments from the pepXML experts?
MOD:00756 is 4-hydroxy-D-valine with a mass of 115.063329 from the PSI-MOD
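To make the mass bookkeeping explicit: in pepXML, the mass attribute of mod_aminoacid_mass is the mass of the modified residue (residue plus modification), so the delta mass is recovered by subtracting the unmodified residue mass. The following is a minimal Python sketch of that subtraction for the search hit quoted above. The residue mass table and the function name delta_masses are illustrative assumptions, not OpenMS code, and the XML is trimmed to the attributes that matter here.

```python
import xml.etree.ElementTree as ET

# Standard monoisotopic residue masses (Da) for the two residues involved.
RESIDUE_MASS = {"V": 99.06841, "C": 103.00919}

# Trimmed copy of the search hit from the thread (scores and other attributes omitted).
HIT = """<search_hit peptide="VKLGCSFSGKP">
  <modification_info>
    <mod_aminoacid_mass mass="115.0633" position="1"/>
    <mod_aminoacid_mass mass="160.0307" position="5"/>
  </modification_info>
</search_hit>"""


def delta_masses(search_hit_xml):
    """Return (position, residue, delta mass) for each modified residue."""
    hit = ET.fromstring(search_hit_xml)
    peptide = hit.get("peptide")
    result = []
    for mod in hit.iter("mod_aminoacid_mass"):
        pos = int(mod.get("position"))       # 1-based position in the peptide
        residue = peptide[pos - 1]
        delta = float(mod.get("mass")) - RESIDUE_MASS[residue]
        result.append((pos, residue, round(delta, 4)))
    return result


for pos, residue, delta in delta_masses(HIT):
    print(pos, residue, delta)
```

The delta on V at position 1 comes out at roughly +15.995 Da, which is consistent with a hydroxylation and would explain why a PSI-MOD hydroxy-valine entry with a residue mass of 115.063329 is picked up. The delta on C at position 5 is roughly +57.02 Da, matching Carbamidomethyl.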
@timosachsenberg Guess this is it. https://github.com/OpenMS/OpenMS/blob/develop/src/topp/MzTabExporter.cpp#L742
MOD:00756 is a PSI-MOD (thanks @enetz), but mzTab allows only Unimod.
where does it say that only Unimod is allowed?
maybe because of that line? "// MzTab standard is to just report Unimod accession."
Maybe we should remove matching against non-Unimod by default altogether?
Ok I see. According to the specification it's more a recommendation. Probably best to stick to Unimod only for now and use chemmods for the other cases.
+1
Once this is fixed, we will continue testing MSFragger in the OpenMS framework. We would need the complete MSFragger output in mzTab, including mods which are neither Unimod nor PSI-MOD. All our post-processing is mzTab-based.
OK, I think what you would need to do is check all occurrences of getUniModAccession.
For example, https://github.com/OpenMS/OpenMS/blob/develop/src/topp/MzTabExporter.cpp#L238 can just add a CHEMMOD CvParam if the current modification has no UniModAccession. Similarly at the other positions, e.g. at https://github.com/OpenMS/OpenMS/blob/develop/src/topp/MzTabExporter.cpp#L743 you would not use a CvParam but just setModificationIdentifier to "CHEMMOD:"+mod.getDiffMonoMass()
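The fallback described here can be sketched in a few lines. This is a language-neutral illustration in Python, not the MzTabExporter code: the Mod record and the function mztab_mod_identifier are hypothetical stand-ins, the "UNIMOD:35" accession string is only an example value, and the choice of four decimal places is an assumption. The only part taken from the mzTab specification is the CHEMMOD:+123.4567 pattern with an explicit sign.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mod:
    # Hypothetical stand-in for a modification record (not the OpenMS class).
    unimod_accession: Optional[str]  # None or "" when the mod is not in Unimod
    diff_mono_mass: float            # monoisotopic delta mass in Da


def mztab_mod_identifier(mod: Mod) -> str:
    """Use the Unimod accession if there is one, otherwise fall back to CHEMMOD."""
    if mod.unimod_accession:
        return mod.unimod_accession
    # "+" in the format spec forces an explicit sign, as in CHEMMOD:+123.4567.
    return f"CHEMMOD:{mod.diff_mono_mass:+.4f}"


print(mztab_mod_identifier(Mod("UNIMOD:35", 15.9949)))
print(mztab_mod_identifier(Mod(None, 15.9949)))
print(mztab_mod_identifier(Mod("", -18.0106)))
```

With a fallback of this kind the identifier is never empty, which is the condition the "MUST NOT be null or empty" exception complains about.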
@timosachsenberg thanks for all your efforts on this! =) How can I check all the occurrences of getUniModAccessions? Do I need to alter the code somehow?
Yes. Unfortunately some programming is required. Can't promise when I get to work on this. Got quite some stuff on my table and we officially don't support msfragger yet.
There's a parameter mod_tol_ in PepXMLFile that controls the mass tolerance for looking up modifications. Setting it to a smaller value may be a quick workaround for this specific case. (You'd have to change it in the PepXMLFile source code and recompile OpenMS.)
just to add another perspective, we also support non-unimod and user-defined mods in other formats, there is also a call bool ResidueModification::isUserDefined() to check for this.
Given how MSFragger works, I suggest we even force every mod to be user-defined, simply because we do not want to make any assumptions about what the mod is. MSFragger reports a distribution of masses: for 115.063329, for example, some peptides will be 10 ppm higher and some 10 ppm lower. Our assumption for pep.xml so far has been that the search engine has a list of allowed masses and writes them out accurately, so a given mod always has the same mass. That is not the case here, and it would be very, very unfortunate if some masses, e.g. 115.061 - 115.062, map to MOD:123, then 115.063 - 115.064 map to MOD:456, and everything in between does not match at all. If we do that, we actually lose the information about the accurate mass match and the distribution!
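A toy illustration of this concern, in Python. The two database entries reuse the placeholder IDs MOD:123 and MOD:456 from the comment above, their masses and the 0.0005 Da tolerance are made up for the example, and lookup is a hypothetical helper rather than the PepXMLFile logic.

```python
# Made-up database entries using the placeholder IDs from the comment above.
KNOWN_MODS = {"MOD:123": 115.0615, "MOD:456": 115.0635}


def lookup(mass, tol=0.0005):
    """Return the ID of the closest known mod within tol (Da), else None."""
    mod_id, mod_mass = min(KNOWN_MODS.items(), key=lambda kv: abs(kv[1] - mass))
    return mod_id if abs(mod_mass - mass) <= tol else None


def annotate(masses, tol=0.0005):
    """Pair every observed mass with the ID it would be mapped to."""
    return [(m, lookup(m, tol)) for m in masses]


# Three observed masses that could all belong to the same open-search distribution.
observed = [115.0612, 115.0625, 115.0633]
for mass, mod_id in annotate(observed):
    print(mass, mod_id)
```

Masses from one distribution end up under two different IDs, with an unmatched gap in between. Keeping each hit as a user-defined mod with its observed mass avoids this and preserves the distribution for later analysis.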
Just wanted to report encountering the same error when analyzing a bunch of LFQ files using XTandem (no custom mods, but XTandem seems to be able to be somewhat creative and report peptides not strictly in the search space; could this be the reason?). For each file, after filtering for FDR, I run FFCentroided on a seed list, then apply IDMapper. After the loop, I apply MapAlignerPoseClustering > FeatureLinkerUnlabeledQT > ConsensusMapNormalizer > IDConflictResolver > MzTabExporter; this is on OpenMS 2.4.0 in KNIME.
NB: I have to export my results using MzTabExporter since it seems to be the only way to recover linked peptides and features. Alternatively, I would be happy to recover a table of features with sequence information, of peptides with LFQ data, or just one table each but with cross-referencing ID columns linking them both. Sounds like this would be a standard task.
@Arthfael:
Alternatively, I would be happy to recover a table of features with sequence information, of peptides with LFQ data, or just one table each but with cross-referencing ID columns linking them both.
For the first option, try TextExporter with the "consensus:features" output (instead of "out"). For the second option, try running ProteinQuantifier instead of MzTabExporter.
Thank you. I found out in the meantime that my features and peptides were correctly exported with linking IDs... but I was looking at the wrong file. I did not properly understand the way nodes pass output to each other, sorry for wasting your time.
@tillenglert you have some experience with the MSFraggerAdapter. Any idea if this issue still persists?
@timosachsenberg So far, I have always used the Philosopher post-processing tools for open search, and reported the modifications without OpenMS. I can try to follow up on this issue with some tests and get back to you!