Storing and exporting non-Unimod modifications (e.g. from MSFragger)

Open Matthias313 opened this issue 8 years ago • 30 comments

Dear all, I tried to run MzTabExporter after IDFilter, which resulted in the error message below:

14:18:26 NOTICE: MzTabExporter of node #12 started. Processing ...
exporting identifications: "C:/Users/MATTHI~1/AppData/Local/Temp/6/2017-07-18_141154_ukmz1005immz_18988_1/TOPPAS_tmp/msfragger/007_IDFilter/out/MFA174.idXML" to mzTab: 
exporting identifications: "C:/Users/MATTHI~1/AppData/Local/Temp/6/2017-07-18_141154_ukmz1005immz_18988_1/TOPPAS_tmp/msfragger/007_IDFilter/out/MFA175.idXML" to mzTab: 
exporting identifications: "C:/Users/MATTHI~1/AppData/Local/Temp/6/2017-07-18_141154_ukmz1005immz_18988_1/TOPPAS_tmp/msfragger/007_IDFilter/out/MFA173.idXML" to mzTab: 
Error: Unexpected internal error (Modification or Substitution identifier MUST NOT be null or empty in MzTabModification)

==============================================================================
14:18:30 ERROR: MzTabExporter failed!

Currently, we use OpenMS 2.1, and @lars20070 told me that this might have been fixed in the latest version (OpenMS 2.2). If so, we would update our installation; otherwise, it would be helpful if this issue could be fixed.

Furthermore, lars suggested that the problem lies within this code: https://github.com/OpenMS/OpenMS/blob/develop/src/openms/source/FORMAT/MzTab.cpp#L275

Matthias313 avatar Jul 18 '17 14:07 Matthias313

@timosachsenberg Does this exception look familiar? Any idea why the mod_identifier_ could be null? I will check tomorrow on develop.

lars20070 avatar Jul 18 '17 19:07 lars20070

@Matthias313 Here is the probable reason for the crash.

lars@SchillingLinux:~/Desktop/test$ IDFileConverter -in MFA175.pepXML -out MFA175_NEW.idXML
Progress of 'Loading...':
Non-fatal error while loading 'MFA175.pepXML': No modification description given. Trying to define by modification mass.
Non-fatal error while loading 'MFA175.pepXML': Modification '147.0354' is not uniquely defined by the given data. Using 'Oxidation (M)' to represent any of 'Oxidation (M), oxidation to L-methionine sulfoxide (M), oxidation to L-methionine (R)-sulfoxide (M), oxidation to L-methionine (S)-sulfoxide (M)'.
<Non-fatal error while loading 'MFA175.pepXML': No modification description given. Trying to define by modification mass.> occurred 2 times
Non-fatal error while loading 'MFA175.pepXML': Cannot find modification '129.079' of residue I at position 1 in 'IPPADSLLK'
Non-fatal error while loading 'MFA175.pepXML': Cannot find modification '202.0413' of residue C at position 1 in 'CDFTEDQTAEFK'
Non-fatal error while loading 'MFA175.pepXML': Cannot find modification '145.0375' of residue E at position 1 in 'EDSTSPKQEKENQEELGETR'
<Non-fatal error while loading 'MFA175.pepXML': Cannot find modification '202.0413' of residue C at position 1 in 'CDFTEDQTAEFK'> occurred 3 times
Non-fatal error while loading 'MFA175.pepXML': Cannot find modification '145.0375' of residue E at position 1 in 'ESGQPARRIAMAPLLEYER'
-- done [took 1.16 s (CPU), 1.18 s (Wall)] -- 
Progress of 'Storing...':
-- done [took 0.22 s (CPU), 0.21 s (Wall)] -- 
IDFileConverter took 1.40 s (wall), 1.38 s (CPU), 0.00 s (system), 1.38 s (user).
lars@SchillingLinux:~/Desktop/test$ MzTabExporter -in MFA175_NEW.idXML -out MFA175_NEW.mzTab
exporting identifications: "MFA175_NEW.idXML" to mzTab: 
Error: Unexpected internal error (Modification or Substitution identifier MUST NOT be null or empty in MzTabModification)

The pepXML output from MSFragger includes a number of unknown mods such as 202.0413. IDFileConverter puts null into these places, which subsequently leads to a crash in MzTabExporter.

lars20070 avatar Jul 19 '17 12:07 lars20070

yes seems as if special handling of unknown modifications would need to be added for MSFragger support

timosachsenberg avatar Jul 19 '17 12:07 timosachsenberg

MSFragger output MFA175.pepXML.

lars20070 avatar Jul 19 '17 12:07 lars20070

Nothing close to 202.0413 is in Unimod. We need to ask the MSFragger developers what this is. Only then can we make full use of the MSFragger output.

lars20070 avatar Jul 19 '17 12:07 lars20070

I think this is a feature of open mass searches - looking for unknown modifications

timosachsenberg avatar Jul 19 '17 13:07 timosachsenberg

Unknown mods are not allowed in mzTab, right? So we cannot report full MSFragger results in mzTab.

lars20070 avatar Jul 19 '17 13:07 lars20070

see the specification document: CHEMMOD:+123.4567 defines an unnamed modification with delta mass of 123.4567
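The CHEMMOD notation can be sketched in a few lines. This is a minimal illustration, not OpenMS code: the function name and the choice of four decimal places are assumptions made here.

```python
def chemmod_identifier(delta_mass: float, decimals: int = 4) -> str:
    """Format a mass delta as a CHEMMOD identifier with an explicit sign.

    The number of decimals is an assumption of this sketch.
    """
    return f"CHEMMOD:{delta_mass:+.{decimals}f}"


print(chemmod_identifier(123.4567))  # CHEMMOD:+123.4567
print(chemmod_identifier(-17.0265))  # CHEMMOD:-17.0265
```

The explicit sign matters because a mass loss has to be distinguishable from a mass gain.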

timosachsenberg avatar Jul 19 '17 13:07 timosachsenberg

I recently worked on unknown masses to get it to work at least inside OpenMS, so this should be supported from the OpenMS side at least: #2530

hroest avatar Jul 20 '17 15:07 hroest

V(MOD:00756)KLGC(Carbamidomethyl)SFSGKP is the culprit.

The sequence corresponds to this section in pepXML.

<search_hit peptide="VKLGCSFSGKP" massdiff="0.0012" calc_neutral_pep_mass="1194.6066" peptide_next_aa="G" num_missed_cleavages="1" num_tol_term="1" num_tot_proteins="1" tot_num_ions="20" hit_rank="1" num_matched_ions="7" protein="sp|Q9NYX4|CALY_HUMAN Neuron-specific vesicular protein calcyon OS=Homo sapiens GN=CALY PE=1 SV=1" peptide_prev_aa="-" is_rejected="0">
<modification_info>
<mod_aminoacid_mass mass="115.0633" position="1"/>
<mod_aminoacid_mass mass="160.0307" position="5"/>
</modification_info>
<search_score name="hyperscore" value="13.487"/>
<search_score name="nextscore" value="11.096"/>
<search_score name="expect" value="5.035e-02"/>
</search_hit>
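In pepXML, the mass attribute of mod_aminoacid_mass is the mass of the modified residue, i.e. residue plus modification. The sketch below recovers the mass deltas for this hit. It uses a trimmed copy of the snippet above without the XML namespace that real pepXML files carry, and the function and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Monoisotopic residue masses (Da) for the two modified residues in this hit.
RESIDUE_MASS = {"V": 99.06841, "C": 103.00919}

# Trimmed copy of the search hit; real pepXML elements carry a namespace.
SEARCH_HIT = """
<search_hit peptide="VKLGCSFSGKP">
  <modification_info>
    <mod_aminoacid_mass mass="115.0633" position="1"/>
    <mod_aminoacid_mass mass="160.0307" position="5"/>
  </modification_info>
</search_hit>
"""


def mass_deltas(search_hit_xml):
    """Return (position, residue, delta mass) for every modified residue."""
    hit = ET.fromstring(search_hit_xml.strip())
    peptide = hit.get("peptide")
    deltas = []
    for mod in hit.iter("mod_aminoacid_mass"):
        position = int(mod.get("position"))  # 1-based position in the peptide
        residue = peptide[position - 1]
        delta = float(mod.get("mass")) - RESIDUE_MASS[residue]
        deltas.append((position, residue, round(delta, 4)))
    return deltas


for position, residue, delta in mass_deltas(SEARCH_HIT):
    print(position, residue, delta)
```

The deltas come out at roughly +15.995 Da on V1 (one oxygen, consistent with a hydroxylated valine) and +57.021 Da on C5 (carbamidomethyl).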

lars20070 avatar Jul 21 '17 09:07 lars20070

During pepXML to idXML conversion, 115.0633 turns into MOD:00756. Why?

lars20070 avatar Jul 21 '17 10:07 lars20070

Guess it did match a Unimod mass and residue

timosachsenberg avatar Jul 21 '17 10:07 timosachsenberg

The only close mass is 115.0667 of Unimod #866. But in this case I would expect ICPL:13C(6)2H(4) and not MOD:00756.

@hendrikweisser @cbielow Any comments from the pepXML experts?

lars20070 avatar Jul 21 '17 10:07 lars20070

MOD:00756 is 4-hydroxy-D-valine with a mass of 115.063329 from the PSI-MOD

enetz avatar Jul 21 '17 11:07 enetz

@timosachsenberg Guess this is it. https://github.com/OpenMS/OpenMS/blob/develop/src/topp/MzTabExporter.cpp#L742

MOD:00756 is a PSI-MOD (thanks @enetz), but mzTab allows only Unimod.

lars20070 avatar Jul 21 '17 14:07 lars20070

where does it say that only Unimod is allowed?

timosachsenberg avatar Jul 21 '17 14:07 timosachsenberg

maybe because of that line? "// MzTab standard is to just report Unimod accession."

Matthias313 avatar Jul 21 '17 14:07 Matthias313

Maybe we should remove matching against non-Unimod entries by default altogether?

hroest avatar Jul 21 '17 16:07 hroest

Ok I see. According to the specification it's more a recommendation. Probably best to stick to Unimod only for now and use chemmods for the other cases.

timosachsenberg avatar Jul 21 '17 19:07 timosachsenberg

+1

Once this is fixed, we will continue testing MSFragger in the OpenMS framework. We would need the complete MSFragger output in mzTab, including mods which are neither Unimod nor PSI-MOD. All our post-processing is mzTab-based.

lars20070 avatar Jul 23 '17 08:07 lars20070

OK, I think what you would need to do is check all occurrences of getUniModAccession.

e.g.: https://github.com/OpenMS/OpenMS/blob/develop/src/topp/MzTabExporter.cpp#L238 can just add a CHEMMOD CvParam if the current modification has no UniModAccession. Similarly at the other positions, e.g.: https://github.com/OpenMS/OpenMS/blob/develop/src/topp/MzTabExporter.cpp#L743 you would not use a CvParam but just call setModificationIdentifier with "CHEMMOD:"+mod.getDiffMonoMass()
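The fallback logic could look roughly like this. It is a language-neutral sketch in Python, not the MzTabExporter code: the Modification class, its field names, and the four-decimal formatting are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Modification:
    """Hypothetical stand-in for a modification record; not the OpenMS class."""
    diff_mono_mass: float                   # mass delta in Da
    unimod_accession: Optional[str] = None  # e.g. "UNIMOD:35", None if unknown


def mztab_identifier(mod: Modification) -> str:
    """Prefer the Unimod accession; otherwise report the delta as a CHEMMOD."""
    if mod.unimod_accession:  # None and "" both fall through to CHEMMOD
        return mod.unimod_accession
    return f"CHEMMOD:{mod.diff_mono_mass:+.4f}"


oxidation = Modification(diff_mono_mass=15.9949, unimod_accession="UNIMOD:35")
# 202.0413 reported on C, minus the Cys residue mass of 103.00919
unknown = Modification(diff_mono_mass=99.0321)

print(mztab_identifier(oxidation))  # UNIMOD:35
print(mztab_identifier(unknown))    # CHEMMOD:+99.0321
```

With this fallback the identifier is never null or empty, which is the condition the exporter currently trips over.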

timosachsenberg avatar Jul 23 '17 08:07 timosachsenberg

@timosachsenberg thanks for all your efforts on this! =) How can I check all the occurrences of getUniModAccession? Do I need to alter the code somehow?

Matthias313 avatar Jul 27 '17 08:07 Matthias313

Yes, unfortunately some programming is required. I can't promise when I will get to work on this. I have quite a lot on my plate, and we don't officially support MSFragger yet.

timosachsenberg avatar Jul 27 '17 09:07 timosachsenberg

There's a parameter mod_tol_ in PepXMLFile that controls the mass tolerance for looking up modifications. Setting it to a smaller value may be a quick workaround for this specific case. (You'd have to change it in the PepXMLFile source code and recompile OpenMS.)

hendrikweisser avatar Jul 31 '17 13:07 hendrikweisser

Just to add another perspective: we also support non-Unimod and user-defined mods in other formats, and there is a call bool ResidueModification::isUserDefined() to check for this.

I suggest, given how MSFragger works, that we even force every mod to be user-defined, simply because we do not want to make any assumptions about what the mod is. MSFragger reports a distribution of masses: e.g. for 115.063329 there will be some peptides that have 10 ppm more and some that have 10 ppm less. Our assumption for pep.xml so far has been that the search engine has a list of allowed masses and writes them out accurately, so a given modification always has the same mass. That does not hold here, and it would be very, very unfortunate if some masses, e.g. 115.061-115.062, map to MOD:123, then 115.063-115.064 map to MOD:456, and everything in between does not match at all. If we do that, we actually lose the information about the accurate mass match and the distribution!
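A toy example of the effect: the table entries, masses, and tolerance below are invented for illustration and do not come from Unimod or PSI-MOD.

```python
from typing import Dict, Optional

# Hypothetical database entries (names and masses invented for illustration).
KNOWN_MODS: Dict[str, float] = {"MOD:123": 115.0615, "MOD:456": 115.0635}


def lookup(mass: float, tolerance: float) -> Optional[str]:
    """Return the closest known modification within the tolerance, else None."""
    name, known_mass = min(KNOWN_MODS.items(), key=lambda item: abs(item[1] - mass))
    return name if abs(known_mass - mass) <= tolerance else None


# Three observations of what may be the same modification, spread by about 10 ppm.
observed = [115.0622, 115.0633, 115.0644]
assignments = [lookup(m, tolerance=0.0008) for m in observed]
print(assignments)  # ['MOD:123', 'MOD:456', None]
```

Three nearby masses end up with three different outcomes, whereas reporting each one as a user-defined mass delta would keep the observed distribution intact.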

hroest avatar Oct 22 '17 10:10 hroest

Just wanted to report encountering the same error when analyzing a bunch of LFQ files using XTandem (no custom mods, but XTandem seems to be somewhat creative and reports peptides not strictly in the search space; could this be the reason?). For each file, after filtering for FDR, I run FFCentroided on a seed list, then apply IDMapper. After the loop, I apply MapAlignerPoseClustering > FeatureLinkerUnlabeledQT > ConsensusMapNormalizer > IDConflictResolver > MzTabExporter. This is on OpenMS 2.4.0 in KNIME.

NB: I have to export my results using MzTabExporter since it seems to be the only way to recover linked peptides and features. Alternatively, I would be happy to recover a table of features with sequence information, of peptides with LFQ data, or just one table each but with cross-referencing ID columns linking them both. Sounds like this would be a standard task.

Arthfael avatar Jul 04 '19 09:07 Arthfael

@Arthfael:

Alternatively, I would be happy to recover a table of features with sequence information, of peptides with LFQ data, or just one table each but with cross-referencing ID columns linking them both.

For the first option, try TextExporter with the "consensus:features" output (instead of "out"). For the second option, try running ProteinQuantifier instead of MzTabExporter.

hendrikweisser avatar Jul 04 '19 10:07 hendrikweisser

For the first option, try TextExporter with the "consensus:features" output (instead of "out"). For the second option, try running ProteinQuantifier instead of MzTabExporter.

Thank you. I found out in the meantime that my features and peptides were correctly exported with linking IDs... but I was looking at the wrong file. I did not properly understand how nodes pass output to each other. Sorry for wasting your time.

Arthfael avatar Jul 04 '19 11:07 Arthfael

@tillenglert you have some experience with the MSFraggerAdapter. Any idea if this issue still persists?

timosachsenberg avatar Jul 21 '22 11:07 timosachsenberg

@timosachsenberg So far, I have always used the Philosopher post-processing tools for open search, and reported the modifications without OpenMS. I can try to follow up on this issue with some tests and get back to you!

tillenglert avatar Jul 22 '22 06:07 tillenglert