RAVEN icon indicating copy to clipboard operation
RAVEN copied to clipboard

fix: disable adding prefix 'M_' and 'R_' in exportModel function

Open edkerk opened this issue 4 years ago • 10 comments
trafficstars

Main improvements in this PR:

Duplicate of #356, reverting the revert by #359 :) (see explanation there).

The changes made in PR are based on the discussion and consensus in #353

  • stops adding prefixes M_, R_, and C_ to metabolite, reaction, and compartment ids in exportModel function (RAVEN doesn't correct but report errors)
  • introduces regular expressions to checkModelStruct for checking whether the IDs are in compliance with libSBML specification
  • fully runs checkModelStruct by default in exportModel and reports instances of invalid ids as errors (prevent their spreading to the community)

I hereby confirm that I have:

edkerk avatar Aug 13 '21 21:08 edkerk

Line 118-132 should be removed, no longer required as invalid IDs would already have thrown an error.

https://github.com/SysBioChalmers/RAVEN/blob/a459d6c10714b673878ad62790d6e19c68fefb7c/io/exportModel.m#L118-L132

edkerk avatar Sep 25 '21 19:09 edkerk

agree

haowang-bioinfo avatar Sep 26 '21 05:09 haowang-bioinfo

no longer required as invalid IDs would already have thrown an error.

@edkerk this got me thinking - are invalid reported throwing errors for all import formats, including importExcelModel?

mihai-sysbio avatar Sep 26 '21 07:09 mihai-sysbio

no longer required as invalid IDs would already have thrown an error.

@edkerk this got me thinking - are invalid reported throwing errors for all import formats, including importExcelModel?

@mihai-sysbio What I meant is that exportModel will throw the error via checkModelStruct. importModel would not need to call checkModelStruct, as libSBML would have failed to load a model with illegal characters. For importExcelModel though, and a future readYaml function (which at this time does not exist in RAVEN 2, but is planned to be included) it indeed would make sense to call checkModelStruct.

edkerk avatar Sep 27 '21 10:09 edkerk

Should not be merged before #364 is merged and pushed from devel to main. Release 2.5.4 has then minor changes, while release 2.6.0 (this PR) will have the significant change that I/O is no longer resulting in the same model in MATLAB as in previous versions, as it no longer uses the COBRA-style prefixes.

edkerk avatar Sep 28 '21 11:09 edkerk

Resolving the conflict and the merging of #364 brings this PR in a merge-able state.

Release 2.5.4 has then minor changes, while release 2.6.0 (this PR) will have the significant change that I/O is no longer resulting in the same model in MATLAB as in previous versions, as it no longer uses the COBRA-style prefixes.

I guess we then expect one more PR that releases 2.5.4. before merging this one though @edkerk ?

mihai-sysbio avatar Dec 06 '21 05:12 mihai-sysbio

I'm still hesitant with this, as it can break backwards compatibility with people's scripts. It changes the identifier representation (e.g. M_glc6p instead of glc6p) when changing RAVEN version. Might it not be so drastic that it should (part of a) major release?

edit: I moved my further discussion to issue #353.

edkerk avatar Dec 06 '21 15:12 edkerk

I tried to test this PR with the iAL1006 model from tutorials. Due to the suggested changes in this PR we have a new problem - there are many rxns and mets that begin with a numeric char so the model is no longer compliant with SBML specifications and the exportModel function therefore fails. I am not sure how common it is for metabolite and reaction IDs to start with a number, but this is definitely problematic for GEMs that use BiGG IDs for mets and/or rxns.

simas232 avatar Apr 24 '22 19:04 simas232

I agree @simas232, this should be properly resolved, also discussed in #353. To avoid confusion, I'll change this PR to draft for now.

edkerk avatar Apr 24 '22 20:04 edkerk

Good catch @simas232.

this is definitely problematic for GEMs that use BiGG IDs for mets and/or rxns.

To me, this sounds very much like a problem with the GEMs themselves not being SBML-compatible. The main question who's responsibility it is to address it.

I could imagine a version of the tutorial that starts with a Colab notebook with cobrapy that imports and exports the model, just to obtain a valid SBML (assuming cobrapy also implements this prefixing functionality). Alternatively, such a "patched" GEM could be delivered with the tutorial, as long as this is documented.

mihai-sysbio avatar Apr 25 '22 05:04 mihai-sysbio