cobrapy icon indicating copy to clipboard operation
cobrapy copied to clipboard

Metadata fbc3

Open akaviaLab opened this issue 2 years ago • 26 comments

Updated version of https://github.com/opencobra/cobrapy/pull/988. Tried to simplify and tidy the code in addition to merging. Modified according to @cdiener requests

  • Ignored changes in notes for now
  • renamed constructors a bit
  • added eq comparisons that build classes if necessary

Original description of #988 follows

Metadata Classes: A separate directory for handling metadata information is made inside cobra/core directory. Every object derived from SBase can have meta information like annotation (CVTerms), notes, history attached to it. The CVTerms class for storing externally linked resources to each component derived from SBase. This class is maintaining the new format annotation as well as old format annotation simultaneously, and both are kept synchronized with each other. Changing one will modify the other accordingly. This new class for annotation can handle any type of annotation data (be it the case of nested annotation or alternative annotation). It can read the old format as well as the new format annotation data from JSON and other formats. At the time of writing back the model, the new data format is used because it contains the complete data organized in the same way as SBML. The History class used for storing the history, validating dates, etc, is now attached to each component derived from SBase. The KeyValuePair class for storing key-value pairs, defined by fbc-v3. The last three metadata objects (i.e CVTerms, History, KeyValuePair) are present inside a single attribute of SBase (Object) class and can be accessed via object.annotation.cvterms, object.annotation.history and object.annotation.key_value_data attributes. Calling simply the annotation attribute (object.annotation) will return the annotation data in old format (making it backward compatible).

Group to JSON: The support of the group package is extended to JSON.

JSON schema v2: The version2 of JSON schema has been added which defines the new format annotation, history, key-value pair, notes, group package data, user-defined constraints data and basic SBML info. @akaviaLab comment - not sure if the v2 schema works or is done. Seems unclear from the code.

Issues Fixed fixes issue https://github.com/opencobra/cobrapy/issues/954: The support of group package has been extended to JSON and other formats. fixes issue https://github.com/opencobra/cobrapy/issues/856: The infinity values are also enabled for storing bounds. fixes issue https://github.com/opencobra/cobrapy/issues/810: The sbml_info storing basic information of SBML is written to JSON to store the basic SBML document information like packages, level, version, notes, annotation attached to the SBML component etc. fixes issue https://github.com/opencobra/cobrapy/issues/736: If annotation is present in the form of list of list, it is first modified and then data is read. fixes issue https://github.com/opencobra/cobrapy/issues/706: Single annotation resource are now put into a string. While reading from JSON also, if a single resource attribute is in the form of string, it is first fixed to list and then read into the model. fixes issue https://github.com/opencobra/cobrapy/issues/684: The complete metadata structure (CVTerms, Notes, History) has been redesigned with backward compatibility. Old format data is read, fixed and then written in new format metadata structures. The fbc-v3 "KeyValuePair" class is also a part of this new format annotation. Its corresponding class and its support to JSON and other format has been added. However, the SBML parser for it has to be updated when libsbml adds support for fbc-v3. fixes issue https://github.com/opencobra/cobrapy/issues/937: The annotation format has been updated, which is backward compatible too. JSON schema v1: The JSON schema version 1 is modified to resolve some pre-existing issues.

Tests Tests for all the newly implemented features are added to check the functionalities. A few old tests are also modified accordingly. Some tests which were initially marked 'xfail' are now working dew to modified formats.

  • [ ] add an entry to the next release

akaviaLab avatar Jun 19 '22 16:06 akaviaLab

@cdiener I think this is the reference http://co.mbine.org/specifications/sbml.level-3.version-2.core.release-2.pdf http://co.mbine.org/specifications/sbml.level-3.version-1.fbc.version-2.release-1.pdf

akaviaLab avatar Jun 19 '22 16:06 akaviaLab

Codecov Report

Attention: Patch coverage is 77.22543% with 197 lines in your changes are missing coverage. Please review.

Project coverage is 83.40%. Comparing base (178bb08) to head (dc8d600). Report is 116 commits behind head on devel.

:exclamation: Current head dc8d600 differs from pull request most recent head 988148d. Consider uploading reports for the commit 988148d to get more accurate results

Files Patch % Lines
src/cobra/core/metadata/cvterm.py 73.52% 41 Missing and 31 partials :warning:
src/cobra/io/dict.py 74.81% 21 Missing and 13 partials :warning:
src/cobra/io/sbml.py 72.80% 21 Missing and 13 partials :warning:
src/cobra/core/metadata/history.py 74.78% 17 Missing and 12 partials :warning:
src/cobra/core/metadata/custompairs.py 76.74% 8 Missing and 2 partials :warning:
src/cobra/core/metadata/metadata.py 91.57% 6 Missing and 2 partials :warning:
src/cobra/io/json.py 85.18% 2 Missing and 2 partials :warning:
src/cobra/core/object.py 71.42% 1 Missing and 1 partial :warning:
src/cobra/manipulation/annotate.py 60.00% 2 Missing :warning:
src/cobra/core/model.py 87.50% 1 Missing :warning:
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #1237      +/-   ##
==========================================
- Coverage   84.33%   83.40%   -0.94%     
==========================================
  Files          66       72       +6     
  Lines        5508     6145     +637     
  Branches     1267     1417     +150     
==========================================
+ Hits         4645     5125     +480     
- Misses        557      659     +102     
- Partials      306      361      +55     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Jun 19 '22 16:06 codecov-commenter

@akaviaLab What you linked is FBC version 2. The features implemented here look like a new version of FBC (version 3) but even after some searching I can not find a definition for this version (3).

cdiener avatar Jun 20 '22 17:06 cdiener

@cdiener Why do you think it has to be in a new version? I think the title might be wrong (and it is sbml3, not fbc3), but the annotation feature (with controlled vocabulary, cvterms) appears in this link http://co.mbine.org/specifications/sbml.level-3.version-2.core.release-2.pdf page 100 and onward

The keyvaluepair could be useful for things that aren't in identifiers and need to be annotated like https://synonym.caltech.edu/documents/elaborations/miriam_annotation_syntax/#how-do-i-put-inchi-strings-in-annotations (Ignore the fact that InChi is now in identifiers, and look at the format). Right now, it is useful for noting it in COBRA, and we can probably modify the code for import/export later.

What about the features in the code is not described by the standards I linked?

On Mon., Jun. 20, 2022, 1:04 p.m. Christian Diener, < @.***> wrote:

@akaviaLab https://github.com/akaviaLab What you linked is FBC version 2. The features implemented here look like a new version of FBC (version 3) but even after some searching I can not find a definition for this version (3).

— Reply to this email directly, view it on GitHub https://github.com/opencobra/cobrapy/pull/1237#issuecomment-1160673055, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACQYYZVPRS5L7Q3ER4XCKFTVQCQDLANCNFSM5ZGT4REA . You are receiving this because you were mentioned.Message ID: @.***>

akaviaLab avatar Jun 20 '22 17:06 akaviaLab

Just based on previous discussions where it was mentioned that a new version of FBC would have key-value pairs as those are currently not supported/allowed in SBML. The CV terms (still no fan of the name) do indeed look like they are compatible with the current version of SBML so those could be merged immediately.

cdiener avatar Jun 20 '22 17:06 cdiener

Cvterms are accesed via structured as requested. Did I misunderstand you?

On Mon., Jun. 20, 2022, 1:51 p.m. Christian Diener, < @.***> wrote:

Just based on previous discussions where it was mentioned that a new version of FBC would have key-value pairs as those are currently not supported/allowed in SBML. The CV terms (still no fan of the name) do indeed look like they are compatible with the current version of SBML so those could be merged immediately.

— Reply to this email directly, view it on GitHub https://github.com/opencobra/cobrapy/pull/1237#issuecomment-1160708085, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACQYYZRZOIKIWGWZEPDQXIDVQCVSZANCNFSM5ZGT4REA . You are receiving this because you were mentioned.Message ID: @.***>

akaviaLab avatar Jun 20 '22 18:06 akaviaLab

No, no all good 👍

cdiener avatar Jun 20 '22 19:06 cdiener

This document mentions FBC v3 (section 4) and includes the key-value pairs. However, that seems to be a proposal and not a final consensus. Also, there seems to have been no updates since 2020...

cdiener avatar Jun 23 '22 18:06 cdiener

Hmm We can remove the keypairvalues while we discuss the format. Or just remove the code.

I have no strong feelings about this.

On Thu., Jun. 23, 2022, 2:33 p.m. Christian Diener, < @.***> wrote:

This document https://sourceforge.net/p/sbml/code/26312/tree/trunk/specifications/sbml-level-3/version-1/fbc/spec/main.pdf?format=raw mentions FBC v3 and includes the key-value pairs. However, that seems to be a proposal and not a final consensus. Also, there seems to have been no updates since 2020...

— Reply to this email directly, view it on GitHub https://github.com/opencobra/cobrapy/pull/1237#issuecomment-1164740264, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACQYYZUVPZZUTR6VHU75ZMDVQSUWNANCNFSM5ZGT4REA . You are receiving this because you were mentioned.Message ID: @.***>

akaviaLab avatar Jun 23 '22 22:06 akaviaLab

I actually would leave it in. It is very convenient, and we could wait with writing it to SBML until FBC3 is adopted. I think this PR is in a pretty good state. I'll try my best to get to a review.

cdiener avatar Jun 24 '22 17:06 cdiener

@matthiaskoenig can you help review this?

akaviaLab avatar Jun 29 '22 15:06 akaviaLab

No need to try fixing 3.6 issues. We should probably drop support for it anyway. https://endoflife.date/python

cdiener avatar Jul 08 '22 14:07 cdiener

Okay. I'll keep that in mind for next issues. Is this what you meant with the creator for a simpler API?

akaviaLab avatar Jul 08 '22 14:07 akaviaLab

Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default.

cdiener avatar Jul 08 '22 17:07 cdiener

Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default.

Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default.

History accepts str or datetime as a format for the dates. parse_datetime parses string, and deals with datetime appropriately So that seems to do what you described already, unless I misunderstood

Creation time now() by default - I can do that. Creator in config - reasonable, but next PR.

akaviaLab avatar Jul 12 '22 01:07 akaviaLab

Hi all, I could review/update the code.

For clarification with the key/value store. This is basically the new fbc-version3 package which just needs a second implementation (which is this implementation) to be officially accepted. I.e. this is basically official as soon as we merge this in develop. So the key-value pair has to stay in.

A lot of discussion here. Are there any open things which have to be decided or is everything down to implementation details at this point? Best Matthias

matthiaskoenig avatar Jul 13 '22 08:07 matthiaskoenig

Hi Matthias, As far as I remember, there are still some open discussions

  1. Should we rename CVTerm to something like standardized?
  2. Should creators have only one name, or stay with family name and given name? Should one name be an option? https://github.com/opencobra/cobrapy/pull/1237#discussion_r912185946
  3. Renaming the qualifiers, see https://github.com/opencobra/cobrapy/pull/1237#discussion_r912191819. I think they should stay as is.
  4. What should the JSON schema include? See https://github.com/opencobra/cobrapy/pull/1237#discussion_r912193394 and https://github.com/opencobra/cobrapy/discussions/1240#discussioncomment-3088901

The rest is just tidying, code implementation. I'm still working on it, but if you have comments now that would be helpful.

Thank you,

Uri David

On Wed, Jul 13, 2022 at 4:22 AM Matthias König @.***> wrote:

Hi all, I could review/update the code.

For clarification with the key/value store. This is basically the new fbc-version3 package which just needs a second implementation (which is this implementation) to be officially accepted. I.e. this is basically official as soon as we merge this in develop. So the key-value pair has to stay in.

A lot of discussion here. Are there any open things which have to be decided or is everything down to implementation details at this point? Best Matthias

— Reply to this email directly, view it on GitHub https://github.com/opencobra/cobrapy/pull/1237#issuecomment-1182918737, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACQYYZRBY2TPYODYFZSF4NLVTZ4DZANCNFSM5ZGT4REA . You are receiving this because you were mentioned.Message ID: @.***>

akaviaLab avatar Jul 13 '22 11:07 akaviaLab

Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default.

Please don't set any current times! This will create diffs on your SBML models and model files every time you run your scripts resulting in large git diffs! I.e. don't set a current time as default. Perhaps an option for that, but please not the default behavior. The HistoryElement can be set on all core objects, so having diffs on all objects every time a script is executed is a bad idea.

matthiaskoenig avatar Jul 13 '22 15:07 matthiaskoenig

Right now, the document (libsbml.SBMLDocument), is set to be created at current time if there is no document creation time. Nothing else is created, because having diffs is a bad idea. No entities, and not the libsbml.Model entity either.

I can remove this, but this doesn't seem harmful to me.

On Wed, Jul 13, 2022 at 11:54 AM Matthias König @.***> wrote:

Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default.

Please don't set any current times! This will create diffs on your SBML models and model files every time you run your scripts resulting in large git diffs! I.e. don't set a current time as default. Perhaps an option for that, but please not the default behavior. The HistoryElement can be set on all core objects, so having diffs on all objects every time a script is executed is a bad idea.

— Reply to this email directly, view it on GitHub https://github.com/opencobra/cobrapy/pull/1237#issuecomment-1183396689, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACQYYZUZKCSJJCE646NPQYDVT3RBVANCNFSM5ZGT4REA . You are receiving this because you were mentioned.Message ID: @.***>

akaviaLab avatar Jul 13 '22 16:07 akaviaLab

Okay. I think if this is what the community wants to do, we should define a standard of

  • What is a valid cobray ID?
  • How to deal with weird characters like :, that are currently converted to ASCII code. We can decide to have something like SBML_COLON, SBML_SEMICOLON (it might exist, I'm not sure)

And get it done before cobrapy 1.0

@cdiener @Midnighter @synchon - any comments?

On Wed, Jul 13, 2022 at 11:48 AM Matthias König @.***> wrote:

@.**** commented on this pull request.

In src/cobra/io/schema_v2.json https://github.com/opencobra/cobrapy/pull/1237#discussion_r920241365:

  •          }
    
  •        },
    
  •        "required": [
    
  •          "key"
    
  •        ],
    
  •        "additionalProperties": false
    
  •      }
    
  •    }
    
  •  }
    
  • }
  • },
  • "type": "object",
  • "properties": {
  • "id": {
  •  "type": "string",
    
  •  "pattern": "^[a-zA-Z|_][a-zA-Z0-9_]*$"
    

Yes, SIds are valid SBML ids. This can be checked very easily with a regex.

Like mentioned above the only id changes are on

  • JSON import (IDs are fixed to be valid SIds)
  • MAT import (IDS are fixed to be valid SIds)
  • OTHER_FORMAT_NOT_SBML (IDS are fixed to be valid SIds)

There would be a single function fixing ids, e.g. fix_invalid_ids used in both/all importers to have a consistent behavior across formats. Nothing would be replaced fixed on any exports, because SIds are valid JSON and MAT Ids.

— Reply to this email directly, view it on GitHub https://github.com/opencobra/cobrapy/pull/1237#discussion_r920241365, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACQYYZWNYJZ5L32SFOKOXDTVT3QONANCNFSM5ZGT4REA . You are receiving this because you were mentioned.Message ID: @.***>

akaviaLab avatar Jul 13 '22 16:07 akaviaLab

@matthiaskoenig

Key-value pairs will be in there. They are super useful and a regularly requested feature.

The current date was only supposed to use current time as default value for the constructor. So it would only be set the first time the history is created and only if the user does not specify a date. EDIT: see reply below. Only set on writin gif no date is set.

The other discussion was whether we should limit the format of IDs for the v2 JSON schema. I don't think there is a good solution here, since the SBML and COBRAPY ID requirements are not compatible. So it will come down to whether we want the JSON schema 100% SBML compliant, with the limitation that not all COBRAPY models can be saved to that format.

The rest is just naming of classes etc. to make it a bit more accessible to devs and users without dep knowledge of SBML abstractions.

cdiener avatar Jul 13 '22 19:07 cdiener

The current date was only supposed to use current time as default value for the constructor. So it would only be set the first time the history is created and only if the user does not specify a date. So only when explicitly doing model.annotation.history = History(...) the first time and when not providing the date explicitly in the constructor. We could skip this, but I suspect many users will do something like model.annotation.history = History(created_date=datetime.now(), ...). Right now, it will initialize to None if not specified. If you prefer that, we can leave it as is...

Right now, History() is called as part of every annotation, so modifying any default there would lead to every entity having a created date. That's a bad idea. sbml.py has a small section when writing a document - if that document (not the model, but the document) has no creation date, it will be set to current time.

elif isinstance(sbase, libsbml.SBMLDocument): time = datetime.datetime.now() timestr = time.strftime("%Y-%m-%dT%H:%M:%S%z") date = libsbml.Date(timestr) _check(comp_history.setCreatedDate(date), "set creation date for document")

That's can be removed or stay.

akaviaLab avatar Jul 13 '22 20:07 akaviaLab

I would highly prefer to have SBML SIds as valid cobrapy ids.

  • These can be serialized to all formats
  • These work in optlang and the solver interfaces
  • These are valid object identifiers (which is sometimes important)
  • These are very simple to validate with the following regex [a-zA-Z][a-zA-Z0-9_]*
  • Most models already use valid SIds because SBML is the de facto exchange standard

The only exceptions are some databases such as BiGG (no longer maintained, which also provides SBML with valid IDs).

These would allow:

  • to get rid of all code for id replacements besides on the importers for JSON/Matlab
  • to have no id replacements on SBML and valid cobrapy models (these create a ton of issues when mapping data/models, ...) and create a ton of bugs.

I strongly advocate for this solution and just enforce valid SIds and move id_replacement in helper functions which would be applied for import besides SBML (and as toolbox for people to fix there invalid models).

edit: fixed regex

matthiaskoenig avatar Jul 13 '22 21:07 matthiaskoenig

@cdiener @matthiaskoenig Is the documentation in af5f3fa for CVTerm and Qualifier better? What the documentation should be in general?

akaviaLab avatar Jul 14 '22 01:07 akaviaLab

@cdiener @matthiaskoenig Is the documentation in af5f3fa for CVTerm and Qualifier better? What the documentation should be in general?

Yes, the documentation is better. But please use a string enum for such enums. This will allow to parse strings much easier to the actual instances. I.e. something along the lines of

class MyEnum(str, Enum):
    state1 = 'state1'
    state2 = 'state2'

matthiaskoenig avatar Jul 14 '22 07:07 matthiaskoenig

I would highly prefer to have SBML SIds as valid cobrapy ids.

* These can be serialized to all formats

* These work in optlang and the solver interfaces

* These are valid object identifiers (which is sometimes important)

* These are very simple to validate with the following regex `[a-zA-Z][a-zA-Z0-9_]*`

* Most models already use valid SIds because SBML is the de facto exchange standard

I think this is hard because of backwards compatibility as mentioned above, and it would be a bit inconsistent for instance when reading models with JSON v1 schema. But I so see your point and agree that all the SBML-like formats IDs should be read as is and there should be helpers to convert them to that format. Point 2 is not fully correct since there are some IDs that are SBML-compliant but don't work in optlang in some solver interfaces (IDs longer 255 chars in GLPK, or metabolites named "st" in Gurobi for instance). So basically we push the defaults toward SBML-compliant IDs without breaking other formats such as Matlab etc.

cdiener avatar Jul 14 '22 19:07 cdiener