cobrapy
Store SBML meta information (level, version, packages, provenance)
Hey @opencobra/cobrapy-core,
I'd like to store some meta information about the SBML that was parsed:
- level
- version
- packages used (fbc, annotation, groups)
The big question is where to store that information, and I'd like your opinions. Current ideas, either:

1. Create a new attribute on the model, `cobra.Model.sbml_info`, that could be a `tuple(level: int, version: int, packages: Tuple[str])`.
2. Create a new `cobra.Model.meta` attribute, which would allow for some more general information later. It could be a dictionary, and `model.meta["SBML"]` could contain the above information.
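For illustration only, the two options could be sketched like this (the names `SBMLInfo` and `meta` are hypothetical, not existing cobrapy API):

```python
from typing import NamedTuple, Tuple

# Option 1: a dedicated, typed attribute, e.g. `model.sbml_info`.
class SBMLInfo(NamedTuple):
    level: int
    version: int
    packages: Tuple[str, ...]

info = SBMLInfo(level=3, version=1, packages=("fbc", "groups"))

# Option 2: a generic `model.meta` dictionary with an "SBML" entry,
# leaving room for other kinds of meta information later.
meta = {"SBML": {"level": 3, "version": 1, "packages": ("fbc", "groups")}}
```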
Curious what you think and if you have any other ideas about this :smiley:
Since it doesn't really belong to the `cobra.Model` object, I was thinking of something along the lines of returning a tuple if a specific flag is set on the parsing function:

```python
model = read_sbml("path/to/model.xml")
model, sbml_info = read_sbml("path/to/model.xml", sbml_info=True)
```
Would that work too?
Yes, that's another consideration. In general, I think functions with varying return types are a pain in the neck and bad design, and I'd like to avoid them in the future, but this may be an acceptable exception to the rule :wink:
I agree it doesn't belong to `cobra.Model`, since it might as well have been loaded from JSON. It should also be possible to extract this info from libsbml in cases where it's not able to build a model instance.
What about a helper function in `cobra.io`, e.g. `cobra.io.read_sbml_info()`, that does nothing but return the meta information?
Some further requirements to take into account:
- Some of the meta information might be desirable when writing a model back to SBML. In that case the model is the only logical place where this information can be stored. @matthiaskoenig can give a better picture on this.
- Even though it is cheap to do in the case of the version information, which sits right in the header, I think in general it's not desirable to go back and parse the file again.
In that case, I'd prefer the dictionary approach (2.) of your original post.
Would parsing it and then storing it in some sort of global variable that is detached from the model object itself be a solution that is less of a pain in the neck and bad design? Something comparable to a Click context?
Such that:

```python
model = read_sbml("path/to/model.xml")
```

creates `model` but also adds an entry to some sort of `MODEL_REGISTRY` dictionary that exists for this session:

```python
MODEL_REGISTRY[id(model)] = {"meta": ...}  # pseudocode
```
When any of the `cobra.io` functions then encode the model as SBML or JSON, they could default to the information that makes sense for that filetype, i.e. writing to SBML would retrieve

```python
{'info': 'SBML L3V1, fbc-v2, groups-v1',
 'level': 3,
 'packages': {'fbc': 2, 'groups': 1},
 'version': 1}
```

but writing to JSON wouldn't use that information.
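A minimal sketch of that idea (`MODEL_REGISTRY` is the name from above; `register_meta` and `meta_for_writer` are hypothetical helper names):

```python
# Session-level registry, detached from the model object itself.
MODEL_REGISTRY = {}

def register_meta(model, meta):
    """Remember meta information for a model, keyed by object identity."""
    MODEL_REGISTRY[id(model)] = meta

def meta_for_writer(model, filetype):
    """Hand the stored meta info only to writers that can use it."""
    meta = MODEL_REGISTRY.get(id(model), {})
    # SBML output consumes level/version/packages; JSON output ignores them.
    return meta if filetype == "sbml" else {}
```

One caveat of keying by `id()` is that entries never go away when a model is garbage collected (and ids can be reused), so a real implementation would probably want weak references or explicit cleanup.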
Just to add to the discussion: yes, the information is important and would also be very helpful for writing SBML models. Part of the information is the notes and annotations on the SBMLDocument, but also the ModelHistory information, i.e. who created the model. This is also information you would want to set on cobra models before writing: who created the model, when it was created, and what the notes and annotations on the SBMLDocument are. Such provenance is crucial.
You want to have this information persist with the model so that it can be written again on export, or at least some way to store model/document meta information. Writing a model without information on who created it and when is very bad style. This information should be part of the model. The information could look like this:
```python
meta = {
    'annotations': {'sbo': 'SBO:0000624'},
    'created': '2016-10-05T13:59:23Z',
    'creators': [{'familyName': 'König', 'givenName': 'Matthias',
                  'organisation': 'Humboldt University Berlin',
                  'email': '[email protected]'}],
    'info': 'SBML L3V1, fbc-v2, groups-v1',
    'level': 3,
    'notes': {},
    'packages': {'fbc': 2, 'groups': 1},
    'version': 1,
}
```
And this information should be written on the model, e.g. in a `model._sbmlmeta` field.
Best M
I would argue that information of this kind belongs to a cobra model, since it specifies provenance. I agree that it should not be an SBML-specific attribute, though. First, `cobra.Model` already has an annotations dictionary, which is not used for much right now and could get a provenance entry. Alternatively, we could add `cobra.Model.provenance`, which annotates how that model was obtained. For instance, it could indicate the JSON schema version, the reconstruction method, etc. The SBML writer can then pick which of that info it wants to use when writing SBML. This also goes in line with what many workflow managers or other large projects (for instance QIIME 2) are doing.
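Sketched with a stand-in class (the `provenance` keys below are illustrative, not an agreed schema, and the two assignments mirror the two options above):

```python
class Model:
    """Stand-in for cobra.Model, which carries an annotation dictionary."""
    def __init__(self):
        self.annotation = {}

model = Model()

provenance = {
    "source": "SBML",                    # or "JSON", "MAT", ...
    "level": 3,
    "version": 1,
    "packages": {"fbc": 2, "groups": 1},
    "json_schema": None,                 # set when loaded from JSON instead
}

# Option A: reuse the existing annotation dictionary.
model.annotation["provenance"] = provenance

# Option B (alternative): a dedicated attribute.
model.provenance = provenance
```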
Could you link to an example or documentation that shows this for Qiime 2? I don't know it but your reasoning sounds convincing to me.
There is some argumentation in https://docs.qiime2.org/2019.1/concepts/?highlight=provenance#data-files-qiime-2-artifacts, namely:

> Artifacts enable QIIME 2 to track, in addition to the data itself, the provenance of how the data came to be. With an artifact’s provenance, you can trace back to all previous analyses that were run to produce the artifact, including the input data used at each step. This automatic, integrated, and decentralized provenance tracking of data enables a researcher to archive artifacts, or for example, send an artifact to a collaborator, with the ability to understand exactly how the artifact was created. This enables replicability and reproducibility of analyses, as well as generation of diagrams and text that can be used in the methods section of a paper. Provenance also supports and encourages the proper attribution to underlying tools (e.g. FastTree to build a phylogenetic tree) used to generate the artifact.
Most of QIIME still works via the command line, but you can look at an example of provenance in the web visualization (by clicking on the provenance tab at the top).
Now tracked in #1237 and available as part of the history.