cobrapy
Consistently output multiple-value elements as arrays of strings in JSON/YAML format.
Problem description
I have a suggestion regarding the JSON/YAML scheme.
Specifically, I suggest consistently outputting fields that can hold multiple elements (e.g. metChEBIID, rxnECNumbers) as arrays of strings. The current scheme does not clearly specify this: a value is written as a string when there is just one element (Example1), but as an array of strings when there are several (Example2). This inconsistency hinders automatic parsing of JSON/YAML files.
Example1:
- annotation: !!omap
  - ec-code: 1.1.1.4
Example2:
- annotation: !!omap
  - ec-code:
    - 1.1.2.4
    - 1.1.99.-
The expected behavior would be to consistently print elements that can hold one or more values as arrays of strings (Example3), regardless of how many values occur. This won't add many lines, but it will allow the use of public YAML parsers and promote the use of the JSON/YAML formats.
Example3:
- annotation: !!omap
  - ec-code:
    - 1.1.1.4
- annotation: !!omap
  - ec-code:
    - 1.1.2.4
    - 1.1.99.-
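Until the output is consistent, a consumer has to handle both shapes itself. The following is a minimal sketch of such a workaround in Python, assuming the annotation has already been loaded into a plain dict; the helper names `as_string_list` and `normalize_annotation` are hypothetical and not part of cobrapy.

```python
def as_string_list(value):
    """Coerce a scalar-or-list annotation value into a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        # Single value written as a bare string (the Example1 shape).
        return [value]
    # Already a sequence (the Example2 shape); make sure every item is a string.
    return [str(item) for item in value]


def normalize_annotation(annotation):
    """Return a copy of the annotation in which every value is a list of strings."""
    return {key: as_string_list(value) for key, value in annotation.items()}


single = {"ec-code": "1.1.1.4"}
multiple = {"ec-code": ["1.1.2.4", "1.1.99.-"]}

print(normalize_annotation(single))
print(normalize_annotation(multiple))
```

With the Example3 layout written by the exporter, this normalization step would no longer be needed on the reading side.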
Although the SBML specification allows arbitrary XML inside annotations, I can definitely see this change making it easier to parse JSON/YAML.
Any comments @zakandrewking since you use the JSON format a lot? @matthiaskoenig from the SBML side? Would this negatively affect a model read in via JSON and then written to SBML?
I suggest extending the COBRApy JSON-schema for models to explicitly define the data structure. That will make it much easier for users to parse & validate. As long as the new fields are marked as optional, there shouldn't be any issues with backwards compatibility.
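As a rough illustration of what such a schema rule could look like, here is a sketch in Python. The schema fragment `ANNOTATION_SCHEMA` and the helper `is_valid_annotation` are assumptions made for this example, not the actual cobrapy schema. The code uses the `jsonschema` package's `Draft4Validator` when it is installed and otherwise falls back to a hand-written check intended to match the same rule.

```python
# Hypothetical schema fragment: every annotation value must be a
# non-empty array of strings, never a bare string.
ANNOTATION_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "additionalProperties": {
        "type": "array",
        "items": {"type": "string"},
        "minItems": 1,
    },
}


def is_valid_annotation(annotation):
    """Return True if the annotation satisfies ANNOTATION_SCHEMA."""
    try:
        from jsonschema import Draft4Validator
    except ImportError:
        # Fallback when jsonschema is not installed: the same rule by hand.
        return isinstance(annotation, dict) and all(
            isinstance(values, list)
            and len(values) >= 1
            and all(isinstance(item, str) for item in values)
            for values in annotation.values()
        )
    return Draft4Validator(ANNOTATION_SCHEMA).is_valid(annotation)


print(is_valid_annotation({"ec-code": ["1.1.1.4"]}))
print(is_valid_annotation({"ec-code": "1.1.1.4"}))
```

Under this rule the Example1 shape (a bare string) is rejected and the Example3 shape is accepted, which is what makes automated validation straightforward.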
(I came across this just now; is it related? https://github.com/opencobra/schema)
https://github.com/opencobra/schema was intended for discussing XML declarations that could extend the SBML schema and be used by all tools in the opencobra organization (and beyond, of course). However, it might make sense to jointly develop a JSON schema there, too, rather than storing it in cobrapy.
Do you want to start an issue for that there? I am happy to move the schema out of cobrapy and develop one that can be validated with JSON schema draft-04 (because that is the version the Python package jsonschema supports).
That depends on the goals for the schema project. I have no problem with it being inside of COBRApy, since that's the source of those files.
Now tracked in the new annotation format PRs.