
Consistently output multiple-value elements as arrays of strings in JSON/YAML format.

Open haowang-bioinfo opened this issue 6 years ago • 4 comments

Problem description

I have a suggestion regarding the JSON/YAML scheme.

Specifically, I suggest consistently outputting the fields that can hold multiple elements (e.g. metChEBIID, rxnECNumbers) as arrays of strings. The current scheme does not specify this clearly: a value is written as a plain string when there is just one element (Example1), but as an array of strings when there are several (Example2). This inconsistency hinders automatic parsing of JSON/YAML files.

Example1:

- annotation: !!omap
  - ec-code: 1.1.1.4

Example2:

- annotation: !!omap
  - ec-code:
    - 1.1.2.4
    - 1.1.99.-

The expected implementation would consistently print elements with one or more values as an array of strings (Example3), regardless of how many values occur. This won't introduce many additional lines, but it will enable the use of public YAML parsers and promote the adoption of the JSON/YAML formats.

Example3:

- annotation: !!omap
  - ec-code:
    - 1.1.1.4

- annotation: !!omap
  - ec-code:
    - 1.1.2.4
    - 1.1.99.-
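
Until the output is consistent, a consumer has to normalize the values itself. The following is a minimal sketch in Python, assuming the annotation has already been loaded into a plain dict by some JSON or YAML parser; `normalize_annotation` is a hypothetical helper for illustration, not part of cobrapy.

```python
def normalize_annotation(annotation):
    """Return a copy of the mapping in which every value is a list of strings.

    Hypothetical helper: a lone value (Example1) is wrapped in a one-element
    list, while a list or tuple (Example2) is copied element by element.
    """
    normalized = {}
    for key, value in annotation.items():
        if isinstance(value, (list, tuple)):
            normalized[key] = [str(item) for item in value]
        else:
            normalized[key] = [str(value)]
    return normalized


single = {"ec-code": "1.1.1.4"}                  # shape of Example1
multiple = {"ec-code": ["1.1.2.4", "1.1.99.-"]}  # shape of Example2

print(normalize_annotation(single))    # {'ec-code': ['1.1.1.4']}
print(normalize_annotation(multiple))  # {'ec-code': ['1.1.2.4', '1.1.99.-']}
```

With output in the shape of Example3, this extra step would be unnecessary: every parser could iterate over the values directly.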

haowang-bioinfo avatar May 15 '18 21:05 haowang-bioinfo

Although the SBML specification allows arbitrary XML inside annotations, I can definitely see this change making it easier to parse JSON/YAML.

Any comments @zakandrewking since you use the JSON format a lot? @matthiaskoenig from the SBML side? Would this negatively affect a model read in via JSON and then written to SBML?

Midnighter avatar Jun 13 '18 16:06 Midnighter

I suggest extending the COBRApy JSON-schema for models to explicitly define the data structure. That will make it much easier for users to parse & validate. As long as the new fields are marked as optional, there shouldn't be any issues with backwards compatibility.

(I came across this just now; is it related? https://github.com/opencobra/schema)

zakandrewking avatar Jun 16 '18 15:06 zakandrewking

https://github.com/opencobra/schema was intended for discussing XML declarations that could extend the SBML schema and be used by all tools in the opencobra organization (and beyond, of course). However, it might make sense to jointly develop a JSON schema there, too, rather than storing it in cobrapy.

Do you want to start an issue for that there? I am happy to move the schema out of cobrapy and develop one that can be validated with JSON schema draft-04 (because that is the version the Python package jsonschema supports).
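
As a rough illustration of what such a schema could require for annotations, here is a sketch in Python. The fragment `ANNOTATION_SCHEMA` and the helper `conforms` are assumptions made for this example, not the schema shipped with cobrapy; the fragment only uses keywords that exist in JSON schema draft-04.

```python
# Hypothetical draft-04 style fragment: every annotation value must be a
# non-empty array of strings, whatever the key is called.
ANNOTATION_SCHEMA = {
    "type": "object",
    "additionalProperties": {
        "type": "array",
        "items": {"type": "string"},
        "minItems": 1,
    },
}


def conforms(annotation):
    """Hand-written, stdlib-only check mirroring the fragment above.

    With the jsonschema package installed, the same check could be made via
    jsonschema.Draft4Validator(ANNOTATION_SCHEMA).is_valid(annotation).
    """
    if not isinstance(annotation, dict):
        return False
    return all(
        isinstance(value, list)
        and len(value) >= 1
        and all(isinstance(item, str) for item in value)
        for value in annotation.values()
    )


print(conforms({"ec-code": ["1.1.1.4"]}))  # True
print(conforms({"ec-code": "1.1.1.4"}))    # False
```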

Midnighter avatar Jun 16 '18 16:06 Midnighter

That depends on the goals for the schema project. I have no problem with it being inside COBRApy, since that's the source of those files.

zakandrewking avatar Jun 20 '18 20:06 zakandrewking

Now tracked in the new annotation format PRs.

cdiener avatar Nov 04 '22 19:11 cdiener