generate a data schema and allow returned JSON objects to follow a strict schema

Open newgene opened this issue 5 years ago • 1 comments

A more "formal" solution to #42 and #7:

  • generate a JSON Schema-style schema for the gene object
    • two versions of the schema could be available:
      • one that faithfully represents the actual data objects (e.g. some fields would allow both a string and a list of strings)
      • one "strict schema", in which every field has exactly one type (e.g. any field with even a small chance of being a list is defined as a list)
    • the schemas could be accessible at a /metadata/schema endpoint
  • by default, gene objects are still returned as they are stored
  • with an optional "strict_schema=true" parameter, the returned gene objects follow the "strict schema"
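
To make the two schema flavours concrete, here is a minimal sketch in plain Python. The field names (`symbol`, `alias`), the schema fragments, and the `to_strict` helper are all hypothetical and only illustrate the idea; they are not MyGene.info's actual schema or implementation.

```python
# Hypothetical schema fragments for illustration only; the field names and
# structure are assumptions, not MyGene.info's actual schema.

# "Faithful" schema: describes the data as stored, so a field may be
# either a single string or a list of strings.
FAITHFUL_SCHEMA = {
    "type": "object",
    "properties": {
        "symbol": {"type": "string"},
        "alias": {
            "anyOf": [
                {"type": "string"},
                {"type": "array", "items": {"type": "string"}},
            ]
        },
    },
}

# "Strict" schema: every field has exactly one type, so any field that
# can ever be a list is always declared as a list.
STRICT_SCHEMA = {
    "type": "object",
    "properties": {
        "symbol": {"type": "string"},
        "alias": {"type": "array", "items": {"type": "string"}},
    },
}


def to_strict(gene, schema=STRICT_SCHEMA):
    """Return a copy of `gene` where array-typed fields are always lists."""
    result = dict(gene)
    for field, spec in schema["properties"].items():
        if spec.get("type") == "array" and field in result:
            value = result[field]
            if not isinstance(value, list):
                # Wrap a lone scalar so the field matches the strict schema.
                result[field] = [value]
    return result
```

With something like this on the server side, a request carrying `strict_schema=true` could pass each stored object through `to_strict` before returning it, while the default path would return the object untouched.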

newgene avatar Aug 29 '18 18:08 newgene

Strong +1 to consistent return types. strict_schema=true would greatly improve the user experience. I recently wrote up some notes on MyGene. Copying the relevant section below:

MyGene doesn't have a schema for its gene output format (https://github.com/biothings/mygene.info/issues/52). Several fields return inconsistent types (https://github.com/biothings/mygene.info/issues/42), and it's not clear which fields can return a list as opposed to a scalar. Two options exist to opt into consistent return types: always_list and allow_null, but these must be explicitly applied to columns without knowing a priori which columns need them.
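
As a rough illustration of that burden, the sketch below builds query parameters by hand. The `build_params` helper and the field names are hypothetical, and passing each option as a comma-separated list of field names is an assumption about the parameter format; the point is only that the caller has to enumerate the affected fields up front.

```python
def build_params(fields, list_fields=(), nullable_fields=()):
    """Build a query-parameter dict (hypothetical helper).

    Assumes `always_list` and `allow_null` each take a comma-separated
    list of field names; the caller must already know which fields
    need them.
    """
    params = {"fields": ",".join(fields)}
    if list_fields:
        params["always_list"] = ",".join(list_fields)
    if nullable_fields:
        params["allow_null"] = ",".join(nullable_fields)
    return params


# Field names here are examples only.
params = build_params(
    fields=["symbol", "alias"],
    list_fields=["alias"],
    nullable_fields=["alias"],
)
```

A published strict schema would let a client derive `list_fields` automatically instead of discovering them by trial and error.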

dhimmel avatar Aug 20 '20 17:08 dhimmel