mygene.info
Generate a data schema and allow returned JSON objects to follow a strict schema
A more "formal" solution to #42 and #7:
- generate a JSON-schema style schema for the gene object
  - there could be two versions of the schema available:
    - one that faithfully represents the actual data objects (e.g. some fields allow both a string and a list of strings)
    - one "strict schema", where every field can have only one type (e.g. any field with even a small chance of being a list is defined as a list)
  - the schemas can be made accessible at the /metadata/schema endpoint
- by default, the gene objects are still returned as they are stored
- with an optional "strict_schema=true" parameter, the returned gene objects will follow the "strict schema"
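To make the "strict schema" idea concrete, here is a minimal sketch of what the normalization could look like. It is not the actual MyGene implementation: the `STRICT_SCHEMA` mapping, the `to_strict` helper, and the sample gene object are all hypothetical and only illustrate wrapping scalars in lists for fields the strict schema declares as lists.

```python
from typing import Any, Dict

# Hypothetical strict schema: each field is declared as either "list" or "scalar".
STRICT_SCHEMA: Dict[str, str] = {
    "symbol": "scalar",
    "alias": "list",
    "pdb": "list",
}


def to_strict(gene: Dict[str, Any], schema: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of `gene` where every field declared as "list" holds a list."""
    strict: Dict[str, Any] = {}
    for field, value in gene.items():
        if schema.get(field) == "list" and not isinstance(value, list):
            # A stored scalar is wrapped so the client always sees a list.
            strict[field] = [value]
        else:
            # Scalars, existing lists, and fields unknown to the schema pass through.
            strict[field] = value
    return strict


# Made-up example object: "alias" is stored as a single string here.
stored = {"symbol": "CDK2", "alias": "p33", "pdb": ["1AQ1", "1B38"]}
strict = to_strict(stored, STRICT_SCHEMA)
```

With this sketch, `strict["alias"]` is always a list regardless of how many aliases the stored object has, which is the behaviour "strict_schema=true" is meant to guarantee.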
Strong +1 to consistent return types. `strict_schema=true` would greatly improve the user experience. I recently wrote up some notes on MyGene. Copying the relevant section below:
MyGene doesn't have a schema for its gene output format (https://github.com/biothings/mygene.info/issues/52). Several fields return inconsistent types (https://github.com/biothings/mygene.info/issues/42), and it's not clear which fields can return a list as opposed to a scalar. Two options exist to opt into consistent return types: `always_list` and `allow_null`, but these must be explicitly applied to columns without knowing a priori which columns need them.
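For illustration, this is roughly how the two options have to be spelled out per field today. It is a sketch only: the base URL, the `build_gene_url` helper, and the choice of fields are assumptions, and the code just builds the query string without sending a request.

```python
from typing import Sequence
from urllib.parse import urlencode

# Assumed base URL for the gene endpoint; check the MyGene docs before relying on it.
BASE_URL = "https://mygene.info/v3/gene/"


def build_gene_url(
    gene_id: str,
    fields: Sequence[str],
    always_list: Sequence[str] = (),
    allow_null: Sequence[str] = (),
) -> str:
    """Build a gene query URL that names each field needing list or null handling."""
    params = {"fields": ",".join(fields)}
    if always_list:
        params["always_list"] = ",".join(always_list)
    if allow_null:
        params["allow_null"] = ",".join(allow_null)
    # Keep commas unescaped so the field lists stay readable.
    return BASE_URL + gene_id + "?" + urlencode(params, safe=",")


# The caller has to guess that "alias" and "pdb" are the fields that may vary in type.
url = build_gene_url(
    "1017",
    fields=["symbol", "alias", "pdb"],
    always_list=["alias", "pdb"],
    allow_null=["pdb"],
)
```

The point is that the caller has to enumerate `alias` and `pdb` by hand; a published schema would let a client derive those lists instead of guessing.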