
Move datamodel to its own repo

Open · david4096 opened this issue 7 years ago · 0 comments

As the Genomics API reaches a stable release, we would like to offer the community the most valuable result of the domain modelers' work so far: the data model, which implementors have now had time to integrate. Currently, this repository contains both the data model and the methods used to access it. Historically this made sense, since developing an API requires a domain to develop it against.

To make sure that the Genomics API was complete, the domain modelers designed methods that allow all of the data presented by a server to be accessed and reasoned about. However, many aspects of the data model do not affect the methods. For example, adding a field to the Variant message that states whether it is a DUP, DEL, etc. does not change how one queries for these data. Although new methods may be introduced in the future, we gain little by tying the two horses together.

As a sign of maturity, I believe we can move the data model to its own repository, which would present what genomics data "are". This repository, which defines the Genomics API, would then simply depend on that data model. Other APIs and applications could be built on the GA4GH data model, ensuring that application developers exchange standard genomic data.

This would allow applications to comply with the GA4GH data model without having to work out the details of using the GA4GH methods.
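Once the data model lives in its own repository, the dependency should only run one way: the API repository imports the data model, and the data model never imports method definitions. A minimal sketch of a check for that rule follows, assuming each repository's files are available as a mapping from import path to source text. The function names `imports_of` and `check_one_way` and the example file names are hypothetical illustrations, not part of ga4gh/schemas.

```python
import re
from typing import Iterable

# Matches `import "path";`, `import public "path";` and `import weak "path";`.
IMPORT_RE = re.compile(r'^\s*import\s+(?:public\s+|weak\s+)?"([^"]+)"\s*;', re.MULTILINE)


def imports_of(text: str) -> list[str]:
    """Return the paths a .proto file imports (commented-out imports are not filtered)."""
    return IMPORT_RE.findall(text)


def check_one_way(data_model: dict[str, str], methods: Iterable[str]) -> list[tuple[str, str]]:
    """List (importer, imported) pairs where a data-model file imports a methods file.

    Keys of `data_model` and entries of `methods` are assumed to use the same
    path form as the import statements. An empty result means the dependency
    only runs from the API repository to the data model, never back.
    """
    method_paths = set(methods)
    violations: list[tuple[str, str]] = []
    for name, text in sorted(data_model.items()):
        for dep in imports_of(text):
            if dep in method_paths:
                violations.append((name, dep))
    return violations


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    data_model = {
        "variants.proto": 'import "common.proto";\nmessage Variant {}\n',
        "reads.proto": 'import "read_service.proto";\nmessage ReadAlignment {}\n',
    }
    print(check_one_way(data_model, ["read_service.proto"]))
```

Here the example prints `[('reads.proto', 'read_service.proto')]`, flagging a data-model file that would drag the methods repository back in as a dependency. Run in CI on the data-model repository, a check like this would keep it usable by applications that never touch the GA4GH methods.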

This issue involves a small, concerted effort and introduces no changes in functionality to this repository. It involves moving the proto files that don't define methods to another repository and adding the Python and Java build paths and dependency links between the two. Issues that don't affect the methods would then be moved to the other repository. If the benefits of this approach are realized, we might consider splitting the data model into separate repositories for each domain (e.g. ga4gh/variants-schemas).
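The first step, deciding which proto files move, can be approximated mechanically: in protobuf, methods are declared inside `service` blocks, so a file with no `service` contains only data model. A minimal sketch under that assumption is shown below; `split_protos` and the example file contents are hypothetical, and the comment stripping is naive (it does not handle `//` or `/*` inside string literals).

```python
import re

# A `service` block is what declares RPC methods in a .proto file.
SERVICE_RE = re.compile(r"^\s*service\s+\w+\s*\{", re.MULTILINE)


def strip_comments(text: str) -> str:
    """Remove /* ... */ and // comments (naive: ignores comment markers inside strings)."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", text)


def split_protos(files: dict[str, str]) -> tuple[list[str], list[str]]:
    """Partition .proto files into (data_model, methods).

    `files` maps a file name to its source text. Files that declare no
    `service` go to the data model; files that do stay with the API.
    """
    data_model: list[str] = []
    methods: list[str] = []
    for name, text in sorted(files.items()):
        if SERVICE_RE.search(strip_comments(text)):
            methods.append(name)
        else:
            data_model.append(name)
    return data_model, methods


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    example = {
        "variants.proto": 'syntax = "proto3";\nmessage Variant { string id = 1; }\n',
        "variant_service.proto": (
            'syntax = "proto3";\n'
            'import "variants.proto";\n'
            "service VariantService {\n"
            "  rpc SearchVariants(SearchVariantsRequest) returns (SearchVariantsResponse);\n"
            "}\n"
        ),
    }
    print(split_protos(example))
```

This prints `(['variants.proto'], ['variant_service.proto'])`. Classifying per file keeps the move mechanical; a file that mixes messages and a `service` would stay in the API repository and would need its messages split out by hand.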

There are a number of benefits to this approach:

  • Better outside engagement. Contributors will have an easier time getting their code reviewed if it is the only proto in a repository. Domain experts will be able to focus on their field without being distracted by the other proto files.
  • Better separation of concerns. Once the proto data model and methods are separated, our software will fit more sensibly into others' stacks.
  • Easier composition of API elements. To support this, we'll create a way to easily write, package, and import proto between repositories, so that software that only uses the data model can depend on it alone.
  • Isolated testing environments. Having a variants data model on its own will let us better present software that demonstrates the data model, apart from the concerns of HTTP.

Thanks @ejacox, your comments today really clarified the benefits of this approach for me.

david4096 · Mar 07 '17 01:03