specification icon indicating copy to clipboard operation
specification copied to clipboard

Schema dump for v.2.0.1?

Open pbonnell opened this issue 3 years ago • 12 comments

The current slq core schema dump here is v1.2. Does anyone have a clean v2.0.1 they could dump drop in there?

We're working on a direct implementation of the HSDS Transformer in an existing Rails app and looking at saving directly to a Postgres store for our adoption workflow. A canonical schema from this spec would speed things along nicely!

pbonnell avatar Apr 30 '21 15:04 pbonnell

I believe this is queued up over in /api-specification in #109. @kinlane has it on his roadmap; Kin, any updates?

@pbonnell there are also others working on upgrading the transformer, we might want y'all to be in touch - cc @devinbalkind @adityasrini

greggish avatar May 02 '21 19:05 greggish

We updated the Transformer to 2.0.1 last month - https://github.com/sarapis/hsds-transformer

It should output an HSDS zip file that follows the reference schema.

If you have any issues with it please post them directly to that GitHub repo. No one has used it in the wild yet so would love feedback and issues etc.

I also think, and have argued for a while, that a canonical sql schema is essential for moving this project forward. I’ll be happy to do whatever’s necessary to get it done.

On Sun, May 2, 2021 at 15:30 Greg Bloom @.***> wrote:

I believe this is queued up over in /api-specification https://github.com/openreferral/api-specification in #109 https://github.com/openreferral/api-specification/issues/109. @kinlane https://github.com/kinlane has it on his roadmap; Kin, any updates?

@pbonnell https://github.com/pbonnell there are also others working on upgrading the transformer, we might want y'all to be in touch - cc @devinbalkind https://github.com/devinbalkind @adityasrini https://github.com/adityasrini

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/openreferral/specification/issues/235#issuecomment-830859436, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADMPF6FN4MBEU6BZUP4O5LTLWR4RANCNFSM434QKX2Q .

-- Devin Balkind @devinbalkind devinbalkind.com

devinbalkind avatar May 02 '21 19:05 devinbalkind

This should be fairly straightforward to generate at build time from the schema, because HSDS is fundamentally tabular.

The build system is Sphinx, which means we can run arbitrary Python during the build. https://pypi.org/project/jsontableschema/ has an example of using sqlite to create a database from table schema, which we might then be able to export.

robredpath avatar May 26 '21 15:05 robredpath

Here is a SQL dump for v2.0.0, I will make a pass for 2.0.1 changes - https://gist.github.com/kinlane/99dbd718a7f50a85c00eb4dd44ca39f1

kinlane avatar Jun 16 '21 16:06 kinlane

I tried https://framework.frictionlessdata.io/docs/tutorials/formats/sql-tutorial/ and this looks good (We already use frictionless lib for testing the example data.)

First hurdle - it expects a full data package and we just have the datapackage.json file - but a quick bit of Python makes a full data package with no data in it:


import os
import json
import csv
import shutil

os.makedirs("blank_data_package", exist_ok=True)

shutil.copyfile("datapackage.json", os.path.join("blank_data_package", "datapackage.json"))

with open("datapackage.json") as fp:
    datapackage = json.load(fp)

    for table in datapackage.get('resources'):

        with open(os.path.join("blank_data_package", table['path']), 'w', newline='') as fp:
            writer = csv.writer(fp)
            writer.writerows([[i['name'] for i in table['schema']['fields']]])

Lets try Psql first (just cos I know it more):

pip install psycopg2 frictionless[sql]

Then a quick Python build file:

from frictionless import Package
package = Package('blank_data_package/datapackage.json')
package.to_sql('postgresql://postgres:PASSWORD@localhost:PORT/postgres')

Personally I ran Postgres via docker - https://hub.docker.com/_/postgres - but whatever you have handy.

Seems good! Get a set of tables with foreign keys and different column types.

It seems to use sqlalchemy so I have high hopes this will work with Mysql and others too.

odscjames avatar Oct 05 '21 14:10 odscjames

I ran official mysql via https://hub.docker.com/_/mysql (we should double check variants maybe?)

pip install mysqlclient==2.0.3

See https://pypi.org/project/mysqlclient/ as you may need some other libs first

Log in and create a database.

Change build line to:

package.to_sql('mysql://root:[email protected]:PORT/DATABASE')

Get crash:

sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1170, "BLOB/TEXT column 'id' used in key specification without a key length")

Ok, fair point.

If I edit datapackage.json, and on the ID columns AND foreign key columns add a max length:


{
                            "name": "id",
                            "type": "string",
                            "description":"Each organization must have a unique identifier.",
                            "format":"uuid",
                            "constraints": {
                                "required":true,
                                "unique":true,
                                "maxLength": 255
                            }
                        },

It starts to work. I get tables created.

I notice the existing database/hsda_core.sql uses varchar(X) for id columns, with various lengths - 38 for organization for instance.

So I think it's acceptable to limit this - we just have to pick a number.

(This is maybe worth doing for postgres too? I just went back and looked and the postgres tables have TEXT as a primary ID which maybe isn't great.)

odscjames avatar Oct 05 '21 14:10 odscjames

Ok - here's a test repo for Postgres.

It has shell scripts that - with only docker installed - will build a database, dump it to a file and clean up your docker resources afterwards.

It has a github action that will build the schema and error if there is a diff - so we catch if someone changes the schema and does not follow the instructions to rebuild.

https://github.com/odscjames/test-gh-actions-database-dump/tree/postgres

Working action: https://github.com/odscjames/test-gh-actions-database-dump/actions/runs/1312258007

Broken action: https://github.com/odscjames/test-gh-actions-database-dump/actions/runs/1312266278

[ We just have to replace the CREATE TABLE stuff with the python stuff from previous comments to actually make the schema we want ]

I think we now have all the pieces to put this together for postgres ...

odscjames avatar Oct 06 '21 14:10 odscjames

Done for mysql!

https://github.com/odscjames/test-gh-actions-database-dump/tree/mysql

Working: https://github.com/odscjames/test-gh-actions-database-dump/actions/runs/1312442859

Broken: https://github.com/odscjames/test-gh-actions-database-dump/actions/runs/1312451636

odscjames avatar Oct 06 '21 15:10 odscjames

And I'm giving up, for now at least, on MS SQL Server. https://github.com/odscjames/test-gh-actions-database-dump/tree/mssqlserver

The problem is the official MS tool to actually dump schema (as opposed to just data) is a poorly maintained Python script. I got so far then hit a problem where all the workarounds were basically "run this on a system from 17 years ago" (maybe a touch exaggerated) but this is clearly going to make a very brittle and temperamental solution. Lets do the others then consider if this is worth it later.

(And other minor things, like that in any case to get this far I had it so every time you run it locally it has to download 100MB of code again and the official CLI tool complains with an error every time even tho the command actually works!)

odscjames avatar Oct 06 '21 16:10 odscjames

PR for MySQL: https://github.com/openreferral/specification/pull/261

odscjames avatar Oct 12 '21 14:10 odscjames

On Master branch there is now a Mysql schema for version 2, updated automatically.

I've made a new issue for adding content to the docs about this.

odscjames avatar Nov 04 '21 14:11 odscjames

Hey everyone, I wanted to reach out because I've recently been in touch with Greg Bloom and he pointed me in this direction. I'm a Dart/Flutter developer but I'm also a Co-Chair for the HL7 Health and Human Services working group. We're going to be working on trying to map FHIR to HSDS so I've been playing around with the spec, and my first step was just to try and create a postgresql script to create the tables. I wanted to post it here to see if I'm on the right track and if anyone sees any issues with what I've got so far. https://github.com/MayJuun/hsds/blob/main/sql/hsds.sql

Dokotela avatar Apr 23 '22 20:04 Dokotela

I am closing this as there are now some shell scripts run as part of the build process for generating mysql and postgresql database schemas.

@Dokotela hopefully these help with your use-case! Apologies on behalf of the community for not having a response to your comment earlier.

mrshll1001 avatar Nov 16 '23 14:11 mrshll1001