Providing a reference datapackage aligned with the specification
Having previously raised #82, the situation has moved on somewhat 😀
I am working on some tooling ideas that use datapackage.json as input, reading it with the relevant ruby gems.
The version of datapackage.json in this repo is obviously the most up to date, as this specification project gets the most frequent updates. The problem is that using it as input fails when reading the schema, because tableschema-rb looks for a CSV file based on each resource's path value. The Data Location part of the spec suggests data must be present in some form.
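To make the failure concrete, here is a minimal sketch of a check that lists which path values in a datapackage.json have no file next to it. It is written in Python with only the standard library, so it does not depend on tableschema-rb or any other gem. The function name `missing_resource_files` is made up for illustration, and the sketch assumes each resource has a single string path relative to the folder holding datapackage.json.

```python
import json
from pathlib import Path


def missing_resource_files(descriptor_path):
    """Return the resource paths in a datapackage.json that have no file on disk.

    Hypothetical helper: assumes each resource's path is a single string,
    relative to the folder that contains datapackage.json.
    """
    descriptor_path = Path(descriptor_path)
    base_dir = descriptor_path.parent
    descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))

    missing = []
    for resource in descriptor.get("resources", []):
        path = resource.get("path")
        # Skip anything that is not a single local path
        # (URLs, lists of paths, resources with inline data).
        if not isinstance(path, str) or "://" in path:
            continue
        if not (base_dir / path).is_file():
            missing.append(path)
    return missing
```

Running `missing_resource_files("datapackage.json")` from the repo root would return the paths that still need a CSV before the descriptor can be read by tooling that resolves them.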
Now there is sample-HSDS-datapackage that includes 2 examples, each with a datapackage.json file and a set of sample CSVs. However, this has not been updated with spec changes.
Options
- Update sample-HSDS-datapackage to match the spec.
- Add an example folder to this spec and provide a canonical set of template CSVs in it, so that the specification datapackage.json can be used directly. Or, if the paths should be maintained in a folder at the same level as the datapackage.json, could datapackage.json be moved up into a reference folder (or some other name) that also includes CSVs?
- Other ideas?
The 2nd option seems like it would be more likely to result in the template CSV files being maintained, and thus the datapackage.json that defines the spec being directly usable for tooling. But it has other implications.
Option 2 sounds ideal to me.
I think most people experience data as spreadsheets and are comfortable with "templates".
Has anything prevented us from simply putting together a Google Sheets file with all the CSVs and incorporating that into the docs?
Could we close this issue by making such a Google Sheet, downloading each table as a CSV and placing it in the repo?
@devinbalkind don't the path values need to be routes to text CSV files? I guess I'm not sure if a sheet would work, but it feels like it could introduce format and parsing complexities that are not an issue for text files.
Unless you mean having a sample sheet as well as simple CSV files that are referenced by the datapackage.json file?
I think I wasn't clear.
My proposal is:
- Make a reference HSDS Google Sheet. This would make it easy for people to (a) view the template, (b) copy it, (c) download the whole thing as XLS, and (d) download each table as CSV.
- We download each sheet as a CSV and put them in the repo at the correct path/address to fit the HSDS schema.
Does that make sense?
OK, that sounds good. The 1st step would be a very useful reference to keep up to date, and the 2nd would help solve this issue.
Obviously, doing this by hand makes sense for now, but I don't think it'd be hard to automate. I doubt I'll get a chance to do it any time soon, but if others agree, I'll add an issue and see if we can pick it up at some point.
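As a rough idea of what automation could look like, here is a sketch that skips the Google Sheet download and instead writes header-only template CSVs straight from the field names in datapackage.json. This is a different route from the one proposed above, not a replacement for the sheet. The function name `write_template_csvs` is hypothetical, and the sketch assumes each resource has a single relative path and an inline schema with a fields list.

```python
import csv
import json
from pathlib import Path


def write_template_csvs(descriptor_path):
    """Create a header-only CSV for each local resource that has an inline schema.

    Hypothetical helper: existing files are left untouched, so hand-edited
    templates are never overwritten. Returns the paths that were written.
    """
    descriptor_path = Path(descriptor_path)
    base_dir = descriptor_path.parent
    descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))

    written = []
    for resource in descriptor.get("resources", []):
        path = resource.get("path")
        schema = resource.get("schema")
        # Skip remote paths, lists of paths, and schemas given by
        # reference (a string) rather than inline (an object).
        if not isinstance(path, str) or "://" in path or not isinstance(schema, dict):
            continue
        target = base_dir / path
        if target.exists():
            continue
        headers = [field["name"] for field in schema.get("fields", [])]
        target.parent.mkdir(parents=True, exist_ok=True)
        # newline="" lets the csv module control line endings itself.
        with target.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerow(headers)
        written.append(path)
    return written
```

Because it only fills in files that are missing, this could be re-run after each spec change to add templates for new tables, though columns added to existing tables would still need updating by hand.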
Closing, as the HSDS Schema tools now generate a reference datapackage.json, not only for 3.0 but for profiles as well.