Datapackage and frictionless-py
Hello,
I wanted to point out that the datapackages one currently gets from "Download Datapackage" cannot easily be parsed out of the box. I don't know whether reading them with something like frictionless is intended, but if it is, I think some changes have to be made to the metadata for that to work.
I use the following file as an example: wind_turbine_domestic_lod_geoss_tp_oeo
- Make all the "name" entries in the resources section lower case. The JSON schema has this as a constraint, and I could not make it work without changing the files.
- Remove foreign keys if they are not being used. I guess one could set them to null, but I think removing them is safer.
- Don't use empty lists in primarykey, I think the schema expects either a string or a null.
- This one is tricky: make the "path" point to the csv file itself. I think the website URL could be used directly if the link pointed to a csv file, but it does not. I say it is tricky because I don't know how different applications handle relative paths.
- Change the format from SQL to csv. The datapackage metadata refers to the contents of the datapackage itself, and since that is a csv file, I think the format should say so.
- Along the same lines as the previous point, remove the dialect from the package. It raises a very cryptic error, at least in frictionless. (I will raise an issue there soon. It was super painful to debug, but I think the fault lies with the library rather than with the datapackage.)
- Replace "bigint" with "integer" (the Table Schema name for the integer type); the schema does not recognize "bigint" as a type.
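
To make the list above concrete, here is a rough sketch of the changes applied to a descriptor that has already been loaded as a Python dict. It is not the exact script I used. The function name `patch_descriptor` and the `csv_paths` mapping are hypothetical, and the key names (`schema`, `fields`, `type`, `primaryKey`, `foreignKeys`, `dialect`) are my assumption that the metadata follows the Frictionless Table Schema layout. It also assumes "integer" as the replacement type for "bigint".

```python
import copy


def patch_descriptor(descriptor, csv_paths):
    """Return a patched copy of a datapackage descriptor (hypothetical helper).

    csv_paths maps the lower-cased resource name to the csv file that
    holds its data (an assumption made for this sketch).
    """
    patched = copy.deepcopy(descriptor)  # leave the caller's dict untouched
    for resource in patched.get("resources", []):
        # Resource names have to be lower case.
        resource["name"] = resource["name"].lower()
        # Point "path" at the csv file instead of the website.
        if resource["name"] in csv_paths:
            resource["path"] = csv_paths[resource["name"]]
        # The data is shipped as csv, so say so and drop the dialect.
        resource["format"] = "csv"
        resource.pop("dialect", None)

        schema = resource.get("schema", {})
        # Drop empty or null key definitions instead of keeping them.
        for key in ("foreignKeys", "primaryKey"):
            if not schema.get(key):
                schema.pop(key, None)
        # "bigint" is not a known field type; assume "integer" instead.
        for field in schema.get("fields", []):
            if field.get("type") == "bigint":
                field["type"] = "integer"
    return patched
```

The patched dict can then be written back with `json.dump` next to the csv file. Whether a relative `path` is the right choice for every consumer is the open question from the "path" item above.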
Here is a gist with the modified package example: gist
If you load it from a Python script using frictionless, with both files in a folder named datapackage, frictionless will recognise it.
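
Loading it could look roughly like this. This is a minimal sketch: the metadata file name `datapackage.json` and the helper `load_package` are assumptions for illustration, and it relies on `frictionless.Package` accepting a path to the descriptor.

```python
from pathlib import Path


def load_package(folder="datapackage"):
    """Load the modified package from `folder` (hypothetical helper).

    Assumes the metadata file is called datapackage.json and sits next to
    the csv file that the resource "path" refers to.
    """
    # Imported lazily so this module can be imported without frictionless.
    from frictionless import Package

    descriptor = Path(folder) / "datapackage.json"
    return Package(str(descriptor))


# Usage, with frictionless installed and the folder in place:
#   package = load_package()
#   for resource in package.resources:
#       print(resource.name, resource.path)
```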
I made a repository to reproduce the parsing with the modified metadata here: https://github.com/areleu/frictionless_oep_example
Yes, good point. So far this has just been a basic way to get data + metadata, but eventually it should be fully compatible with frictionless.
I'll assign myself for now, but it might take a while because other things have higher priority.