ckanext-datapackager
Support import/export to DataStore
See https://github.com/okfn/data.okfn.org/issues/126 for some discussion. We can close this issue and reopen the original issue if it doesn't make sense to implement this within this extension.
@amercader any idea of what will be required here?
I think it's important to clarify the whole "data in DataStore" aspect and how it relates in particular to this extension.
Import to DataStore
This extension creates normal CKAN datasets and resources. There are already well-established ways to get the contents of tabular files into the DataStore, namely the DataPusher and alternatives like @davidread's Express Loader. If any of these is enabled, then after importing a data package that links to or contains tabular files, the data will get imported to the DataStore.
What is missing is using the Table Schema, if one is defined, to inform this import (i.e. create the appropriate field types, etc.). This piece of work will really add value and will enable much better integration with other tools like ckanext-validation or the upcoming Schema Creator. The spec is well defined in https://github.com/frictionlessdata/ckanext-validation/issues/7. I have now reviewed it and added more detail in the issue.
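As a rough illustration of what "using the table schema to inform the import" could look like, here is a minimal sketch that turns a Table Schema into a `fields` list of the shape `datastore_create` accepts (`id` plus `type`). The mapping table and the function name `datastore_fields_from_schema` are hypothetical and not part of this extension; the actual type mapping should follow the spec in the linked issue.

```python
# Hypothetical mapping from Table Schema types to DataStore (PostgreSQL)
# column types. This is an assumption for illustration only.
TABLE_SCHEMA_TO_DATASTORE = {
    "string": "text",
    "integer": "int",
    "number": "numeric",
    "boolean": "bool",
    "date": "date",
    "time": "time",
    "datetime": "timestamp",
    "object": "json",
    "array": "json",
}


def datastore_fields_from_schema(schema):
    """Build a DataStore-style fields list from a Table Schema dict."""
    fields = []
    for field in schema.get("fields", []):
        # Table Schema treats a missing type as "string".
        ts_type = field.get("type", "string")
        fields.append({
            "id": field["name"],
            # Fall back to text for types without an obvious column type.
            "type": TABLE_SCHEMA_TO_DATASTORE.get(ts_type, "text"),
        })
    return fields
```

The resulting list could then be passed as the `fields` parameter when the loader creates the DataStore table, instead of letting it guess the types from the file contents.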
The estimate is now 7 days
There are more obscure cases that could be covered, like data packages with inline data, but I don't think they should be tackled in the MVP.
Export from DataStore
I'm not really sure what's missing here. If the data package linked to or included files, these will be included in the export (right now as links; if #52 gets implemented, as physical files in a ZIP file).
The only bit that is missing is exporting a schema if data is on the DataStore and there is no existing schema in the resource (e.g. for datasets not coming from a data package import). This should be done in the package_show_as_datapackage function. We need to iterate over all resources to see if they are in the DataStore (querying datastore_search for each, or a single query to _table_metadata with all resource ids) and they do not have a schema. If so, translate the fields definition to Table Schema and add it to the output dict (or maybe datastore_search could return the schema directly).
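The export step described above could be sketched roughly as follows. This is only an assumption-laden outline: `add_missing_schemas`, `table_schema_from_datastore_fields` and the type mapping are hypothetical names, and `fields_by_id` stands in for whatever `datastore_search` (or a `_table_metadata` query) returns per resource id. It works on plain dicts, so it does not call CKAN itself.

```python
# Hypothetical mapping from DataStore column types to Table Schema types.
# Both short and PostgreSQL-internal names are listed because the exact
# names returned may vary (assumption).
DATASTORE_TO_TABLE_SCHEMA = {
    "text": "string",
    "int": "integer", "int4": "integer", "int8": "integer",
    "float8": "number", "numeric": "number",
    "bool": "boolean",
    "date": "date",
    "time": "time",
    "timestamp": "datetime",
    "json": "object",
}


def table_schema_from_datastore_fields(ds_fields):
    """Translate a DataStore fields list into a Table Schema dict."""
    return {
        "fields": [
            {"name": f["id"],
             "type": DATASTORE_TO_TABLE_SCHEMA.get(f["type"], "string")}
            for f in ds_fields
            # Skip internal columns such as _id (assumed convention).
            if not f["id"].startswith("_")
        ]
    }


def add_missing_schemas(resources, fields_by_id):
    """Return copies of the resources, adding a schema where one is
    missing and the resource has fields in the DataStore."""
    result = []
    for resource in resources:
        resource = dict(resource)  # do not mutate the caller's dict
        ds_fields = fields_by_id.get(resource.get("id"))
        if ds_fields and "schema" not in resource:
            resource["schema"] = table_schema_from_datastore_fields(ds_fields)
        result.append(resource)
    return result
```

Resources that already carry a schema are left untouched, so a schema coming from a data package import always wins over one inferred from the DataStore.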
Estimate for this is 2 days