ckanext-datapackager

Support import/export to DataStore

Status: Open. danfowler opened this issue 8 years ago · 2 comments

See https://github.com/okfn/data.okfn.org/issues/126 for some discussion. We can close this issue and reopen the original issue if it doesn't make sense to implement this within this extension.

danfowler, Mar 28 '16 12:03

@amercader, any idea what will be required here?

pwalsh, May 30 '16 11:05

I think it's important to clarify the whole "data in DataStore" aspect and how it relates in particular to this extension.

Import to DataStore

This extension creates normal CKAN datasets and resources. There are already well-established ways to get the contents of tabular files into the DataStore, namely the DataPusher and alternatives like @davidread's Express Loader. If any of these is enabled, then after you import a data package that links to or contains tabular files, the data will be imported into the DataStore.

What is missing is using the Table Schema, if one is defined, to inform this import (i.e. creating the appropriate field types, etc.). This piece of work will add real value and will enable much better integration with other tools like ckanext-validation or the upcoming Schema Creator. The spec is well defined in https://github.com/frictionlessdata/ckanext-validation/issues/7; I have now reviewed it and added more detail to that issue.
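To make the import side more concrete, here is a minimal sketch of translating a Table Schema into the `fields` list that the DataStore's `datastore_create` action accepts (each entry an `id` plus a `type`). The type mapping and the function name `schema_to_datastore_fields` are assumptions for illustration only, not the mapping defined in the ckanext-validation spec; unknown Table Schema types fall back to `text`.

```python
# Hypothetical mapping from Table Schema field types to DataStore
# (PostgreSQL) column types. Illustrative only: check it against the
# spec and the DataStore's supported types before relying on it.
TABLE_SCHEMA_TO_DATASTORE = {
    "string": "text",
    "integer": "int",
    "number": "numeric",
    "boolean": "bool",
    "date": "date",
    "time": "time",
    "datetime": "timestamp",
    "object": "json",
    "array": "json",
}


def schema_to_datastore_fields(schema):
    """Build a `fields` list for `datastore_create` from a Table Schema dict.

    Table Schema types without an entry in the mapping (e.g. "geopoint")
    are stored as plain text rather than rejected.
    """
    fields = []
    for field in schema.get("fields", []):
        # Table Schema treats a missing "type" as "string".
        ts_type = field.get("type", "string")
        ds_type = TABLE_SCHEMA_TO_DATASTORE.get(ts_type, "text")
        fields.append({"id": field["name"], "type": ds_type})
    return fields


if __name__ == "__main__":
    schema = {
        "fields": [
            {"name": "id", "type": "integer"},
            {"name": "city", "type": "string"},
            {"name": "location", "type": "geopoint"},
        ]
    }
    print(schema_to_datastore_fields(schema))
```

The resulting list would be passed as the `fields` parameter of a `datastore_create` call for the resource, before any loader inserts records, so the table is created with the intended types instead of guessed ones.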

The estimate is now 7 days.

There are more obscure cases that could be covered, like data packages with inline data, but I don't think they should be tackled in the MVP.

Export from DataStore

I'm not really sure what's missing here. If the data package linked to or included files, these will be included in the export (currently as links; if #52 gets implemented, as physical files in a ZIP file).

The only missing piece is exporting a schema when the data is in the DataStore and the resource has no existing schema (e.g. for datasets that did not come from a data package import).

This should be done in the package_show_as_datapackage function. We need to iterate over all resources and check whether each one is in the DataStore (either by calling datastore_search for each one, or with a single query to _table_metadata covering all resource ids) and does not already have a schema. If so, translate the DataStore fields definition to Table Schema and add it to the output dict (or perhaps datastore_search could return the schema directly).
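A minimal sketch of that step, assuming the resources are plain dicts with an `id` and an optional `schema`. The DataStore lookup is injected as a callable (`get_datastore_fields`, a hypothetical name) that returns the field list for a resource, or `None` if the resource is not in the DataStore. In a real plugin it might wrap `datastore_search` with `limit=0` and read the `fields` key of the result. The reverse type mapping is likewise an assumption; the internal `_id` column is skipped, and unrecognised types become Table Schema's `any`.

```python
# Hypothetical mapping from DataStore column types back to Table Schema
# types. "json" is ambiguous (object or array); "object" is a guess.
DATASTORE_TO_TABLE_SCHEMA = {
    "text": "string",
    "int": "integer",
    "int4": "integer",
    "int8": "integer",
    "float": "number",
    "float8": "number",
    "numeric": "number",
    "bool": "boolean",
    "date": "date",
    "time": "time",
    "timestamp": "datetime",
    "json": "object",
}


def datastore_fields_to_schema(fields):
    """Translate a DataStore `fields` list into a Table Schema dict."""
    return {
        "fields": [
            {"name": f["id"], "type": DATASTORE_TO_TABLE_SCHEMA.get(f["type"], "any")}
            for f in fields
            if f["id"] != "_id"  # internal row id added by the DataStore
        ]
    }


def add_missing_schemas(resources, get_datastore_fields):
    """Add a schema to each resource that lacks one but is in the DataStore.

    `get_datastore_fields(resource_id)` returns a list of field dicts,
    or None when the resource has no DataStore table. Modifies the
    resource dicts in place and also returns the list for convenience.
    """
    for resource in resources:
        if resource.get("schema"):
            continue  # keep an existing schema untouched
        fields = get_datastore_fields(resource["id"])
        if fields is None:
            continue  # not in the DataStore, nothing to derive
        resource["schema"] = datastore_fields_to_schema(fields)
    return resources
```

Injecting the lookup keeps the translation logic independent of how DataStore membership is checked, so the per-resource `datastore_search` approach and a single `_table_metadata` query could both be plugged in without touching it.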

The estimate for this is 2 days.

amercader, Feb 09 '18 11:02