
datastore_create can't upload a json records file

Open pauloneves opened this issue 1 year ago • 3 comments

The datastore_create() method accepts a records parameter containing a list of dictionaries, which is converted to JSON by the API.

All values in the dictionaries must be JSON-serializable; a datetime value in the dict raises a validation error.

To work around this I must convert my pandas DataFrame to a JSON string, load it back into Python dictionaries, and then pass that as a parameter to the method, which converts it to JSON again.

This is very inefficient, especially for large datasets. Roughly what I'm doing now (instance URL, API key and resource id are placeholders):
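```python
import json

import pandas as pd
from ckanapi import RemoteCKAN

# Placeholder CKAN instance, API key and resource id for illustration only.
ckan = RemoteCKAN("https://ckan.example.org", apikey="my-api-key")

df = pd.DataFrame(
    {
        "name": ["a", "b"],
        "created": pd.to_datetime(["2024-01-01", "2024-02-01"]),
    }
)

# Round trip: serialize the DataFrame (datetimes become ISO strings),
# then parse it back into plain dicts so ckanapi can re-serialize them.
records = json.loads(df.to_json(orient="records", date_format="iso"))

ckan.action.datastore_create(
    resource_id="00000000-0000-0000-0000-000000000000",
    records=records,
    force=True,
)
```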

I'd like to be able to pass a JSON string directly to datastore_create() or datastore_upsert() and have it sent to CKAN as-is.

pauloneves · Jun 26 '24 15:06

In general, datastore_create and datastore_upsert are very slow ways of getting data into the datastore. Consider using a Postgres COPY command, as xloader and datapusher+ do, for efficiently loading large datasets.
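A rough sketch of the COPY approach, assuming direct access to the datastore database (connection string, table name and column names below are placeholders; in practice the datastore table is named after the resource id):

```python
import io

import pandas as pd
import psycopg2

# Placeholder connection details for the datastore database.
conn = psycopg2.connect(
    "dbname=datastore_default user=ckan password=secret host=localhost"
)

df = pd.DataFrame(
    {
        "name": ["a", "b"],
        "created": pd.to_datetime(["2024-01-01", "2024-02-01"]),
    }
)

# Write the DataFrame as CSV into an in-memory buffer.
buf = io.StringIO()
df.to_csv(buf, index=False, header=False)
buf.seek(0)

with conn, conn.cursor() as cur:
    # COPY streams the CSV straight into Postgres, bypassing the API layer.
    cur.copy_expert(
        'COPY "00000000-0000-0000-0000-000000000000" (name, created) '
        "FROM STDIN WITH CSV",
        buf,
    )
```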

Or, if you're interested in making it easier to connect pandas with ckanapi and the datastore API for loading data efficiently, I would definitely entertain a pull request to add a fast path for loading datastore records.

wardi · Jun 26 '24 15:06

is "datapusher+" different from "datapusher"

I had a lot of problems with datapusher trying to guess my data types, so I'm doing the work myself: creating the datastore via the API and uploading data to it so I can control how it is stored. For example, this is the kind of explicit schema I'm creating (resource id is a placeholder):
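```python
from ckanapi import RemoteCKAN

# Placeholder instance, API key and resource id.
ckan = RemoteCKAN("https://ckan.example.org", apikey="my-api-key")

# Define the columns explicitly so CKAN doesn't have to guess the types.
ckan.action.datastore_create(
    resource_id="00000000-0000-0000-0000-000000000000",
    fields=[
        {"id": "name", "type": "text"},
        {"id": "amount", "type": "numeric"},
        {"id": "created", "type": "timestamp"},
    ],
    primary_key=["name"],  # needed later for datastore_upsert
    force=True,
)
```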

pauloneves · Jun 26 '24 20:06

Yes, https://github.com/dathere/datapusher-plus analyzes all the data before setting types so that there are no errors on import.

wardi · Jun 26 '24 20:06