
Automatically import from file upload/update

Open cphsolutionslab opened this issue 11 years ago • 8 comments

I'm testing out CKAN 2.2 with the DataPusher, instead of the old DataStorer.

The old DataStorer ran a cronjob every X hours to check for updates to the DataStore.

It seems that the DataPusher does not do this. Is there a way to have the DataPusher check resources at a regular interval for updates?

cphsolutionslab avatar Jun 18 '14 09:06 cphsolutionslab

Basically, the problem for me is that when I use the new FileStore API to update a file, the DataStore does not get updated. It seems the DataPusher is only called on file creation and URL change.

I also have several WFS services as resources, and when they change, the DataPusher does not update the DataStore.

cphsolutionslab avatar Jun 19 '14 07:06 cphsolutionslab

Can we get an update on this?

I've written a quick-and-dirty Python script for this. It iterates over the resources I would like to update and changes each URL slightly before changing it back (appending a '&' works in most cases), which effectively triggers the DataPusher. It would be lovely to have built-in support for this instead.
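
For readers who want to reproduce this workaround, here is a rough sketch of what such a script might look like. It is not the original script: it assumes CKAN's `resource_patch` action (available from CKAN 2.3) reached over the action API, and the helper names `nudge_payloads`, `call_action`, and `nudge_resource`, as well as the `ckan_url` and `api_key` arguments, are hypothetical.

```python
import json
import urllib.request


def nudge_payloads(resource_id, url):
    """Return two resource_patch payloads: the nudged URL, then the original."""
    return [
        {"id": resource_id, "url": url + "&"},  # temporary change
        {"id": resource_id, "url": url},        # restore the real URL
    ]


def call_action(ckan_url, api_key, action, payload):
    """POST a JSON payload to a CKAN action endpoint and return the reply."""
    req = urllib.request.Request(
        ckan_url.rstrip("/") + "/api/3/action/" + action,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key,
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def nudge_resource(ckan_url, api_key, resource_id, url):
    """Change the resource URL and change it back (assumed to trigger a push)."""
    for payload in nudge_payloads(resource_id, url):
        call_action(ckan_url, api_key, "resource_patch", payload)
```

Calling `nudge_resource` once per resource ID from a cron job would approximate the old periodic behaviour, at the cost of two writes per resource on every run.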

NicolaiLolansen avatar Jul 11 '16 11:07 NicolaiLolansen

Maybe the paster datapusher can help with this?

paster datapusher
Perform commands in the datapusher

    Usage:

        resubmit  - Resubmit all datastore resources to the datapusher,
                    ignoring if their files haven't changed.
        submit <pkgname> - Submits all resources from the package
                         identified by pkgname (either the short name or ID).
        submit_all  - Submit every package to the datastore.
                      This is useful if you're setting up datastore
                      for a ckan that already has datasets.

That, or call datapusher_submit with the resource ID from the script you are using to re-upload the file.
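
As a minimal sketch of the second option, the snippet below calls the `datapusher_submit` action over CKAN's action API right after a re-upload. The function names `build_submit_request` and `submit_resource`, and the `ckan_url` and `api_key` arguments, are assumptions made for illustration; only the `resource_id` parameter is passed.

```python
import json
import urllib.request


def build_submit_request(ckan_url, api_key, resource_id):
    """Build the POST request for the datapusher_submit action."""
    payload = json.dumps({"resource_id": resource_id}).encode("utf-8")
    return urllib.request.Request(
        ckan_url.rstrip("/") + "/api/3/action/datapusher_submit",
        data=payload,
        headers={"Authorization": api_key,
                 "Content-Type": "application/json"},
        method="POST",
    )


def submit_resource(ckan_url, api_key, resource_id):
    """Send the request and return CKAN's decoded JSON response."""
    req = build_submit_request(ckan_url, api_key, resource_id)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

In an upload script this would be one extra call, for example `submit_resource("http://localhost:5000", "my-api-key", resource_id)` once the file update has succeeded; the URL and key shown here are placeholders.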

amercader avatar Jul 14 '16 11:07 amercader

Okay nice one! Is this documented somewhere? I had a hard time locating this information.

NicolaiLolansen avatar Jul 14 '16 11:07 NicolaiLolansen

@NicolaiMogensen I doubt it. It would be great if you could add it here: http://docs.ckan.org/en/latest/maintaining/datastore.html#datapusher-automatically-add-data-to-the-datastore and send a PR.

amercader avatar Jul 14 '16 11:07 amercader

I'll look into it once I've dived into the code. The code on GitHub is not the same as the DataPusher code that's packaged with CKAN if you do a package install, correct? What's the best way to go about that, documentation-wise?

NicolaiLolansen avatar Jul 14 '16 12:07 NicolaiLolansen

I don't have submit_all. Is that in a newer version? I'm running 2.5.2.

This is what I have:


Perform commands in the datapusher

    Usage:

        resubmit  - Resubmit all datastore resources to the datapusher,
                    ignoring if their files haven't changed.
        submit <pkgname> - Submits all resources from the package
                         identified by pkgname (either the short name or ID).

NicolaiLolansen avatar Jul 14 '16 12:07 NicolaiLolansen

The code for these docs is on the main ckan repo: https://github.com/ckan/ckan/blob/master/doc/maintaining/datastore.rst

The DataPusher shipped with the package install comes from this same repo (it might be an older version).

The submit_all command is on current CKAN master, so it will be available in 2.6. Alternatively, you can apply the changes yourself, as they are quite trivial: https://github.com/ckan/ckan/pull/3024

amercader avatar Jul 14 '16 14:07 amercader