datapusher
datapusher copied to clipboard
Add option to rewrite resource url in cases when ckan.datapusher.callback_url_base is used in CKAN
This complements and fixes an issue with datapusher when ckan.datapusher.callback_url_base is set to True
in CKAN. In scenarios when running on the cloud, in secure environments or using docker-compose it's useful for datapusher to use an internal CKAN url like for instance http://ckan:5000
.
This PR adds options to enable rewriting of URL's of uploaded resources in CKAN so that datapusher can reach them. The two options are:
- REWRITE_RESOURCES - boolean whether or not to enable rewriting of resources URL's, default:
False
- REWRITE_URL - URL to use for rewriting, defaults to
http://ckan:5000/
@amercader if you're OK with this feature, let's wait on merging until #207 is merged so that I can update the config options in the readme.
@mbocevski
I think this needs to be fixed in CKAN core. Here we should take ckan.datapusher.callback_url_base
into account instead of using site_url. metadata['ckan_url']
is what DataPusher uses to construct the CKAN URLs. When ckan.datapusher.callback_url_base
was introduced it only handled the result_url
which is the URL that CKAN Service provider pings to report progress.
It makes sense to configure this only at the CKAN level and not at the DataPusher level as this can be specific to each CKAN instance.
@amercader Yeah that makes sense, I will try it out and submit a PR in core. However, it's not immediately clear whether that's just for constructing the URL for the resource or for the actual resource file. Cause note that we only want to rewrite resources that are uploaded to CKAN filestore and not rewrite resources that are links.
@amercader Yeah, just as I suspected, so site_url in core gets set to the callback URL which is correct. The logic is that once a job is created, datapusher then calls resource_show to get the URL of the resource, download it and insert it in the database.
They way things are, I see at least two options:
- Take the approach that I'm suggesting in this PR
- Change datapusher extension in Core to send the URL of the resource with the request, and implement the logic if uploaded to filestore use the callback_url instead of site_url, remove resource_show from datapusher. I'm not sure why this is not part of the current design, so there might be a good reason for that.
WDYT?