bad url join in celery task when site_url has a path components
Consider ckan.site_url = http://somehost/ckan in CKAN configuration file. With this, when building api_url here, one gets the wrong URL http://somehost/api/action because the "ckan" path component gets dropped by urljoin. One solution would be to have a trailing / in site_url configuration option but this is apparently not recommended. So I guess some url manipulation would be needed on extension side.
Note that other extensions (such as archiver and datastorer) have the same problem.
You're quite right.
How about changing it to:
api_url = urlparse.urljoin(context['site_url'] + '/', 'api/action')
Perhaps you could create some pull requests with this or similar change?
Yes I could. However, I'm not sure what the proper way to fix this. Sure your suggestion would work, but since this appears to be broken in other extensions as well, I was looking for a standard way to handle this... And looking into ckan source code, I could find a few different "solutions":
-
controllers/feed.py has
'/'.join([site_url, resource_path])and alsobase_url + h.url_for(controller='package', ...) -
controllers/package.py has
c.datastore_api = '%s/api/action' % \ config.get('ckan.site_url', '').rstrip('/')similar things in lib/dictization/model_dictize.py or model/package.py -
lib/cli.py has
fetch_url = config['ckan.site_url'] url = h.url_for(controller='package', action='read', id=dd['name']) url = urlparse.urljoin(fetch_url, url[1:]) + '.rdf'which, I guess, would also fail in case site_url has a path component.
Did I miss something or should we really go and fix all usages of urljoin the way you suggests?
site_url shouldn't have path components, that's what root_path is for. The url_for helper can be used to generate full paths correctly, I think.