ckanapi
dump datasets performance: use package_search for ckan >= 2.2
With package_search we can dump all datasets in far fewer API calls.
Issues:
- On CKAN < 2.2, package_search and package_show return different dataset data, so we'll need to maintain the old code as well.
- We need to request the datasets ordered by id, not modification date, so that we know we have a complete dump and to replicate the current behaviour.
- CKAN sites may have limited the number of packages returned from package_search in different ways. Maybe detect the limit and work with what we're given, or just revert to the package_show method?
@amercader we discussed this at a dev meeting last week. I now think it's impossible to reliably get all the datasets from package_search with the default sort="metadata_modified desc". Any datasets modified while we're dumping will be missed, and duplicates will appear as we page through, because the end of one page gets pushed into the next.
Setting sort="id asc" should work though.
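
A rough sketch of what that paging loop might look like, in case it helps the discussion. `iter_datasets` is a hypothetical helper, not existing ckanapi code, and it takes the search call as a plain callable so it can be tried without a live site. It assumes the response has the usual `results` list and that the site accepts `sort="id asc"`. It advances `start` by the number of rows actually returned, so it should cope with sites that cap `rows` below what we ask for.

```python
def iter_datasets(package_search, rows=1000):
    """Yield every dataset dict by paging through package_search ordered by id.

    package_search: any callable accepting sort, rows and start keyword
    arguments and returning a dict with a 'results' list (assumed shape).
    rows: the page size we ask for; the site may return fewer.
    """
    start = 0
    while True:
        response = package_search(sort="id asc", rows=rows, start=start)
        results = response["results"]
        if not results:
            # An empty page means we have walked past the last dataset.
            break
        for dataset in results:
            yield dataset
        # Advance by what we actually received, not by what we requested,
        # so a server-side cap on rows does not make us skip datasets.
        start += len(results)
```

With ckanapi this would presumably be called as `iter_datasets(ckan.action.package_search)` on a `RemoteCKAN` instance. Modifying a dataset does not change its position in an id-ordered listing, but datasets created or deleted mid-dump can still shift the offsets, so this is not a full guarantee either.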