ckanext-spatial
Dataset Loses Harvest Object on WAF file Timestamp Change
related issue: https://github.com/GSA/data.gov/issues/4505
Summary:
When the timestamp of a WAF source file changes without any actual content modification, the metadata information disappears from the UI.
The root cause is that the dataset's harvest_object_id is not updated to the ID of the new harvest object. This was confirmed through the following API calls:
/api/action/package_show?id=<package_id>
/api/action/package_search?q=id:<package_id>
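The comparison between the two API calls can be sketched as a small helper that works on the already-parsed JSON responses. This is only an illustration: it assumes harvest_object_id is exposed as a key/value entry in the package's extras list, and the function names and sample IDs are hypothetical.

```python
def harvest_object_id(package):
    """Return the harvest_object_id extra of a package dict, or None.

    Assumption: the ID is stored as {"key": "harvest_object_id", "value": ...}
    inside the package's "extras" list.
    """
    for extra in package.get("extras", []):
        if extra.get("key") == "harvest_object_id":
            return extra.get("value")
    return None


def compare_harvest_object_ids(show_response, search_response):
    """Return (ID reported by package_show, ID reported by package_search).

    Both arguments are the decoded JSON bodies of the respective API calls.
    """
    shown = harvest_object_id(show_response["result"])
    results = search_response["result"]["results"]
    searched = harvest_object_id(results[0]) if results else None
    return shown, searched


if __name__ == "__main__":
    # Made-up responses; the IDs are placeholders, not real data.
    show = {"success": True, "result": {
        "id": "pkg-1",
        "extras": [{"key": "harvest_object_id", "value": "obj-a"}]}}
    search = {"success": True, "result": {"count": 1, "results": [{
        "id": "pkg-1",
        "extras": [{"key": "harvest_object_id", "value": "obj-b"}]}]}}
    shown, searched = compare_harvest_object_ids(show, search)
    print("package_show:", shown, "| package_search:", searched)
```

Running the helper before and after a harvest job makes it easy to see whether either call still reports the previous harvest object.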
Additionally, testing on the most recent version of CKAN with only the ckanext-harvest and ckanext-spatial extensions replicated the problem.
Observations from Testing:
- Manually running ckan search-index rebuild <package_id> resolved the issue: afterwards, the above API calls return the correct value of harvest_object_id.
- Found the code block which should refresh the Solr index: https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/base.py#L709C1-L710C70
Testing with the following code changes yielded positive results:
- Invoking package_update instead of package_index.index_package resolved the issue.
- Adding model.Session.commit() before invoking package_index.index_package also resolved the issue.
- Calling rebuild index instead of package_index.index_package does not solve the issue unless model.Session.commit() is called before invoking rebuild.
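The commit-before-index variant can be sketched as follows. This is not the harvester's actual code: the helper names are hypothetical, the session and indexer are passed in so the snippet runs without CKAN, and the stand-in classes only record the order of calls. In CKAN the two arguments would presumably be model.Session and the package_index object used in the linked block.

```python
def set_harvest_object_id(package_dict, new_id):
    """Point the package's harvest_object_id extra at the new harvest object.

    Returns True if the extra was found and updated. Assumes the ID lives in
    the "extras" list as a key/value entry.
    """
    for extra in package_dict.get("extras", []):
        if extra.get("key") == "harvest_object_id":
            extra["value"] = new_id
            return True
    return False


def commit_then_index(session, package_index, package_dict):
    """Commit pending database changes first, then reindex the package."""
    session.commit()
    package_index.index_package(package_dict)


class RecordingSession:
    """Stand-in for a database session; only records that commit() ran."""
    def __init__(self, calls):
        self.calls = calls

    def commit(self):
        self.calls.append("commit")


class RecordingIndex:
    """Stand-in for the search indexer; records what it was asked to index."""
    def __init__(self, calls):
        self.calls = calls
        self.indexed = None

    def index_package(self, package_dict):
        self.calls.append("index")
        self.indexed = package_dict


if __name__ == "__main__":
    calls = []
    pkg = {"id": "pkg-1",
           "extras": [{"key": "harvest_object_id", "value": "old-obj"}]}
    set_harvest_object_id(pkg, "new-obj")
    commit_then_index(RecordingSession(calls), RecordingIndex(calls), pkg)
    print(calls)
```

Injecting the session and indexer keeps the ordering explicit and makes it testable in isolation; whether committing at this point is safe inside the harvester's own transaction handling is not something the tests above establish.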
Based on the tests above, the assumption that package_index.index_package doesn't need a database commit to refresh Solr does not seem to be valid.
Any alternative solutions to address this issue?