ckanext-harvest
Option to clear only old history entries
Currently we have the clearsource_history paster command for deleting old jobs while keeping the sources and the datasets. This is nice, but it would be even better if we could use it to delete only the older jobs and keep the more recent ones, e.g. something like
paster clearsource_history --older-than 30d
to remove all jobs older than 30 days.
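A rough sketch of what the proposed filter could do: parse an age string such as "30d" and select the jobs created before the resulting cutoff. The --older-than flag is only a proposal here, and the Job record, parse_age and jobs_older_than are hypothetical stand-ins, not ckanext-harvest's models or API.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical stand-in for a harvest job (NOT the ckanext-harvest model).
@dataclass
class Job:
    id: str
    source_id: str
    created: datetime


# Supported suffixes for the proposed age argument (an assumption).
_UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def parse_age(value: str) -> timedelta:
    """Parse strings like '30d', '12h' or '2w' into a timedelta."""
    match = re.fullmatch(r"(\d+)([hdw])", value.strip())
    if match is None:
        raise ValueError(f"invalid age: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def jobs_older_than(jobs, age: str, now: datetime):
    """Return the jobs created strictly before now - age."""
    cutoff = now - parse_age(age)
    return [job for job in jobs if job.created < cutoff]


if __name__ == "__main__":
    now = datetime(2024, 3, 31)
    jobs = [
        Job("a", "s1", datetime(2024, 1, 1)),
        Job("b", "s1", datetime(2024, 3, 30)),
    ]
    print([job.id for job in jobs_older_than(jobs, "30d", now)])
```

With a cutoff of 30 days before 2024-03-31, only job "a" is selected; job "b" is recent and would be kept.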
Is this planned?
I think this is more than a nice-to-have. I had to clear the history for a harvest source because the database had become bloated with harvest objects. But with all the harvest objects gone, the harvester (CSW in my case) could no longer find the objects it had previously harvested, so it re-harvested the whole source and duplicated the data.
So I think clearsource_history should at least preserve the most recent, currently active harvest objects.
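One way to express "keep what the harvester still needs" is to never delete the most recent job of each source, and optionally to restrict deletion to jobs older than a cutoff. This is a simplified sketch: keeping each source's latest job is used as a proxy for keeping its current harvest objects, and the Job record and jobs_to_delete helper are hypothetical, not ckanext-harvest code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


# Hypothetical stand-in for a harvest job (NOT the ckanext-harvest model).
@dataclass
class Job:
    id: str
    source_id: str
    created: datetime


def jobs_to_delete(jobs, cutoff: Optional[datetime] = None):
    """Select jobs to delete, always keeping each source's most recent job.

    If cutoff is None, every job except the latest per source is selected;
    otherwise only jobs created before cutoff are eligible for deletion.
    """
    # Find the newest job for every harvest source.
    latest = {}
    for job in jobs:
        current = latest.get(job.source_id)
        if current is None or job.created > current.created:
            latest[job.source_id] = job
    keep_ids = {job.id for job in latest.values()}

    return [
        job
        for job in jobs
        if job.id not in keep_ids and (cutoff is None or job.created < cutoff)
    ]


if __name__ == "__main__":
    jobs = [
        Job("a", "s1", datetime(2024, 1, 1)),
        Job("b", "s1", datetime(2024, 2, 1)),
        Job("c", "s2", datetime(2023, 12, 1)),
    ]
    print([job.id for job in jobs_to_delete(jobs, cutoff=datetime(2024, 3, 1))])
```

Job "c" is older than the cutoff but is the only job for source "s2", so it survives; only job "a" is selected. The design choice is that age alone never removes the last job a source has.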
Related PR: https://github.com/ckan/ckanext-harvest/pull/484
With the new option -k true (with the CKAN click command), the latest harvest jobs and their current harvest objects will be preserved.