ckanext-harvest

Remove old entries in harvest_job_table

Open frafra opened this issue 3 years ago • 2 comments

clean-harvest-log removes old logs, but I have not found anything similar for cleaning harvest_job_table. With automatic harvesting, the job page grows forever and is neither paginated nor searchable, which makes it pretty useless to me.

What about keeping the last 100 jobs by default and/or having a clean-up procedure? Like clean-harvest-jobs.
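A minimal sketch of the "keep the last 100 jobs" rule, written as plain Python rather than against the real ckanext-harvest models. The `Job` record, its fields, and the `jobs_to_delete` helper are hypothetical names for illustration, and applying the limit per harvest source is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Job:
    # Hypothetical stand-in for a row of the harvest job table.
    id: str
    source_id: str
    created: datetime


def jobs_to_delete(jobs, keep=100):
    """Return every job that is not among the newest `keep` jobs of its source."""
    by_source = {}
    for job in jobs:
        by_source.setdefault(job.source_id, []).append(job)

    stale = []
    for source_jobs in by_source.values():
        # Newest first, so everything past index `keep` is old enough to drop.
        newest_first = sorted(source_jobs, key=lambda j: j.created, reverse=True)
        stale.extend(newest_first[keep:])
    return stale
```

A clean-harvest-jobs style command could compute this list first and then delete the jobs together with their dependent records.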

frafra, Nov 17 '21 11:11

I think there is already a command for this: clear-history https://github.com/ckan/ckanext-harvest/blob/d84d847b09f28ab97bf1ca0baa651fdc05693d03/ckanext/harvest/cli.py#L112 or rather clearsource_history https://github.com/ckan/ckanext-harvest/blob/d84d847b09f28ab97bf1ca0baa651fdc05693d03/ckanext/harvest/commands/harvester.py#L220, which deletes the old harvest jobs and the related objects. At the moment, however, the command deletes all harvest jobs and harvest objects, old and current alike. We are already working on a pull request with an updated version that keeps at least the running harvest jobs and the latest harvest objects.
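To illustrate the intended behaviour (not the code of the pull request), here is a rough sketch of the selection logic on plain dictionaries. The field names `status`, `job_id` and `current`, the status value "Running", and the helper name `split_for_cleanup` are assumptions made for this example.

```python
def split_for_cleanup(jobs, objects):
    """Return (jobs_to_delete, objects_to_delete).

    jobs:    dicts with 'id' and 'status'
    objects: dicts with 'id', 'job_id' and 'current'
    """
    # Jobs that are still running must survive the clean-up.
    running = {job["id"] for job in jobs if job["status"] == "Running"}

    jobs_to_delete = [job for job in jobs if job["id"] not in running]

    # Keep objects of running jobs and objects flagged as the current version.
    objects_to_delete = [
        obj for obj in objects
        if obj["job_id"] not in running and not obj["current"]
    ]
    return jobs_to_delete, objects_to_delete
```

In a real database, a kept object may still reference a deleted job, so the actual command would have to handle that relationship as well.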

seitenbau-govdata, Dec 20 '21 13:12

The pull request is now available. https://github.com/ckan/ckanext-harvest/pull/484

seitenbau-govdata, Dec 21 '21 14:12