ckanext-harvest
Problem with last_error_free_job() algorithm
In harvesters.HarvesterBase.last_error_free_job(), I'm having trouble understanding why obj.report_status != 'not modified' means that there was an error (see the code below). This would mean that obj.report_status == 'added' or obj.report_status == 'deleted' indicates an error. Shouldn't the test rather be obj.report_status == 'errored'?
In my harvester, 'not modified' means a dataset was skipped during import. As a result, the last error-free job is always reported as the one where all datasets were skipped.
    def last_error_free_job(cls, harvest_job):
        # TODO weed out cancelled jobs somehow.
        # look for jobs with no gather errors
        jobs = \
            model.Session.query(HarvestJob) \
                 .filter(HarvestJob.source == harvest_job.source) \
                 .filter(HarvestJob.gather_started != None) \
                 .filter(HarvestJob.status == 'Finished') \
                 .filter(HarvestJob.id != harvest_job.id) \
                 .filter(
                     ~exists().where(
                         HarvestGatherError.harvest_job_id == HarvestJob.id)) \
                 .order_by(HarvestJob.gather_started.desc())
        # now check them until we find one with no fetch/import errors
        # (looping rather than doing sql, in case there are lots of objects
        # and lots of jobs)
        for job in jobs:
            for obj in job.objects:
                if obj.current is False and \
                        obj.report_status != 'not modified':
                    # unsuccessful, so go onto the next job
                    break
            else:
                return job
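
To make the difference concrete, here is a minimal standalone sketch comparing the existing per-object test with the one I'm suggesting. It does not touch the database: Obj is a hypothetical stand-in for a harvest object carrying only the two attributes the loop reads, and the sample jobs are made up for illustration. Whether real jobs ever contain these exact combinations of current and report_status is an assumption on my part.

```python
from collections import namedtuple

# Hypothetical stand-in for a harvest object; only the two attributes
# read by last_error_free_job() are modelled here.
Obj = namedtuple("Obj", ["current", "report_status"])


def error_free_existing(objects):
    """Mirror of the existing inner loop: a job fails if any object is
    not current and its status is anything other than 'not modified'."""
    return not any(
        o.current is False and o.report_status != 'not modified'
        for o in objects
    )


def error_free_proposed(objects):
    """Suggested alternative: a job fails only if some object is
    explicitly marked 'errored'."""
    return not any(o.report_status == 'errored' for o in objects)


# Made-up jobs (assumed data, not taken from a real harvest run).
all_skipped = [Obj(False, 'not modified'), Obj(False, 'not modified')]
with_delete = [Obj(True, 'added'), Obj(False, 'deleted')]
with_error = [Obj(False, 'errored')]

for name, job in [("all_skipped", all_skipped),
                  ("with_delete", with_delete),
                  ("with_error", with_error)]:
    print(name, error_free_existing(job), error_free_proposed(job))
```

With this sample data, the two tests agree on the all-skipped job and on the errored job, but the existing test rejects the job containing a non-current 'deleted' object while the proposed test accepts it.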