
Problem with last_error_free_job() algorithm

Open knudmoeller opened this issue 6 years ago • 0 comments

In harvesters.HarvesterBase.last_error_free_job(), I don't understand why obj.report_status != 'not modified' is taken to mean that there was an error (see below). It would follow that obj.report_status == 'added' or obj.report_status == 'deleted' indicates an error. Shouldn't the test be obj.report_status == 'errored' instead?

In my harvester, 'not modified' means a dataset was skipped during import, so the last error-free job turns out to be the one where all datasets were skipped.

    def last_error_free_job(cls, harvest_job):
        # TODO weed out cancelled jobs somehow.
        # look for jobs with no gather errors
        jobs = \
            model.Session.query(HarvestJob) \
                 .filter(HarvestJob.source == harvest_job.source) \
                 .filter(HarvestJob.gather_started != None) \
                 .filter(HarvestJob.status == 'Finished') \
                 .filter(HarvestJob.id != harvest_job.id) \
                 .filter(
                     ~exists().where(
                         HarvestGatherError.harvest_job_id == HarvestJob.id)) \
                 .order_by(HarvestJob.gather_started.desc())
        # now check them until we find one with no fetch/import errors
        # (looping rather than doing sql, in case there are lots of objects
        # and lots of jobs)
        for job in jobs:
            for obj in job.objects:
                if obj.current is False and \
                        obj.report_status != 'not modified':
                    # unsuccessful, so go onto the next job
                    break
            else:
                return job
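
To make the difference concrete, here is a minimal sketch in plain Python. It is not the ckanext-harvest code: FakeObject, FakeJob, is_error_current, is_error_proposed, and the simplified last_error_free_job below are hypothetical stand-ins, and the `current` values are assumptions chosen for illustration. It only compares the predicate quoted above with the suggested `== 'errored'` test.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeObject:
    """Hypothetical stand-in for a harvest object (not the CKAN model)."""
    report_status: str
    current: bool = False  # assumption: False unless stated otherwise


@dataclass
class FakeJob:
    """Hypothetical stand-in for a finished harvest job."""
    name: str
    objects: List[FakeObject] = field(default_factory=list)


def is_error_current(obj):
    # Predicate as quoted in the issue: anything that is not current
    # and not 'not modified' counts as unsuccessful.
    return obj.current is False and obj.report_status != 'not modified'


def is_error_proposed(obj):
    # Suggested alternative: only an explicit 'errored' status counts.
    return obj.report_status == 'errored'


def last_error_free_job(jobs, is_error):
    # jobs are expected newest first, mirroring the order_by(...desc()).
    for job in jobs:
        if not any(is_error(obj) for obj in job.objects):
            return job
    return None


jobs = [
    FakeJob('failed', [FakeObject('errored')]),
    FakeJob('changed', [FakeObject('added', current=True),
                        FakeObject('deleted')]),
    FakeJob('skipped', [FakeObject('not modified'),
                        FakeObject('not modified')]),
]

print(last_error_free_job(jobs, is_error_current).name)
print(last_error_free_job(jobs, is_error_proposed).name)
```

Under these assumptions, the quoted predicate rejects the 'changed' job because of its 'deleted' object and falls back to the job where everything was skipped, while the `== 'errored'` test accepts 'changed' and still rejects the job with a real error.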

knudmoeller, Dec 08 '17 16:12