OpenCue include_finished doesn't include data from history tables

JobSearchCriteria includes an include_finished option for fetching finished jobs. However, some jobs may still not be returned: the API backend reads only from the main job table, and old jobs are automatically archived to the job_history table in the database. This can cause unexpected results, for example a job suddenly disappearing from the API response.
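A minimal sketch of the behavior described above, modeling the two tables as in-memory lists. The `Job` record and the `JobLookup` class with its `findJobs`/`findJobsWithHistory` methods are hypothetical and are not Cuebot's actual classes. They only show why a lookup that reads the main job table alone misses archived jobs, and what also reading job_history would look like.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: these types are not part of OpenCue's API.
record Job(String name, boolean finished) {}

class JobLookup {
    final List<Job> jobTable = new ArrayList<>();        // stands in for the main job table
    final List<Job> jobHistoryTable = new ArrayList<>(); // stands in for job_history

    // Current behavior: only the main job table is consulted, so a job that
    // has already been archived never appears, even with includeFinished.
    List<Job> findJobs(boolean includeFinished) {
        List<Job> result = new ArrayList<>();
        for (Job j : jobTable) {
            if (includeFinished || !j.finished()) {
                result.add(j);
            }
        }
        return result;
    }

    // Naive fix: when finished jobs are requested, also append job_history.
    // Unbounded, this is exactly what could outgrow a single gRPC response.
    List<Job> findJobsWithHistory(boolean includeFinished) {
        List<Job> result = findJobs(includeFinished);
        if (includeFinished) {
            result.addAll(jobHistoryTable);
        }
        return result;
    }
}
```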

Fixing this might not be trivial: job_history can be a huge table, and returning it could easily exceed the maximum gRPC message size or cause other problems. We might need to extend the API to support paging.
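One way paging could work is keyset (cursor) pagination: the client passes the last job ID it saw, and the server returns at most `pageSize` rows after it, so no single response has to carry all of job_history. This sketch works on an in-memory list sorted by ID; `HistoryPager`, `Page`, and the string-ID cursor are hypothetical and not part of the current OpenCue API. A real implementation would push the same "after this ID, ordered by ID, limited to N" logic into the SQL query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical page type: the IDs on this page plus the cursor for the
// next request, or null when there are no more rows.
record Page(List<String> jobIds, String nextCursor) {}

class HistoryPager {
    // Returns up to pageSize IDs strictly after the cursor (null = from start).
    // Assumes sortedIds is sorted ascending, as ORDER BY id would return it.
    static Page fetchPage(List<String> sortedIds, String cursor, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> page = new ArrayList<>();
        for (String id : sortedIds) {
            if (cursor != null && id.compareTo(cursor) <= 0) {
                continue; // skip rows the client has already seen
            }
            if (page.size() == pageSize) {
                // The page is full and at least one more row exists.
                return new Page(page, page.get(page.size() - 1));
            }
            page.add(id);
        }
        return new Page(page, null); // last page
    }
}
```

A client would call `fetchPage(ids, null, n)` first, then keep passing back `nextCursor()` until it is null. Keyset paging is used here rather than OFFSET because each page costs the same regardless of how deep into the table it is.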

Apr 25 '19 01:04 bcipriano

Is there currently ANY way of accessing the data from archived jobs? And do I understand correctly that the maximum age of a job is defined here: public static final int ARCHIVE_JOBS_CUTOFF_HOURS = 72; (https://github.com/AcademySoftwareFoundation/OpenCue/blob/e6879f34693c66a1455a14df36ed707d77bfffab/cuebot/src/main/java/com/imageworks/spcue/service/HistoricalManagerService.java#L36)?

Perhaps this value could be defined in the config instead of as a hard-coded constant?
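A minimal sketch of reading the cutoff from configuration instead of a hard-coded constant, keeping 72 hours as the default. The property name `history.archive_jobs_cutoff_hours` and the `ArchiveCutoff` class are hypothetical, not Cuebot's actual configuration keys. `isArchivable` only illustrates how such a cutoff would decide whether a finished job is old enough to move to job_history.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Properties;

class ArchiveCutoff {
    // Same default as the ARCHIVE_JOBS_CUTOFF_HOURS constant.
    static final int DEFAULT_CUTOFF_HOURS = 72;
    // Hypothetical property name, chosen for illustration only.
    static final String KEY = "history.archive_jobs_cutoff_hours";

    // Reads the cutoff from the given properties, falling back to the
    // default when the key is missing, malformed, or not positive.
    static int cutoffHours(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null) {
            return DEFAULT_CUTOFF_HOURS;
        }
        try {
            int hours = Integer.parseInt(raw.trim());
            return hours > 0 ? hours : DEFAULT_CUTOFF_HOURS;
        } catch (NumberFormatException e) {
            return DEFAULT_CUTOFF_HOURS;
        }
    }

    // A finished job is treated as eligible for archiving once it has been
    // finished for strictly longer than the cutoff.
    static boolean isArchivable(Instant finishedAt, Instant now, int cutoffHours) {
        return Duration.between(finishedAt, now).compareTo(Duration.ofHours(cutoffHours)) > 0;
    }
}
```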

Sam

Apr 24 '20 14:04 samkenw

@samkenw

Currently, the only way to access archived jobs is to query the job_history table in the database directly.
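For the direct-database route, a sketch using plain JDBC, assuming a PostgreSQL database (hence the `LIMIT` syntax). The `str_name` column is an assumption about the job_history schema and should be checked against the actual table before use; `JobHistoryQuery` and `archivedJobNames` are hypothetical helpers. The query is bounded with `LIMIT` so a large history is not pulled into memory all at once.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class JobHistoryQuery {
    // Assumed column name; verify against the real job_history schema.
    static final String SQL =
        "SELECT str_name FROM job_history "
        + "WHERE str_name LIKE ? ORDER BY str_name LIMIT ?";

    // Returns up to `limit` archived job names matching a LIKE pattern.
    // Requires an open JDBC connection to the Cuebot database.
    static List<String> archivedJobNames(Connection conn, String namePattern, int limit)
            throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(SQL)) {
            stmt.setString(1, namePattern);
            stmt.setInt(2, limit);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("str_name"));
                }
            }
        }
        return names;
    }
}
```

The `Connection` would typically come from `DriverManager.getConnection(url, user, password)` with the PostgreSQL JDBC driver on the classpath, after which a call such as `archivedJobNames(conn, "myshow-%", 100)` returns at most 100 names.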

I think the primary concern here, and the thing preventing us from implementing this quickly, is the potentially huge number of jobs that could be returned via the API. So we may need to consider implementing paging or limiting the results in some other way.
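Besides paging, the other option mentioned, limiting the results in some other way, could be as simple as a hard server-side cap: return at most a fixed number of jobs and tell the client the list was cut off, so it can narrow its search criteria. `CappedResult`, `ResultCap.capResults`, and the cap of 1000 are hypothetical choices for illustration.

```java
import java.util.List;

// Hypothetical response shape: the returned jobs plus a flag telling the
// client that more matches existed than were sent.
record CappedResult(List<String> jobs, boolean truncated) {}

class ResultCap {
    // Illustrative cap; a real value would be tuned against the gRPC
    // max message size and the typical size of a job record.
    static final int MAX_JOBS = 1000;

    static CappedResult capResults(List<String> matches, int requested) {
        // A non-positive request means "no client limit", which still falls
        // back to the server cap; a client can never exceed MAX_JOBS.
        int limit = requested > 0 ? Math.min(requested, MAX_JOBS) : MAX_JOBS;
        if (matches.size() <= limit) {
            return new CappedResult(List.copyOf(matches), false);
        }
        return new CappedResult(List.copyOf(matches.subList(0, limit)), true);
    }
}
```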

Re: adjusting the archive cutoff, I've filed this as #776 and am currently working on it. It should be relatively simple to implement as you suggested.

Sep 03 '20 20:09 bcipriano