Add check for stuck jobs in poll()
Fixes #5352
Description of the Change
Add a way to detect stuck jobs by adding an additional check to poll() that runs once every hour. It checks whether an active task's fraction_done has not changed since the previous check and whether its current_cpu_time is < 10s.
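As a rough illustration of the check described above, here is a minimal, self-contained sketch. It is not the code in this PR: `TaskSnapshot`, `find_stuck_tasks` and `stuck_check_due` are hypothetical names, and the struct only mirrors the two fields the description mentions (`fraction_done`, `current_cpu_time`) plus an assumed `last_fraction_done` remembered from the previous check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the ACTIVE_TASK fields named in the description.
struct TaskSnapshot {
    std::string name;
    double fraction_done = 0.0;       // progress reported by the app (0..1)
    double current_cpu_time = 0.0;    // CPU seconds reported so far
    double last_fraction_done = 0.0;  // assumed: value seen at the previous check
};

// Thresholds taken from the description: hourly check, 10 s of CPU time.
const double STUCK_CHECK_PERIOD = 3600.0;
const double STUCK_CPU_THRESHOLD = 10.0;

// Returns true at most once per `period` seconds, so the check can be
// called from a function that itself runs much more often.
bool stuck_check_due(double now, double& last_check, double period) {
    assert(period > 0);
    if (now - last_check < period) return false;
    last_check = now;
    return true;
}

// Returns the names of tasks whose progress did not change since the last
// check and whose CPU time is below the threshold. Also records the current
// progress so the next check compares against it.
std::vector<std::string> find_stuck_tasks(std::vector<TaskSnapshot>& tasks) {
    std::vector<std::string> stuck;
    for (TaskSnapshot& t : tasks) {
        bool no_progress = (t.fraction_done == t.last_fraction_done);
        bool little_cpu = (t.current_cpu_time < STUCK_CPU_THRESHOLD);
        if (no_progress && little_cpu) stuck.push_back(t.name);
        t.last_fraction_done = t.fraction_done;
    }
    return stuck;
}
```

In this sketch a caller would guard the scan with `stuck_check_due(now, last_check, STUCK_CHECK_PERIOD)`. Keeping the rate limiter separate from the scan makes both the period and the CPU threshold easy to adjust if an hour turns out to be too short for some projects.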
Alternate Designs
Release Notes
This might be an issue for really big tasks (e.g. ClimatePrediction.net)
Would the problem be that an hour is too short a time frame, or is it more with the implementation?
Codecov Report
Merging #5451 (eb07ea0) into master (c02d6e0) will not change coverage. Report is 16 commits behind head on master. The diff coverage is n/a.
Additional details and impacted files
@@ Coverage Diff @@
## master #5451 +/- ##
=========================================
Coverage 10.84% 10.84%
Complexity 1068 1068
=========================================
Files 279 279
Lines 36156 36156
Branches 8355 8355
=========================================
Hits 3920 3920
Misses 31842 31842
Partials 394 394
@FTang21, ah, no, my bad: the original proposal included an additional verification method for long-running jobs, CPU time, and I completely missed that this was already part of the implementation in this PR.
@davidpanderson, could you please review this and verify that this is a desired implementation of the original proposal?
There are various problems with this. I updated the issue to clarify what needs to be done: https://github.com/BOINC/boinc/issues/5352
@davidpanderson I updated the implementation based on the updated issue. Let me know if this is OK. Should I add an automatic abort after some time, or is this fine for now? Would it also be preferable to have this as its own function?
For now, let's just show a message using msg_printf(atp->project, MSG_USER_ALERT...)
... telling the user which job is stuck, and that they should consider aborting it.
This will be useful for testing because we can see the stuck job and decide if it's really stuck.
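A minimal sketch of what such an alert could look like. To keep it compilable outside the BOINC tree, `PROJECT`, `MSG_USER_ALERT` and `msg_printf` are simplified stand-ins defined here (the constant's value is a placeholder), and `report_stuck_task` and the message wording are hypothetical, not taken from this PR.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Simplified stand-ins; in the client these come from the BOINC sources.
struct PROJECT { std::string project_name; };
const int MSG_USER_ALERT = 2;  // placeholder value for illustration

static std::string last_alert;  // keeps the most recent alert for inspection

// Stand-in with the same call shape as the client's msg_printf():
// project, priority, then a printf-style format and arguments.
void msg_printf(PROJECT* p, int priority, const char* fmt, ...) {
    char buf[512];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    if (priority == MSG_USER_ALERT) last_alert = buf;
    printf("[%s] %s\n", p ? p->project_name.c_str() : "---", buf);
}

// Hypothetical helper: tell the user which job looks stuck and suggest
// aborting it, without aborting it automatically.
void report_stuck_task(PROJECT* project, const char* result_name) {
    msg_printf(project, MSG_USER_ALERT,
        "Task %s seems to be stuck; consider aborting it", result_name);
}
```

Only reporting, rather than aborting, matches the stated goal for testing: the user can look at the named job and judge whether it is really stuck.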
Gotcha, I updated it to MSG_USER_ALERT.
Almost but not quite. Please review my pseudo-code.
Ah, I see what I missed: it should match the order provided.