bazel-buildfarm

DispatchedMonitor SLA

Open werkt opened this issue 3 years ago • 0 comments

The DispatchedMonitor has been observed to get stuck on schedulers, leaving it unable to perform its basic job of requeueing actions.

While the hangs themselves require fixes, a hang should be bounded by a deadline: a reasonable SLA for the basic iteration. The SLA should be configurable, and in the process more of the magic numbers associated with DispatchedMonitor should be made configurable. The reaction to a missed SLA should be to error loudly and self-terminate the scheduler. The loud error should also include the state of the ongoing execution service/threads.
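A rough sketch of what the deadline could look like, using only the JDK. Everything named here (`IterationDeadline`, `runWithDeadline`, the `onMissed` hook, the thread name) is hypothetical and not taken from the Buildfarm code; the SLA is passed in as a `Duration` that would presumably come from configuration. On a miss it logs a dump of all threads and invokes the hook, which a scheduler might wire to process termination.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Buildfarm: bounds one monitor iteration by an SLA.
public class IterationDeadline {

  /**
   * Runs a single iteration on its own thread and waits at most {@code sla} for it.
   * Returns true if the iteration finished in time. On a miss, prints a thread dump
   * to stderr, runs {@code onMissed} (for example, a self-termination hook) and
   * returns false.
   */
  static boolean runWithDeadline(Runnable iteration, Duration sla, Runnable onMissed)
      throws Exception {
    ExecutorService executor =
        Executors.newSingleThreadExecutor(
            r -> {
              Thread t = new Thread(r, "dispatched-monitor-iteration");
              t.setDaemon(true); // a stuck iteration must not keep the JVM alive
              return t;
            });
    Future<?> future = executor.submit(iteration);
    try {
      future.get(sla.toMillis(), TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      // Error loudly: include the state of every live thread.
      System.err.println("iteration exceeded SLA of " + sla + "; thread dump:\n" + threadDump());
      onMissed.run();
      return false;
    } finally {
      executor.shutdownNow(); // interrupts the iteration thread if it is still running
    }
  }

  /** Collects a dump of all live threads; ThreadInfo.toString() abbreviates deep stacks. */
  static String threadDump() {
    StringBuilder sb = new StringBuilder();
    for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
      sb.append(info);
    }
    return sb.toString();
  }
}
```

A scheduler could pass something like `() -> Runtime.getRuntime().halt(1)` as `onMissed`; taking the hook as a parameter keeps the termination policy out of the timing logic and makes the deadline testable without killing the JVM.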

werkt · Jul 28 '20 15:07