guerilla
Feature Request: Only run jobs on validated devices.
Only run jobs on devices that are known to be good. Validate each device with a known control job, and if it fails or is unavailable, take it out of service. Provide a dashboard of known problem devices.
Combining the above with extra devices would let us isolate clients from temporary problems.
For example, where w = worker and d = device:
w1:d1 tag = foo
w2:d2 tag = foo
Job 1 runs on tag foo.
We run a control job on w1:d1 and it fails, so we remove w1:d1 from the pool of valid devices. Job 1 will now run only on w2:d2.
This lets us achieve higher perceived reliability, and reduces the number of failing jobs.
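To make the example above concrete, here is a rough sketch of how the pool filtering could work. All names here (`Device`, `record_control_result`, `eligible_devices`, `problem_devices`) are hypothetical and only illustrate the idea; they are not taken from any existing scheduler code.

```python
from dataclasses import dataclass


@dataclass
class Device:
    worker: str
    name: str
    tags: set
    valid: bool = True  # assumption: devices start valid until a control job fails


def record_control_result(device, passed):
    """Mark a device valid or invalid based on the control job outcome."""
    device.valid = passed


def eligible_devices(pool, tag):
    """Return only validated devices that carry the requested tag."""
    return [d for d in pool if d.valid and tag in d.tags]


def problem_devices(pool):
    """Devices that failed validation; this is what a dashboard could list."""
    return [d for d in pool if not d.valid]


# The scenario from the request: two devices share tag foo.
pool = [Device("w1", "d1", {"foo"}), Device("w2", "d2", {"foo"})]

# The control job fails on w1:d1, so it leaves the valid pool.
record_control_result(pool[0], passed=False)

candidates = eligible_devices(pool, "foo")
```

After the failed control job, `candidates` contains only w2:d2, and `problem_devices(pool)` contains only w1:d1.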
Whether to run a control job, how often to run it, and which job to use could all be configurable options.
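As a sketch of what those options might look like, the snippet below models the three settings and a check for whether a control job is due. The names `ControlJobConfig`, `enabled`, `interval_seconds`, `job_name`, and `control_job_due` are made up for illustration, and the one-hour default is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlJobConfig:
    enabled: bool = False            # whether to run a control job at all
    interval_seconds: int = 3600     # how often (assumed default: hourly)
    job_name: Optional[str] = None   # which job to use as the control


def control_job_due(config, last_run, now):
    """Return True if a control job should run on a device now.

    last_run and now are timestamps in seconds; last_run is None if the
    device has never been validated.
    """
    if not config.enabled or config.job_name is None:
        return False
    if last_run is None:
        return True
    return now - last_run >= config.interval_seconds
```

With the feature disabled, nothing changes for existing setups; enabling it only requires naming a control job and, optionally, adjusting the interval.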