code-review
retry code review bot's own tasks if they hit an exception
E.g. https://firefox-ci-tc.services.mozilla.com/tasks/CgHOBJ-oSVSHtrTl5-4XEw/runs/0/logs/public/logs/live.log is a task which hit an exception (e.g. the machine becomes unresponsive) and the machine got terminated without uploading the logs. There should be at least one attempt to retry the task, e.g. by setting it to auto retry.
and if a task fails, it should return the failure to phab :)
This would be good to have because it increases the resiliency of the bot: a low-probability issue becomes a very unlikely one once the task is retried.
On GCP, preemptible VMs get a signal, and it seems Taskcluster already supports it thanks to Jesse.
I do not think Taskcluster propagates that signal to the tasks themselves; it just sets them as Exception and retries them if there are retries left.
This means we cannot run a cleanup action, but the task could be re-run through a retry.
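To make the behaviour above concrete, here is a rough model of the retry decision. It is only a sketch of my understanding, not the queue's actual code: the set of retryable reasons and the helper name `will_be_retried` are assumptions that should be checked against the Taskcluster queue documentation.

```python
# Assumption: resolution reasons for which the queue schedules a new run.
# These names are my best recollection, not taken from this thread.
RETRYABLE_REASONS = {"worker-shutdown", "claim-expired", "intermittent-task"}


def will_be_retried(state: str, reason_resolved: str, retries_left: int) -> bool:
    """Return True if a run resolved this way would get an automatic retry.

    Hypothetical helper: a run is retried only when it ended as an
    exception, for a retryable reason, and the task has retries left.
    """
    return (
        state == "exception"
        and reason_resolved in RETRYABLE_REASONS
        and retries_left > 0
    )
```

With this model, a preempted VM (`worker-shutdown`) leads to a new run as long as the task definition still has retries left, while a plain `failed` run does not.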
IIRC there's also a way to force reruns in some cases depending on the return code of the cmd of the task. CC @bhearsum
If you're using run-task, retry-exit-status is available: https://github.com/taskcluster/taskgraph/blob/9b0f5fc2c59994c393bd5e7e87bf4462e9cb5adf/src/taskgraph/transforms/task.py#L548-L549
We are not using run-task; the bot is started by a hook that runs a docker image directly.
I'm not aware of any built-in way to do this, in that case.
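One thing that might be worth trying, as a sketch only: as far as I can tell, taskgraph's retry-exit-status ends up as an `onExitStatus.retry` list in the docker-worker payload, so the hook's task template could possibly set that field itself, together with the task-level `retries` count. Whether the worker used by the hook honours `onExitStatus` is an assumption to verify; the helper name `with_retries` and the exit code 75 are made up for illustration.

```python
import copy


def with_retries(task: dict, retries: int = 3, retry_exit_codes=(75,)) -> dict:
    """Return a copy of a task definition with retry settings applied.

    Hypothetical helper. `retries` is the task-level retry budget;
    `onExitStatus.retry` is assumed to be supported by the worker payload.
    """
    patched = copy.deepcopy(task)  # leave the caller's template untouched
    patched["retries"] = retries
    payload = patched.setdefault("payload", {})
    # Exit codes of the container command that should trigger a new run.
    payload["onExitStatus"] = {"retry": sorted(set(retry_exit_codes))}
    return patched


# Usage with a made-up, minimal hook task template.
template = {"payload": {"image": "example/image", "command": ["run-bot"]}}
patched = with_retries(template, retries=2, retry_exit_codes=(75,))
```

The bot would then need to exit with the chosen code when it hits a transient error, so that only those failures are retried and real analysis failures still get reported.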