argo-workflows
CI: Add GH Action for `/retest` comments to re-run failed jobs
Summary
Create a GH Action Workflow that reads comments by Members on PRs and detects `/retest`. If detected, it should use the GH API to "re-run failed jobs".
Right now this permission is limited to Approver+ (those with "write" permissions), so the Action can perform this on behalf of Members and Reviewers. This will be particularly useful for test flakes.
This would be similar to upstream k8s's bot that reruns CI after detecting a `/retest` comment.
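A minimal sketch of the detection step, assuming the command only counts when it sits alone on a line of the comment (an assumption here, not something this issue specifies). The function name `is_retest_comment` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when some line of the comment body
# is exactly "/retest", ignoring leading and trailing whitespace.
is_retest_comment() {
  local body="$1"
  # Drop carriage returns first (comment bodies may arrive with CRLF line
  # endings), then look for a line that holds only the command.
  printf '%s\n' "$body" | tr -d '\r' \
    | grep -Eq '^[[:space:]]*/retest[[:space:]]*$'
}
```

Inside a workflow step, the body could be passed in through an environment variable set from `github.event.comment.body` rather than interpolated directly into the script, which avoids shell injection from comment text.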
Use Cases
In particular, this is useful when the repo has a bout of flakey tests, such as:
- #12836, #12832, #10807, #7133, etc etc
Contributors, including me, have asked how to retry in those cases in the past:
- https://github.com/argoproj/argo-workflows/discussions/12840, Sept Contributor Meeting, etc etc
Pushing an empty commit (or closing/re-opening the PR) works, but re-runs all GH jobs, not just the failed one(s). Using `/retest` to only re-run failed jobs would be faster and more efficient.
While we should fix test flakes -- especially as they sometimes are due to unhandled race conditions in the source code (not just test races) -- in the interim, while they are being diagnosed, root caused, and fixed, such a `/retest` command is very useful.
Implementation Details
Similar to https://github.com/argoproj/argo-workflows/issues/12592#issuecomment-1962996204 for `/cherry-pick`, we can run an action when a comment is made on a PR:
1. We can check that the comment is from a Member of the org with `if: github.event.comment.author_association == 'MEMBER'`
   - See the `CommentAuthorAssociation` docs for more details. We could extend that to `'CONTRIBUTOR'` if we wanted to as well.
2. We can re-run failed jobs via the GH CLI: `gh run rerun RUN_ID --failed`
   - See the re-running jobs docs for the CLI
   - There's a slight complication here that you have to get the latest `RUN_ID` from the PR number. I think there are several ways of doing that and seemingly no shortcut command?
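One possible way to go from a PR number to a `RUN_ID` is to look up the PR's head commit and then list the failed runs for that commit. The sketch below assumes an authenticated `gh` (for example `GH_TOKEN` set in the job) and a repository context (a checkout or `GH_REPO`). The function names `failed_run_ids` and `rerun_failed_jobs` are hypothetical, and this is only one of the several ways mentioned above:

```shell
#!/usr/bin/env bash
# Sketch only: resolves failed workflow runs for a PR and re-runs their
# failed jobs. Function names are illustrative, not an existing action.

# Prints the IDs of failed workflow runs for the PR's current head commit.
failed_run_ids() {
  local pr_number="$1" head_sha
  # Resolve the PR number to the SHA of its head commit.
  head_sha="$(gh pr view "$pr_number" --json headRefOid --jq '.headRefOid')" || return 1
  # List runs for that commit whose status is "failure".
  gh run list --commit "$head_sha" --status failure \
    --json databaseId --jq '.[].databaseId'
}

# Re-runs only the failed jobs of each failed run for the given PR.
rerun_failed_jobs() {
  local pr_number="$1" run_id
  failed_run_ids "$pr_number" | while read -r run_id; do
    # Redirect stdin so the command cannot consume the loop's input.
    gh run rerun "$run_id" --failed < /dev/null
  done
}
```

In a workflow triggered by a PR comment, this could be invoked as `rerun_failed_jobs "$PR_NUMBER"`, with `PR_NUMBER` taken from `github.event.issue.number`. Filtering on `--status failure` alone would skip runs that were cancelled or timed out, so a fuller version might need to handle those as well.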
We could also extract step 2 (re-running the failed jobs) into its own separate OSS action for other repos to use. I couldn't find one after some searching, so I don't think it exists already?
Message from the maintainers:
Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.
This is fine. We use this for other projects as well
@agilgur5 I could have a look into this if it is up for grabs
Go for it