Adding tmate action to CI for debugging
Description
This adds tmate to our CI in a mode that will only generate a connection if the CI action has failed. The connection will remain open for 15 minutes to allow someone with access to the action output to connect to the CI machine (once connected, the action will remain live until the user exits the terminal session).
Alternative options for deploying tmate are discussed in the issue below.
Fixes #3137
This could be helpful to investigate the PR #3087
Okay I think I found a solution I'm happy enough with here, even if it's a little less elegant than I'd like. Contributors can now submit a commit with a message that contains [gha-debug] to their PR to produce a detached tmate session that allows all steps of the workflow to execute and then provides an SSH command to allow one to login and perform whatever problem solving is needed (default timeout is 10 min).
Github actions surprisingly doesn't make it very easy to get the latest commit information for all events that trigger workflows as part of the github.event context in the workflow file. The pull_request event requires a checkout of the repo with more depth so the merge commit parents can be extracted to get the correct message for determining whether or not the tmate session should be enabled.
Just smoothed out one final snag here: The tmate action was causing jobs to fail when timing out, but I wasn't a fan of that. The continue-on-error flag for that step did the trick, so CI runs that trigger the tmate debug session with [gha-debug] in their commit messages should pass checks if the tests pass. This should keep PRs from requiring another CI session if the tests pass with the tmate session enabled.
Here's an example of this working in my fork: https://github.com/pshriwise/openmc/actions/runs/11335356157
I'll note that the errors do still appear in the summary, but aren't really of consequence.
As is, this will require a PR author to be aware of the "[gha-debug]" special string and make a commit with that. I personally don't see how this is much easier than letting people be aware of tmate in general and having them add the few lines in ci.yml since either way they will have to make an extra commit. I would prefer the latter for simplicity.
Hrmm that's a good point about awareness. It's something of an easter egg at the moment. I think there are ways to address that though by adding notes in the documentation testing section and to the pull request template.
I'd love for more people to be aware of the tmate action for sure since it's so useful, and, yes, it's true that a commit is required either way, but what you've outlined above requires a change to ci.yml that will have to be removed later before the PR is merged (if the author and reviewer remember). It just feels more noisy to me. I think will also save contributors time determining:
- where the
ci.ymlfile is, - where in the
ci.ymlfile they should place thetmatelines, - and what options for the
tmateaction they should use
as well as maintainers' time providing that explanation if needed instead of "run $ git commit -m "[gha-debug]" --allow-empty and watch CI for a login line".
At the end of the day, It's up to you bit I wouldn't mind saving some cycles for CI debugging when it's needed.
I would agree with @pshriwise , as a user/dev, I feel that adding an empty commit, is simpler than tweaking the ci.yml (one way to get access to the runner, and back to remove it at the end)
Ok, you've won me over on this. Before I approve/merge though, can you add something brief in the documentation (probably devguide/tests.rst) that explains how to use this?
🙏🏻
Gladly!