actions-runner-controller
Listener pod names conflict when using the same runnerScaleSetName in multiple orgs.
Checks
- [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [X] I am using charts that are officially provided
Controller Version
0.8.1
Deployment Method
Helm
Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
- gha-scale-set-controller installed with Helm chart defaults in gha-runner-system namespace.
- gha-runner-scale-set chart installed for org1 in gha-rss-org1 namespace with values.yaml:
  runnerScaleSetName: dev-small-x64
  githubConfigUrl: https://github.com/org1
- gha-runner-scale-set chart installed for org2 in gha-rss-org2 namespace with values.yaml:
  runnerScaleSetName: dev-small-x64
  githubConfigUrl: https://github.com/org2
Describe the bug
The controller creates AutoscalingListener/dev-small-x64-6d9cb658-listener in the local gha-runner-system namespace. This gets stuck in an error/crash loop, I believe due to a continuous conflict over which org it's trying to register the listener with.
{"severity":"info","ts":"2024-01-08T21:21:04Z","logger":"listener-app","message":"app initialized"}
{"severity":"info","ts":"2024-01-08T21:21:04Z","logger":"listener-app","message":"Starting listener"}
{"severity":"info","ts":"2024-01-08T21:21:04Z","logger":"listener-app","message":"refreshing token","githubConfigUrl":"https://github.com/org1"}
{"severity":"info","ts":"2024-01-08T21:21:04Z","logger":"listener-app","message":"getting access token for GitHub App auth","accessTokenURL":"https://api.github.com/app/installations/xxxxxxx/access_tokens"}
{"severity":"info","ts":"2024-01-08T21:21:04Z","logger":"listener-app","message":"getting runner registration token","registrationTokenURL":"https://api.github.com/orgs/org1/actions/runners/registration-token"}
2024/01/08 21:21:04 Application returned an error: createSession failed: failed to create session: failed to get runner registration token on refresh: unexpected response from Actions service during registration token call: 403 - {"message":"Resource not accessible by integration","documentation_url":"https://docs.github.com/rest/actions/self-hosted-runners#create-a-registration-token-for-an-organization"}
Describe the expected behavior
Listeners are "namespaced" by org and scale-set name.
This could be done by adding the org to the AutoscalingListener object name, or by placing the listener in the runner namespace instead of the local namespace.
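For illustration only, here is a minimal Python sketch of the first option (adding the org to the listener name). The helper names org_from_config_url and org_scoped_listener_name are hypothetical and are not part of the controller; the sketch assumes the org is the first path segment of githubConfigUrl.

```python
from urllib.parse import urlparse


def org_from_config_url(github_config_url: str) -> str:
    """Return the first path segment of the URL (assumed to be the org)."""
    path = urlparse(github_config_url).path.strip("/")
    return path.split("/")[0]


def org_scoped_listener_name(github_config_url: str, scale_set_name: str) -> str:
    """Hypothetical naming scheme: prefix the listener name with the org."""
    org = org_from_config_url(github_config_url)
    return f"{org}-{scale_set_name}-listener"


if __name__ == "__main__":
    # Same runnerScaleSetName in two orgs yields two distinct listener names.
    print(org_scoped_listener_name("https://github.com/org1", "dev-small-x64"))
    print(org_scoped_listener_name("https://github.com/org2", "dev-small-x64"))
```

With a scheme like this, the two scale sets from the reproduction steps would no longer share a listener name, regardless of which namespace the listener lives in.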
Additional Context
N/A
Controller Logs
{"severity":"info","ts":"2024-01-08T21:25:33Z","logger":"AutoscalingListener","message":"Listener pod failed, deleting it and re-creating it","autoscalinglistener":{"name":"dev-small-x64-6d9cb658-listener","namespace":"gha-runner-system"},"namespace":"gha-runner-system","name":"dev-small-x64-6d9cb658-listener","reason":"","message":""}
{"severity":"info","ts":"2024-01-08T21:25:33Z","logger":"AutoscalingListener","message":"Creating a listener pod", "autoscalinglistener":{"name":"dev-small-x64-6d9cb658-listener","namespace":"gha-runner-system"}}
{"severity":"info","ts":"2024-01-08T21:25:33Z","logger":"AutoscalingListener","message":"Creating listener pod","autoscalinglistener":{"name":"dev-small-x64-6d9cb658-listener","namespace":"gha-runner-system"},"namespace":"gha-runner-system","name":"dev-small-x64-6d9cb658-listener"}
{"severity":"info","ts":"2024-01-08T21:25:33Z","logger":"AutoscalingListener","message":"Created listener pod","autoscalinglistener":{"name":"dev-small-x64-6d9cb658-listener","namespace":"gha-runner-system"},"namespace":"gha-runner-system","name":"dev-small-x64-6d9cb658-listener"}
On repeat...
Runner Pod Logs
N/A
Hello! Thank you for filing an issue.
The maintainers will triage your issue shortly.
In the meantime, please take a look at the troubleshooting guide for bug reports.
If this is a feature request, please review our contribution guidelines.
+1. This seems to be the case. Not a big issue for us, but we still have to encode the org name into runner names to get around this.
Hey @jgreat,
Is it possible that the FNV hashes of these two namespaces collide? The listener name is computed here. From the log, it seems to me like we are trying to create two resources with the same name. I don't know if this is happening, so your information could help tackle this problem ☺️
Is there a reason the listener name has to be so closely tied to runnerScaleSetName (to the point that we rely on a hash of the namespace to differentiate them)? I personally have a bunch of Runner sets named small (in different namespaces) and determining which listener I want to look at (in the controller namespace) when troubleshooting Runner deployments isn't automatic anymore like it was with ARC. Having the listener name be configurable (even if you do default to the current behavior) would mitigate the issue described here and make things easier to administer.
Sorry for the late response on this.
Now that I have different names for the runners, the listeners do have different hashes in the hash segment of their names:
org1-dev-small-x64-6d9cb658-listener
org1-dev-large-x64-6d9cb658-listener
org2-dev-small-x64-744db644-listener
org2-dev-large-x64-744db644-listener
Seems like it should have been OK to run the same name for 2 orgs, but for some reason it wasn't happy.
Right, thanks for getting back to us!
I'm not sure what the problem was, but the hash is calculated per namespace. The listener name is derived from the name of the scale set and the hash of the namespace. The idea was that a namespaced name is unique per resource. I thought there was a collision, but it seems that the root cause of the issue wasn't the hash.
Let's close this issue for now, since you seem to have resolved it, but please submit another issue if you run into any problem again ☺️
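The naming scheme described in the last comment (scale set name plus a hash of the namespace) can be sketched roughly as follows. This is a minimal Python illustration, not the controller's code: it assumes a 32-bit FNV-1a hash rendered as 8 hex characters, and the function names are hypothetical. The controller may encode the hash differently, so the suffixes produced here are not expected to match the ones quoted in this thread.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV32_PRIME = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """Compute the 32-bit FNV-1a hash of a byte string."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # keep the result within 32 bits
    return h


def listener_name(scale_set_name: str, namespace: str) -> str:
    """Assumed scheme: <scale set name>-<hash of namespace>-listener."""
    suffix = format(fnv1a_32(namespace.encode("utf-8")), "08x")
    return f"{scale_set_name}-{suffix}-listener"


if __name__ == "__main__":
    # Same scale set name, different namespaces: the hash segment differs.
    print(listener_name("dev-small-x64", "gha-rss-org1"))
    print(listener_name("dev-small-x64", "gha-rss-org2"))
    # Different scale set names, same namespace: the hash segment is shared.
    print(listener_name("dev-large-x64", "gha-rss-org1"))
```

Under this scheme, two scale sets with the same name get distinct listener names as long as their namespaces hash differently, which is consistent with the observation that the hash was not the root cause here.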