actions-runner-controller

Listener pod names conflict when using the same runnerScaleSetName in multiple orgs.

Open jgreat opened this issue 1 year ago • 3 comments

Checks

  • [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I am using charts that are officially provided

Controller Version

0.8.1

Deployment Method

Helm

Checks

  • [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

  1. gha-scale-set-controller installed with Helm chart defaults in the gha-runner-system namespace.

  2. gha-runner-scale-set chart installed for org1 in the gha-rss-org1 namespace with values.yaml:

runnerScaleSetName: dev-small-x64
githubConfigUrl: https://github.com/org1

  3. gha-runner-scale-set chart installed for org2 in the gha-rss-org2 namespace with values.yaml:

runnerScaleSetName: dev-small-x64
githubConfigUrl: https://github.com/org2

Describe the bug

The controller creates AutoscalingListener/dev-small-x64-6d9cb658-listener in the local gha-runner-system namespace. This gets stuck in an error/crash loop, I believe due to a continuous conflict over which org it's trying to register the listener with.

{"severity":"info","ts":"2024-01-08T21:21:04Z","logger":"listener-app","message":"app initialized"}
{"severity":"info","ts":"2024-01-08T21:21:04Z","logger":"listener-app","message":"Starting listener"}
{"severity":"info","ts":"2024-01-08T21:21:04Z","logger":"listener-app","message":"refreshing token","githubConfigUrl":"https://github.com/org1"}
{"severity":"info","ts":"2024-01-08T21:21:04Z","logger":"listener-app","message":"getting access token for GitHub App auth","accessTokenURL":"https://api.github.com/app/installations/xxxxxxx/access_tokens"}
{"severity":"info","ts":"2024-01-08T21:21:04Z","logger":"listener-app","message":"getting runner registration token","registrationTokenURL":"https://api.github.com/orgs/org1/actions/runners/registration-token"}
2024/01/08 21:21:04 Application returned an error: createSession failed: failed to create session: failed to get runner registration token on refresh: unexpected response from Actions service during registration token call: 403 - {"message":"Resource not accessible by integration","documentation_url":"https://docs.github.com/rest/actions/self-hosted-runners#create-a-registration-token-for-an-organization"}

Describe the expected behavior

Listeners are "namespaced" by org and scale-set name.

This could be done by adding the org to the AutoscalingListener object name, or by placing the listener in the runner namespace instead of the local namespace.
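The first option above (adding the org to the listener name) could look roughly like the following. This is only an illustrative sketch in Python, not ARC code: the function name `org_scoped_listener_name` and the exact name layout are hypothetical, and it assumes the org is the first path segment of `githubConfigUrl`.

```python
from urllib.parse import urlparse


def org_scoped_listener_name(scale_set_name: str, github_config_url: str) -> str:
    """Hypothetical naming scheme: prefix the listener name with the org.

    Assumes github_config_url looks like https://github.com/<org>[/<repo>].
    """
    path = urlparse(github_config_url).path.strip("/")
    owner = path.split("/")[0].lower()  # Kubernetes object names must be lowercase
    return f"{owner}-{scale_set_name}-listener"


if __name__ == "__main__":
    # Same runnerScaleSetName, different orgs: the names no longer overlap.
    print(org_scoped_listener_name("dev-small-x64", "https://github.com/org1"))
    print(org_scoped_listener_name("dev-small-x64", "https://github.com/org2"))
```

With a scheme like this, two scale sets that share a `runnerScaleSetName` would get distinct listener names as long as their orgs differ.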

Additional Context

N/A

Controller Logs

{"severity":"info","ts":"2024-01-08T21:25:33Z","logger":"AutoscalingListener","message":"Listener pod failed, deleting it and re-creating it","autoscalinglistener":{"name":"dev-small-x64-6d9cb658-listener","namespace":"gha-runner-system"},"namespace":"gha-runner-system","name":"dev-small-x64-6d9cb658-listener","reason":"","message":""}     
{"severity":"info","ts":"2024-01-08T21:25:33Z","logger":"AutoscalingListener","message":"Creating a listener pod", "autoscalinglistener":{"name":"dev-small-x64-6d9cb658-listener","namespace":"gha-runner-system"}}                 
{"severity":"info","ts":"2024-01-08T21:25:33Z","logger":"AutoscalingListener","message":"Creating listener pod","autoscalinglistener":{"name":"dev-small-x64-6d9cb658-listener","namespace":"gha-runner-system"},"namespace":"gha-runner-system","name":"dev-small-x64-6d9cb658-listener"}                                                            
{"severity":"info","ts":"2024-01-08T21:25:33Z","logger":"AutoscalingListener","message":"Created listener pod","autoscalinglistener":{"name":"dev-small-x64-6d9cb658-listener","namespace":"gha-runner-system"},"namespace":"gha-runner-system","name":"dev-small-x64-6d9cb658-listener"}

On repeat...

Runner Pod Logs

N/A

jgreat avatar Jan 08 '24 21:01 jgreat

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

github-actions[bot] avatar Jan 08 '24 21:01 github-actions[bot]

+1. This seems to be the case. Not a big issue for us, but we still have to encode the org name into runner names to get around this.

ananthu1834 avatar Jan 19 '24 12:01 ananthu1834

Hey @jgreat,

Is it possible that fnv hash of these two namespaces has collisions? The listener name is computed here. From the log, it seems to me like we are trying to create two resources with the same name. I don't know if this is happening, so your information could help tackle this problem ☺️

nikola-jokic avatar Feb 01 '24 13:02 nikola-jokic

> Hey @jgreat,
>
> Is it possible that fnv hash of these two namespaces has collisions? The listener name is computed here. From the log, it seems to me like we are trying to create two resources with the same name. I don't know if this is happening, so your information could help tackle this problem ☺️

Is there a reason the listener name has to be so closely tied to runnerScaleSetName (to the point that we rely on a hash of the namespace to differentiate them)? I personally have a bunch of Runner sets named small (in different namespaces) and determining which listener I want to look at (in the controller namespace) when troubleshooting Runner deployments isn't automatic anymore like it was with ARC. Having the listener name be configurable (even if you do default to the current behavior) would mitigate the issue described here and make things easier to administer.

grzleadams avatar Feb 27 '24 15:02 grzleadams

> Hey @jgreat,
>
> Is it possible that fnv hash of these two namespaces has collisions? The listener name is computed here. From the log, it seems to me like we are trying to create two resources with the same name. I don't know if this is happening, so your information could help tackle this problem ☺️

Sorry for the late response on this.

Now that I have different names for the runners they do have different hashes in the hash section.

org1-dev-small-x64-6d9cb658-listener
org1-dev-large-x64-6d9cb658-listener
org2-dev-small-x64-744db644-listener
org2-dev-large-x64-744db644-listener

Seems like it should have been OK to run the same name for 2 orgs, but for some reason it wasn't happy.
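To make the pattern in the four names above easier to see, here is a small Python sketch that splits each listener name into its scale-set name and hash segment and groups them by hash. The helper name `split_listener_name` is made up for illustration, and it assumes names always end in `-<hash>-listener`.

```python
from collections import defaultdict


def split_listener_name(name: str) -> tuple[str, str]:
    """Split '<scale-set-name>-<hash>-listener' into (scale_set_name, hash)."""
    scale_set_name, hash_part, suffix = name.rsplit("-", 2)
    if suffix != "listener":
        raise ValueError(f"unexpected listener name: {name}")
    return scale_set_name, hash_part


# Listener names as reported in this thread.
LISTENERS = [
    "org1-dev-small-x64-6d9cb658-listener",
    "org1-dev-large-x64-6d9cb658-listener",
    "org2-dev-small-x64-744db644-listener",
    "org2-dev-large-x64-744db644-listener",
]

by_hash: dict[str, list[str]] = defaultdict(list)
for listener in LISTENERS:
    scale_set, hash_part = split_listener_name(listener)
    by_hash[hash_part].append(scale_set)

if __name__ == "__main__":
    for hash_part, scale_sets in by_hash.items():
        print(hash_part, scale_sets)
```

Both org1 scale sets share one hash segment and both org2 scale sets share another, so the hash tracks the install location rather than the scale-set name.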

jgreat avatar Feb 27 '24 15:02 jgreat

Right, thanks for getting back to us!

I'm not sure what the problem was, but the hash is calculated per namespace. The listener name is derived from the name of the scale set and the hash of the namespace. The idea was that the namespaced name is unique per resource. I thought there was a collision, but it seems that the root cause of the issue wasn't the hash.
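As a rough illustration of that naming scheme, the sketch below builds a listener name from the scale set name plus a short hash of the namespace. It is written in Python for brevity and is an assumption-laden approximation: it uses 32-bit FNV-1a with plain hex encoding, and the helper names are hypothetical, so the hash segments it produces are not expected to match the ones the controller actually generates.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Standard 32-bit FNV-1a hash."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def listener_name(scale_set_name: str, namespace: str) -> str:
    """Hypothetical helper: '<scale set name>-<namespace hash>-listener'.

    The hex encoding here is an assumption; the real controller may encode
    the hash differently.
    """
    namespace_hash = format(fnv1a_32(namespace.encode("utf-8")), "08x")
    return f"{scale_set_name}-{namespace_hash}-listener"


if __name__ == "__main__":
    # Same scale set name in two namespaces yields two different names.
    print(listener_name("dev-small-x64", "gha-rss-org1"))
    print(listener_name("dev-small-x64", "gha-rss-org2"))
```

Under this scheme a name clash would need either two scale sets with the same name in the same namespace or a genuine hash collision between two namespace names.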

Let's close this issue for now since you seem to have resolved it, but please submit another issue if you run into any problems again ☺️

nikola-jokic avatar Mar 04 '24 10:03 nikola-jokic