actions-runner-controller
Controller is "leaking" offline runners (i.e. is not always calling Github's API to remove them)
Controller Version
0.25.2
Helm Chart Version
0.20.2
CertManager Version
1.9.0
Deployment Method
Helm
cert-manager installation
- yes
- yes
Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions. It might also be a good idea to contract with any of the contributors and maintainers if your business is critical and you therefore need priority support.)
- [X] I've read the release notes before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
- [X] My actions-runner-controller version (v0.x.y) does support the feature
- [X] I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue
Resource Definitions
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  creationTimestamp: "2022-08-05T08:48:21Z"
  generation: 5145
  name: autochecks
  namespace: github-runner
  resourceVersion: "22349026"
  uid: d5660e91-55bf-44a1-9a9d-de7ea59dcde3
spec:
  effectiveTime: "2022-08-16T17:05:44Z"
  replicas: 5
  selector: null
  template:
    metadata: {}
    spec:
      dockerEnabled: true
      dockerVolumeMounts:
      - mountPath: /home/runner/_work/
        name: shared
        readOnly: false
      dockerdContainerResources:
        limits:
          cpu: 1
          ephemeral-storage: 40Gi
          memory: 4Gi
        requests:
          cpu: 500m
          ephemeral-storage: 4Gi
          memory: 1Gi
      env:
      - name: RUNNER_ALLOW_RUNASROOT
        value: "1"
      image: 123123123123.dkr.ecr.us-east-1.amazonaws.com/github-runner:node-12
      imagePullSecrets:
      - name: kubernetes
      initContainers:
      - command:
        - sh
        - -c
        - cp /tmp/dockercfg/* /home/runner/.docker/
        image: public.ecr.aws/docker/library/busybox:stable-musl
        name: copy-dockerconfig
        volumeMounts:
        - mountPath: /home/runner/.docker/
          name: dockercfg
        - mountPath: /tmp/dockercfg
          name: dockercfg-secret
      labels:
      - automated-checks
      - automated-checks-ephemeral
      nodeSelector:
        eks.amazonaws.com/nodegroup: github-runners-tools-main
      organization: MyOrg
      resources:
        limits:
          cpu: 1
          ephemeral-storage: 40Gi
          memory: 1Gi
        requests:
          cpu: 1
          ephemeral-storage: 4Gi
          memory: 1Gi
      securityContext:
        fsGroup: 1000
      serviceAccountName: github-runner
      tolerations:
      - effect: NoExecute
        key: dedicated
        operator: Equal
        value: github-runners
      volumeMounts:
      - mountPath: /home/runner/.docker/
        name: dockercfg
      - mountPath: /home/runner/_work/
        name: shared
        readOnly: false
      volumes:
      - emptyDir: {}
        name: dockercfg
      - emptyDir: {}
        name: shared
      - name: dockercfg-secret
        secret:
          items:
          - key: .dockerconfigjson
            path: config.json
          secretName: dockercfg
status:
  availableReplicas: 5
  desiredReplicas: 5
  readyReplicas: 5
  replicas: 5
  updatedReplicas: 5
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  creationTimestamp: "2022-08-05T08:48:21Z"
  generation: 7479
  name: autochecks
  namespace: github-runner
  resourceVersion: "22350550"
  uid: 02cf7e48-e161-4e4b-8630-40a28e559ccb
spec:
  capacityReservations:
  - effectiveTime: "2022-08-16T17:06:41Z"
    expirationTime: "2022-08-16T17:11:41Z"
    replicas: 1
  - effectiveTime: "2022-08-16T17:06:46Z"
    expirationTime: "2022-08-16T17:11:46Z"
    replicas: 1
  - effectiveTime: "2022-08-16T17:06:47Z"
    expirationTime: "2022-08-16T17:11:47Z"
    replicas: 1
  - effectiveTime: "2022-08-16T17:06:47Z"
    expirationTime: "2022-08-16T17:11:47Z"
    replicas: 1
  - effectiveTime: "2022-08-16T17:06:49Z"
    expirationTime: "2022-08-16T17:11:49Z"
    replicas: 1
  maxReplicas: 20
  minReplicas: 1
  scaleDownDelaySecondsAfterScaleOut: 30
  scaleTargetRef:
    name: autochecks
  scaleUpTriggers:
  - duration: 5m0s
    githubEvent:
      workflowJob: {}
  scheduledOverrides:
  - endTime: "2022-05-16T05:00:00Z"
    minReplicas: 0
    recurrenceRule:
      frequency: Weekly
    startTime: "2022-05-13T22:00:00Z"
  - endTime: "2022-05-17T05:00:00Z"
    minReplicas: 0
    recurrenceRule:
      frequency: Daily
    startTime: "2022-05-16T22:00:00Z"
status:
  desiredReplicas: 6
  lastSuccessfulScaleOutTime: "2022-08-16T17:06:50Z"
  scheduledOverridesSummary: min=0 time=2022-08-16 22:00:00 +0000 UTC
To Reproduce
1. Shut down nodes constantly, or delete the runner pods directly with `kubectl delete`. Or just wait a few weeks/months with the cluster autoscaler scaling in and out constantly.
2. Watch the list returned by the `ListRunners` endpoint grow to hundreds of thousands of entries.
3. The controller stops working properly. Examples: lots of runner pods stuck in Pending state, runner pods not getting deleted, other RunnerDeployments starving without any replicas, and so on.
Describe the bug
The ARC is "leaking" offline runners on github in specific situations, for example when the cluster autoscaler scales out nodes and there's idle runners running there, they don't "auto-remove" themselves from github and IMO that would be responsibility of the controller. This issue was probably introduced on 0.22 since on the release notes it says that there were removed unnecessary "Remove Runner" API Calls.
It is visible on https://github.com/organizations/MyOrg/settings/actions/runners
that the list increases every day a little bit until it starts affecting the controller that needs to execute the ListRunners
function on the autoscaling algorithm (see here). This becomes a problem when we get into a situation where there is thousands of Offline runners accumulated and the controller needs to iterate over all the runners (see here).
In my case, I noticed that the controller was not working correctly and there were more than 300 pages of offline runners listed for my organization. After I deleted all the offline runners, the controller started to work again. Now I'm running a script at the end of the day to remove all the offline runners that were "leaked" by the controller during the day to avoid this from happening again.
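To put that cost in perspective: the GitHub API returns the runner list in pages of at most 100 entries, so thousands of leaked offline runners translate into many sequential API calls (and rate-limit pressure) every time the controller needs an up-to-date runner list. A rough sketch of that pagination loop, using the go-github client rather than ARC's own GitHub client wrapper (illustrative only, not ARC's actual code):

```go
// Illustrative only: uses the go-github client (v50 API), not ARC's own wrapper.
package sketch

import (
	"context"

	"github.com/google/go-github/v50/github"
)

// countRunners pages through every runner registered for the organization.
// With thousands of leaked offline runners, this loop performs one GitHub API
// call per 100 runners on every evaluation that needs the runner list.
func countRunners(ctx context.Context, gh *github.Client, org string) (total, offline int, err error) {
	opts := &github.ListOptions{PerPage: 100}
	for {
		runners, resp, err := gh.Actions.ListOrganizationRunners(ctx, org, opts)
		if err != nil {
			return 0, 0, err
		}
		total += len(runners.Runners)
		for _, r := range runners.Runners {
			if r.GetStatus() == "offline" {
				offline++
			}
		}
		if resp.NextPage == 0 {
			return total, offline, nil
		}
		opts.Page = resp.NextPage
	}
}
```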
Describe the expected behavior
I would expect the controller to clean up all the offline runners via the GitHub API in all situations. If that doesn't happen, they will keep accumulating until the controller is affected when executing the `ListRunners` function.
Controller Logs
I don't have logs from the controller from when this was happening and now since I deleted the offline runners it is working properly. I also noticed that the `ListRunners` function (https://github.com/actions-runner-controller/actions-runner-controller/blob/538e2783d7fde279b84f1ff9351bb1486823660b/github/github.go#L219-L244) is not logging anything and therefore, logs wouldn't be helpful to debug this issue.
Runner Pod Logs
https://gist.github.com/bmbferreira/0ddb1efe1fe8e0dcfe4615aec2c6150a
Additional Context
No response
@bmbferreira Hey! This does look like a bug, but we do need controller logs to deduce what's causing it. ARC has logic that ensures the runner is removed from GitHub whenever a K8s Runner custom resource is deleted; we use a K8s finalizer to ensure that.
When a node shutdown happens, the pod might eventually fail because its containers and the node it was scheduled on are gone, but the K8s Runner custom resource won't automatically disappear. It's ARC's responsibility to clean up the Runner whose pod disappeared, and that cleanup process includes the step that unregisters/removes the runner object from GitHub so that no offline runners remain.
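Roughly, that finalizer contract looks like the sketch below (simplified, with hypothetical names such as `runnerPodFinalizer` and `unregisterRunnerFromGitHub`; this is not ARC's actual controller code). The point is the ordering: the finalizer is only removed after the runner has been unregistered from GitHub, so Kubernetes cannot fully delete the pod before the cleanup has happened.

```go
// Simplified sketch of the finalizer-based cleanup; hypothetical names,
// not ARC's actual controller code.
package sketch

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const runnerPodFinalizer = "example.com/runner-unregistration" // hypothetical finalizer name

// unregisterRunnerFromGitHub stands in for the "remove runner" GitHub API call.
func unregisterRunnerFromGitHub(ctx context.Context, runnerName string) error {
	fmt.Println("unregistering runner", runnerName)
	return nil
}

// reconcileRunnerPod removes the finalizer only after the runner has been
// unregistered from GitHub, so the pod cannot fully go away before the cleanup.
func reconcileRunnerPod(ctx context.Context, c client.Client, pod *corev1.Pod) error {
	if pod.DeletionTimestamp.IsZero() {
		// Not being deleted: make sure our finalizer is present.
		if !controllerutil.ContainsFinalizer(pod, runnerPodFinalizer) {
			controllerutil.AddFinalizer(pod, runnerPodFinalizer)
			return c.Update(ctx, pod)
		}
		return nil
	}
	// Being deleted: unregister first, then release the finalizer.
	if controllerutil.ContainsFinalizer(pod, runnerPodFinalizer) {
		if err := unregisterRunnerFromGitHub(ctx, pod.Name); err != nil {
			return err // retried on the next reconcile; the pod stays around
		}
		controllerutil.RemoveFinalizer(pod, runnerPodFinalizer)
		return c.Update(ctx, pod)
	}
	return nil
}
```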
Is it that ARC somehow removed the Runner resource before unregistering the runner from GitHub, and/or it somehow gave up unregistering the runner from GitHub? In the latter case, I think the controller log should contain some errors due to GitHub API failures. If you have other data points/information to share, that would be super helpful!
Thanks in advance for your support.
I have the same problem. here are my settings and logs from the controller for one example.
Controller Version: latest (summerwind/actions-runner-controller@sha256:92faf7e9f7f09a6240cdb5eb82eaf448852bdddf2fb77d0a5669fd8e5062b97b)
Helm Chart Version: 0.20.2
CertManager Version: 1.0.4
Deployment Method: Helm (fluxcd)
ops-action-runner-controller-actions-runner-controller-78467m5n manager {"level":"error","ts":"2022-08-17T11:45:23Z","logger":"actions-runner-controller.runnerpod","msg":"Failed to update runner for finalizer removal","runnerpod":"ops/action-runner-deployment-cjlr8-crs7l","error":"pods \"action-runner-deployment-cjlr8-crs7l\" not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
ops-action-runner-controller-actions-runner-controller-78467m5n manager {"level":"error","ts":"2022-08-17T11:45:23Z","msg":"Reconciler error","controller":"runnerpod-controller","controllerGroup":"","controllerKind":"Pod","pod":{"name":"action-runner-deployment-cjlr8-crs7l","namespace":"ops"},"namespace":"ops","name":"action-runner-deployment-cjlr8-crs7l","reconcileID":"56d292ce-0ec6-4be1-8221-bbd1a383bcff","error":"pods \"action-runner-deployment-cjlr8-crs7l\" not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
@jbuettnerbild Thanks for sharing!
Hmm, did your K8s control plane automatically remove the pod that was running on the disappeared node, without waiting for the pod's finalizer to be removed?
I thought the K8s contract is that nothing can normally be deleted until all the finalizers have been removed from the resource by the respective owners of those finalizers. I don't understand how that can happen yet 🤔
Do you, by any chance, have any other jobs or K8s controllers that "force-delete" pods that were running on a disappeared node? A force-delete here means one that doesn't wait for the finalizer to be removed by the owner of the finalizer. `kubectl delete --force` does wait for the finalizer, so it might not be the issue. But if you periodically run `kubectl patch $POD` to remove finalizers and then `kubectl delete $POD` for pods on disappeared nodes, it can cause issues like that.
I think I found the reason for this example. We use Spot Instances and the aws-node-termination-handler. The node where the runner was running was terminated by AWS (instance-terminated-no-capacity). My guess is that the node termination handler couldn't evacuate the runner in time.
I need to find a solution for that, but I also found another example where the node is still in use, yet the runner goes offline and isn't removed from GitHub. Here are the logs from the controller:
{"level":"error","ts":"2022-08-17T11:40:56Z","logger":"actions-runner-controller.runnerpod","msg":"Failed to update runner for finalizer removal","runnerpod":"ops/action-runner-deployment-cjlr8-h9bpn","error":"pods \"action-runner-deployment-cjlr8-h9bpn\" not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
and shortly before from kubelet:
11:40:56.209357 4739 kubelet_pods.go:1243] Killing unwanted pod "action-runner-deployment-cjlr8-h9bpn"
11:40:56.064276 4739 kubelet.go:1955] SyncLoop (PLEG): "action-runner-deployment-cjlr8-h9bpn_ops(c5866e9b-2471-4628-8846-0dfb10e3ee0e)", event: &pleg.PodLifecycleEvent{ID:"c5866e9b-2471-4628-8846-0dfb10e3ee0e", Type:"ContainerDied", Data:"23ce8510f8d8b7a116d91b25f57cd9a1977efe1a24cf9ef481fef17c80ace77c"}
11:39:54.749364 4739 kubelet.go:1955] SyncLoop (PLEG): "action-runner-deployment-cjlr8-h9bpn_ops(c5866e9b-2471-4628-8846-0dfb10e3ee0e)", event: &pleg.PodLifecycleEvent{ID:"c5866e9b-2471-4628-8846-0dfb10e3ee0e", Type:"ContainerStarted", Data:"23ce8510f8d8b7a116d91b25f57cd9a1977efe1a24cf9ef481fef17c80ace77c"}
Thanks!
"msg":"Failed to update runner for finalizer removal","runnerpod":"ops/action-runner-deployment-cjlr8-h9bpn","error":"pods \"action-runner-deployment-cjlr8-h9bpn\" not found"
This is the most interesting part: who might have actually removed the pod without waiting for the finalizer? 🤔 ARC wants the pod to remain until ARC removes the finalizer from it.
Hello again! So, ~24 hours have passed, developers pushed code and deployed, the cluster had a lot of activity, and I can see more offline runners leaked on https://github.com/organizations/MyOrg/settings/actions/runners. For example, the runner bear-4vfzj-tskmh shows on that list as offline; if I search for it in the controller logs, this is what I get:
| August 17th 2022, 10:37:38.734 | 2022-08-17T10:37:38Z INFO actions-runner-controller.runnerpod Runner pod is marked as already unregistered. {"runnerpod": "github-runner/bear-4vfzj-tskmh"}
| August 17th 2022, 10:37:38.715 | 2022-08-17T10:37:38Z INFO actions-runner-controller.runner Removed finalizer {"runner": "github-runner/bear-4vfzj-tskmh"}
| August 17th 2022, 10:37:18.022 | 2022-08-17T10:37:18Z INFO actions-runner-controller.runnerpod Runner pod is marked as already unregistered. {"runnerpod": "github-runner/bear-4vfzj-tskmh"}
| August 17th 2022, 10:37:13.752 | 2022-08-17T10:37:13Z INFO actions-runner-controller.runnerpod Runner pod is marked as already unregistered. {"runnerpod": "github-runner/bear-4vfzj-tskmh"}
| August 17th 2022, 10:37:13.705 | 2022-08-17T10:37:13Z INFO actions-runner-controller.runnerpod Runner has just been unregistered. {"runnerpod": "github-runner/bear-4vfzj-tskmh"}
| August 17th 2022, 10:37:13.284 | 2022-08-17T10:37:13Z DEBUG actions-runner-controller.runner Runner appears to have been registered and running. {"runner": "github-runner/bear-4vfzj-tskmh", "podCreationTimestamp": "2022-08-17 10:37:07 +0000 UTC"}
| August 17th 2022, 10:37:07.943 | 2022-08-17T10:37:07Z DEBUG events Normal {"object": {"kind":"Runner","namespace":"github-runner","name":"bear-4vfzj-tskmh","uid":"5bfcca07-e8ea-457d-ad4e-7d96b7df36fb","apiVersion":"actions.summerwind.dev/v1alpha1","resourceVersion":"22987546"}, "reason": "PodCreated", "message": "Created pod 'bear-4vfzj-tskmh'"}
| August 17th 2022, 10:37:07.943 | 2022-08-17T10:37:07Z INFO actions-runner-controller.runner Created runner pod {"runner": "github-runner/bear-4vfzj-tskmh", "repository": ""}
| August 17th 2022, 10:37:07.920 | 2022-08-17T10:37:07Z DEBUG actions-runner-controller.runnerreplicaset Skipped reconcilation because owner is not synced yet {"runnerreplicaset": "github-runner/bear-4vfzj", "owner": "github-runner/bear-4vfzj-tskmh", "pods": null}
| August 17th 2022, 10:37:07.918 | 2022-08-17T10:37:07Z DEBUG events Normal {"object": {"kind":"Runner","namespace":"github-runner","name":"bear-4vfzj-tskmh","uid":"5bfcca07-e8ea-457d-ad4e-7d96b7df36fb","apiVersion":"actions.summerwind.dev/v1alpha1","resourceVersion":"22987543"}, "reason": "RegistrationTokenUpdated", "message": "Successfully update registration token"}
| August 17th 2022, 10:37:07.918 | 2022-08-17T10:37:07Z INFO actions-runner-controller.runner Updated registration token {"runner": "bear-4vfzj-tskmh", "repository": ""}
| August 17th 2022, 10:37:07.902 | 2022-08-17T10:37:07Z DEBUG actions-runner-controller.runnerreplicaset Skipped reconcilation because owner is not synced yet {"runnerreplicaset": "github-runner/bear-4vfzj", "owner": "github-runner/bear-4vfzj-tskmh", "pods": null}
| August 17th 2022, 10:37:07.892 | 2022-08-17T10:37:07Z DEBUG actions-runner-controller.runnerreplicaset Skipped reconcilation because owner is not synced yet {"runnerreplicaset": "github-runner/bear-4vfzj", "owner": "github-runner/bear-4vfzj-tskmh", "pods": null}
| August 17th 2022, 10:37:07.876 | 2022-08-17T10:37:07Z DEBUG actions-runner-controller.runnerreplicaset Skipped reconcilation because owner is not synced yet {"runnerreplicaset": "github-runner/bear-4vfzj", "owner": "github-runner/bear-4vfzj-tskmh", "pods": null}
I can also confirm that this pod is not stuck at Terminating; the Runner CRD does not exist anymore in the namespace, but the runner still shows as offline in the GitHub UI/API.
In total, during the last 24 hours, there were 16 "leaked" runners. After a few weeks/months at this rate, the `ListRunners` function will start to struggle if I don't run a script to remove all of them at the end of the day. 😓
@bmbferreira take a look at https://github.blog/changelog/2022-08-03-github-actions-remove-offline-self-hosted-runners/
Nice hack, but still ~14 days is a bit too much 🥲 (edit: I misread the article, it's 1 day for ephemeral runners).
oh wow! Thanks for sharing that @jbuettnerbild! I had no idea and it seems really recent! I think the thousands of offline runners I had a few days ago that accumulated for months were not covered by this recent change 😅 Well, if ephemeral runners will just stay there for one day as they say, it saves me from running a script to clean it up every day 🤷♂️
Thanks for sharing! In a standard scenario though, ARC should not miss unregistering runners, hence there should be no dangling offline runners (although it might take a few minutes for a runner to disappear from GitHub). If it doesn't work, something in your cluster is probably preventing ARC from doing proper clean-up.
Hi, one update from my side. After updating the runner to the newest version (2.287.1 -> 2.296.0), the problem with the offline runners is solved. Now if a runner is terminated, the corresponding runner in GitHub disappears.
In the logs I see:
action-runner-deployment-bkx2b-l7x88 runner An error occurred: Access denied. System:ServiceIdentity;DDDDDDDD-DDDD-DDDD-DDDD-DDDDDDDDDDDD needs View permissions to perform the action.
action-runner-deployment-bkx2b-l7x88 runner Runner listener exit with retryable error, re-launch runner in 5 seconds.
action-runner-deployment-bkx2b-l7x88 runner Restarting runner...
action-runner-deployment-bkx2b-l7x88 runner
action-runner-deployment-bkx2b-l7x88 runner √ Connected to GitHub
action-runner-deployment-bkx2b-l7x88 runner
action-runner-deployment-bkx2b-l7x88 runner Failed to create a session. The runner registration has been deleted from the server, please re-configure.
action-runner-deployment-bkx2b-l7x88 runner Runner listener exit with terminated error, stop the service, no retry needed.
action-runner-deployment-bkx2b-l7x88 runner Exiting runner...
- action-runner-deployment-bkx2b-l7x88 › runner
And one more comment about the change from GitHub that automatically removes old offline runners: I can't confirm that. We have a bunch of old runners marked as offline that are several days old. Maybe they will be removed after 30 days.
@jbuettnerbild Hey! Thanks for the update. If the 24h automated ephemeral runner cleanup doesn't work, I think something is going wrong on the GitHub side. Could you open a support ticket with GitHub so that they can check their backend to see what's happening?
Facing the same issue. I also tried updating the runner agent to the above version (2.296.0), but it didn't fix it. Running against GitHub Enterprise Server v3.5.4.
Currently it is "working" for us. I mean "working" because runners are still "leaked" but they are de-listed from github after ~24 hours. Either way for the ones having this issue, I leave these two suggestions:
- have some kind of monitoring that checks periodically using github api the amount of offline runners still there (if the list is too big, that will be an issue because controller will be blocked when calling the list api)
- have some cron job that removes the offline runners periodically to guarantee that the list doesn't grow too much.
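A minimal sketch of that kind of cleanup job, assuming a go-github client is acceptable, `GITHUB_TOKEN` holds a token allowed to manage the organization's self-hosted runners, and "MyOrg" is a placeholder (this is not an official ARC tool):

```go
// A rough sketch of a daily cleanup job, not an official ARC tool.
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/google/go-github/v50/github"
	"golang.org/x/oauth2"
)

func main() {
	ctx := context.Background()
	httpClient := oauth2.NewClient(ctx, oauth2.StaticTokenSource(
		&oauth2.Token{AccessToken: os.Getenv("GITHUB_TOKEN")},
	))
	gh := github.NewClient(httpClient)
	org := "MyOrg" // placeholder organization name

	// Collect offline runners first so deletions don't disturb pagination.
	var offline []*github.Runner
	opts := &github.ListOptions{PerPage: 100}
	for {
		runners, resp, err := gh.Actions.ListOrganizationRunners(ctx, org, opts)
		if err != nil {
			fmt.Fprintln(os.Stderr, "listing runners failed:", err)
			os.Exit(1)
		}
		for _, r := range runners.Runners {
			if r.GetStatus() == "offline" {
				offline = append(offline, r)
			}
		}
		if resp.NextPage == 0 {
			break
		}
		opts.Page = resp.NextPage
	}

	// Remove only runners GitHub reports as offline; online runners are untouched.
	for _, r := range offline {
		if _, err := gh.Actions.RemoveOrganizationRunner(ctx, org, r.GetID()); err != nil {
			fmt.Fprintf(os.Stderr, "failed to remove %s: %v\n", r.GetName(), err)
			continue
		}
		fmt.Printf("removed offline runner %s (id %d)\n", r.GetName(), r.GetID())
	}
}
```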
BTW, I'm 100% sure this is the function ARC uses to unregister runners: https://github.com/actions-runner-controller/actions-runner-controller/blob/12c4d9625022859556b8288c9ee3929ebe720d54/controllers/runner_graceful_stop.go#L72
ARC does have a deletion timeout of 1 minute, which can result in leaked runners when it fails to unregister the runner within that minute. But this can only happen when you manually delete a Runner resource and/or runner Pod resource. https://github.com/actions-runner-controller/actions-runner-controller/blob/5651ba6eaddf46a97af68a34d8b2731cf13b6ba9/controllers/runner_pod_controller.go#L150-L177
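Schematically, that give-up path looks something like the sketch below (hypothetical helper names and retry delay, not the actual code in runner_pod_controller.go): while the pod has been terminating for less than the timeout, the unregistration keeps being retried; after that, the finalizer is removed anyway, which is exactly when an offline runner can be left behind on GitHub.

```go
// Schematic of the "give up after a timeout" behaviour described above;
// hypothetical names, not the actual ARC code.
package sketch

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
)

// decideOnPodDeletion assumes the pod is already marked for deletion
// (DeletionTimestamp is set). It keeps the finalizer and retries while the pod
// has been terminating for less than a minute; after that it gives up, which
// is the point where an offline runner can be left registered on GitHub.
func decideOnPodDeletion(ctx context.Context, pod *corev1.Pod, unregister func(context.Context, string) error) (removeFinalizer bool, retryAfter time.Duration) {
	if err := unregister(ctx, pod.Name); err == nil {
		return true, 0 // unregistered cleanly: safe to remove the finalizer
	}
	if pod.DeletionTimestamp != nil && time.Since(pod.DeletionTimestamp.Time) < time.Minute {
		return false, 30 * time.Second // keep the finalizer and retry soon
	}
	// Timed out: remove the finalizer anyway so the pod isn't stuck forever,
	// possibly leaving an offline runner behind on GitHub.
	return true, 0
}
```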
Can you spot the issue, in case this is happening due to an unseen bug in ARC? I've had no luck so far..
We are experiencing the same issue, but the controller logs look different in our case:
2022-11-29T11:13:07Z DEBUG actions-runner-controller.runner Runner appears to have been registered and running. {"runner": "developers-runner/developers-runner-fzfqf-9sv7l", "podCreationTimestamp": "2022-11-29 11:12:54 +0000 UTC"}
2022-11-29T11:13:07Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/mutate-actions-summerwind-dev-v1alpha1-runner", "UID": "51b932ff-3152-4e1b-853a-84ed80b17d50", "kind": "actions.summerwind.dev/v1alpha1, Kind=Runner", "resource": {"group":"actions.summerwind.dev","version":"v1alpha1","resource":"runners"}}
2022-11-29T11:13:07Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/mutate-actions-summerwind-dev-v1alpha1-runner", "code": 200, "reason": "", "UID": "51b932ff-3152-4e1b-853a-84ed80b17d50", "allowed": true}
2022-11-29T11:13:07Z ERROR actions-runner-controller.runnerreplicaset Failed to patch owner to have actions-runner/unregistration-request-timestamp annotation {"runnerreplicaset": "developers-runner/developers-runner-fzfqf", "lastSyncTime": "2022-11-29T11:12:53Z", "effectiveTime": "<nil>", "templateHashDesired": "5b578679f8", "replicasDesired": 10, "replicasPending": 1, "replicasRunning": 10, "replicasMaybeRunning": 11, "templateHashObserved": ["5b578679f8"], "owner": "developers-runner/developers-runner-fzfqf-9sv7l", "error": "Operation cannot be fulfilled on runners.actions.summerwind.dev \"developers-runner-fzfqf-9sv7l\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/actions-runner-controller/actions-runner-controller/controllers.(*RunnerReplicaSetReconciler).Reconcile
/workspace/controllers/runnerreplicaset_controller.go:131
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
2022-11-29T11:13:07Z ERROR Reconciler error {"controller": "runnerreplicaset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerReplicaSet", "runnerReplicaSet": {"name":"developers-runner-fzfqf","namespace":"developers-runner"}, "namespace": "developers-runner", "name": "developers-runner-fzfqf", "reconcileID": "ef7ce398-edad-4d15-a648-9058ec541afe", "error": "Operation cannot be fulfilled on runners.actions.summerwind.dev \"developers-runner-fzfqf-9sv7l\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
2022-11-29T11:13:07Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/mutate-actions-summerwind-dev-v1alpha1-runner", "UID": "b2b9c604-9651-4b04-a170-9cd20934c98d", "kind": "actions.summerwind.dev/v1alpha1, Kind=Runner", "resource": {"group":"actions.summerwind.dev","version":"v1alpha1","resource":"runners"}}
2022-11-29T11:13:07Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/mutate-actions-summerwind-dev-v1alpha1-runner", "code": 200, "reason": "", "UID": "b2b9c604-9651-4b04-a170-9cd20934c98d", "allowed": true}
2022-11-29T11:13:07Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/validate-actions-summerwind-dev-v1alpha1-runner", "UID": "5647d714-c656-4300-b8c9-8628f15e672b", "kind": "actions.summerwind.dev/v1alpha1, Kind=Runner", "resource": {"group":"actions.summerwind.dev","version":"v1alpha1","resource":"runners"}}
2022-11-29T11:13:07Z INFO runner-resource validate resource to be updated {"name": "developers-runner-fzfqf-27gjv"}
Is anyone else seeing these logs? We're using chart version 0.20.2. We will try upgrading to the latest version today to see if that makes any difference.
We are also facing this error, using latest ARC version 0.26.0
I also have this error with version 0.20.2, but the leaked runners are gone. Looks like it is working as it should.