actions-runner-controller
EphemeralRunners are stuck in failed state after the job succeeds
Checks
- [x] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [x] I am using charts that are officially provided
Controller Version
0.12.0
Deployment Method
ArgoCD
Checks
- [x] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [x] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
Not consistently reproducible; there is a small chance of it happening whenever a job succeeds.
Run some workload, then check whether any EphemeralRunners are stuck in a failed state.
Describe the bug
After the workload succeeds, the controller marks the EphemeralRunner as failed and doesn't create a new pod. It just hangs in a failed state until it is manually removed. There is always exactly 1 failure with a timestamp: "status": { "failures": {"<uuid>": "<timestamp>"}}.
The runner probably lingers in the GitHub API a little longer after the pod dies, and the controller treats that as a failure. It calls deletePodAsFailed, which is what the log excerpt shows.
After that, it goes into backoff but is never processed again. Once the backoff period elapses, there are no further logs referencing the EphemeralRunner, and it remains stuck and unmanaged.
For now we remove these orphans periodically, but they seem to slow down CI job startup.
The runners are eventually removed from the GitHub API (I checked them manually and they were no longer present in GitHub), yet the EphemeralRunners remain stuck.
Describe the expected behavior
The EphemeralRunner is cleanly removed once GitHub releases it. It should keep reconciling after the backoff period elapses instead of silently giving up on it.
Additional Context
Using a simple runner set in DinD mode, with a GitHub Cloud organization installation (via a GitHub App).
Controller Logs
https://gist.github.com/Dawnflash/0a3fc1da0f99dfe67fc17b6987821a53
Runner Pod Logs
I don't have those, but the jobs normally succeed; everything is green in GitHub.
Hello! Thank you for filing an issue.
The maintainers will triage your issue shortly.
In the meantime, please take a look at the troubleshooting guide for bug reports.
If this is a feature request, please review our contribution guidelines.
We are suffering from the same issue.
@nikola-jokic I think this is connected to the change from this PR: https://github.com/actions/actions-runner-controller/pull/4059. We are seeing the same thing: the pod gets deleted, but the EphemeralRunner object stays in the Running state, like this:
Status:
  Failures:
    9b0c5e46-bf2c-41cc-90f2-6fcc4f57f599: 2025-06-19T11:29:42Z
  Job Repository Name: xxxxshared-workflows-github-actions
  Job Workflow Ref: xxxxx.yaml@refs/heads/main
  Phase: Running
  Ready: false
  Runner Id: 15718317
Hey, this might not be related to the actual controller change. Looking at the log, we see that the ephemeral runner finishes, but the runner still exists in the service. That shouldn't happen: once the ephemeral runner is done executing the job, it should self-delete. Therefore, the issue might be on the back-end side. Can you please share the workflow run URL?
Hey, is anyone running ARC with a version older than 0.12.0 and experiencing this?
@nikola-jokic Yes, I can share the workflow URL, but first I want to show you how bad the situation is. These are all of our runners right now: https://gist.github.com/kyrylomiro/64c559e7d3608fd459443f4a25328c12. None of the ones with errors actually have pods, but their state still says Running, and our scheduling time is now reaching n minutes. I'm still trying to find the workflow URL.
@nikola-jokic This is the URL and the exact job that caused the runner to end up in this state: m-runner-hvlmr-runner-4x2cs Running map[53171800-7476-4af6-8a7b-00f286b15671:2025-06-19T15:13:22Z]
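The "NAME Running map[uuid:timestamp]" line above looks like one row of kubectl custom-columns output (an assumption; the exact command wasn't shared). If you list EphemeralRunners that way, a small filter can pick out the ones that are Running but already have a recorded failure, which is the signature of the stuck runners described in this issue. A minimal POSIX sh sketch; running_with_failures is a hypothetical helper, not part of ARC.

```shell
#!/bin/sh
# running_with_failures: hypothetical helper, not part of ARC.
# Reads whitespace-separated "NAME PHASE FAILURES" lines on stdin and prints the
# NAME of every EphemeralRunner whose phase is Running but whose failures field
# is non-empty. kubectl custom-columns prints <none> for a missing field, and an
# empty Go map renders as map[].
#
# Example input source (assumes kubectl access to the runners namespace):
#   kubectl -n gha-runners get ephemeralrunners --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,FAILURES:.status.failures \
#     | running_with_failures
running_with_failures() {
  awk '$2 == "Running" && $3 != "<none>" && $3 != "map[]" { print $1 }'
}
```

Treat the output as candidates to inspect rather than something to delete blindly: a runner can legitimately record a failure and recover on retry.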
@nikola-jokic The run is this one: GitHub shows the workflow as still running, but the runner is already gone.
I can corroborate this issue; I was going to open it myself but hadn't gotten the chance yet. The EphemeralRunners exist indefinitely with .status.phase: Running. Here is the CronJob setup I added to buy time to keep investigating without blocking our users' job startups:
zombie-runner-cleanup.yaml
(change the namespace as needed for your environment)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: zombie-runner-cleanup
  namespace: gha-runners
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: zombie-runner-cleanup
  namespace: gha-runners
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
      - list
  - apiGroups:
      - actions.github.com
    resources:
      - ephemeralrunners
    verbs:
      - get
      - list
      - delete
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: zombie-runner-cleanup
  namespace: gha-runners
subjects:
  - kind: ServiceAccount
    name: zombie-runner-cleanup
    namespace: gha-runners
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: zombie-runner-cleanup
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: zombie-runner-cleanup
  namespace: gha-runners
spec:
  schedule: "*/10 * * * *" # Every 10 minutes
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: zombie-runner-cleanup
          containers:
            - name: zombie-runner-cleanup
              image: alpine:3
              command:
                - /bin/sh
                - -c
                - |
                  # Fail fast, including if the tool install fails
                  set -e
                  apk add kubectl yq
                  echo ""
                  echo "Starting cleanup task..."
                  PODS_FILE="/tmp/pods.txt"
                  RUNNERS_FILE="/tmp/runners.txt"
                  DIFF_FILE="/tmp/runners_diff.txt"
                  NS="gha-runners"
                  SELECTOR="app.kubernetes.io/part-of=gha-runner-scale-set"
                  # comm needs sorted input, so both lists are sorted as they are written
                  # (avoids relying on <(...) process substitution, which plain sh may lack)
                  kubectl -n "$NS" get pods -l "$SELECTOR" -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | sort > "$PODS_FILE"
                  kubectl -n "$NS" get ephemeralrunners -o yaml | yq '.items[] | select(.status.phase == "Running") | .metadata.name' | sort > "$RUNNERS_FILE"
                  ## Subtract Pods from running EphemeralRunners to find runners that no longer have a pod
                  comm -13 "$PODS_FILE" "$RUNNERS_FILE" > "$DIFF_FILE"
                  echo "Runner pods: $(wc -l "$PODS_FILE" | awk -F' ' '{print $1}')"
                  echo "Ephemeral runners: $(wc -l "$RUNNERS_FILE" | awk -F' ' '{print $1}')"
                  echo "Found $(wc -l "$DIFF_FILE" | awk -F' ' '{print $1}') ephemeral runners without pods"
                  for runner in $(cat "$DIFF_FILE"); do
                    kubectl -n "$NS" delete ephemeralrunner "$runner"
                  done
                  rm "$PODS_FILE" "$RUNNERS_FILE" "$DIFF_FILE"
                  echo "Done."
          restartPolicy: OnFailure
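The heart of this CronJob is a set difference: names of EphemeralRunners in phase Running minus names of runner pods. That works because each runner pod is named after its EphemeralRunner (the controller logs later in this thread show podName equal to the runner name). Below, that single step is pulled out as a POSIX sh function so it can be tried without a cluster; find_orphans is a hypothetical name, not part of ARC or the original script.

```shell
#!/bin/sh
# find_orphans: hypothetical helper mirroring the comm step of the CronJob.
#   $1: file with one runner pod name per line
#   $2: file with one Running EphemeralRunner name per line
# Prints the EphemeralRunner names that have no pod with the same name.
find_orphans() {
  pods_sorted=$(mktemp)
  runners_sorted=$(mktemp)
  # comm requires both inputs sorted with the same collation, so sort here
  # instead of trusting the caller.
  sort "$1" > "$pods_sorted"
  sort "$2" > "$runners_sorted"
  # -1 drops lines only in the pod list, -3 drops lines present in both,
  # leaving only lines unique to the EphemeralRunner list.
  comm -13 "$pods_sorted" "$runners_sorted"
  rm -f "$pods_sorted" "$runners_sorted"
}
```

One caveat with any pod-based check: the controller can delete a runner's pod on purpose to restart it, so a runner may be podless for a moment without being a zombie. Running the check only every few minutes, as the schedule above does, reduces but does not eliminate that race.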
This issue did not happen to our runners on 0.11.0. It is not intermittent in the sense of the overall issue coming and going from day to day: it always affects some percentage of our runners. But it IS intermittent in the sense that it seems to hit our jobs at random, with no discernible difference between affected and unaffected ones. It affects roughly 2-20 jobs an hour for us. If we don't clean them up, the controller seems to count them toward the current scale metric, so it doesn't think it needs to add runners to meet demand, hence the growing job queue.
@nikola-jokic I don't know whether anyone still on 0.11.0 is experiencing this same issue, but I doubt it, given that it started immediately after we upgraded from 0.11.0 to 0.12.0. The disappointing part is that we eagerly upgraded because of https://github.com/actions/actions-runner-controller/issues/3685, only to get hit with this arguably worse bug.
@nimjor We also upgraded from 0.9.3 because of that bug; now I don't know which one I prefer, lol. I can confirm that with your script we didn't have issues this morning. Let's see, thanks!
In case it's helpful, I can share the controller logs for one example runner before the upgrade and one after.
We use Loki for logging, so these are the controller logs filtered by EphemeralRunner name for relevance, using a query like:
{namespace="gha-runners", pod=~"github-runner-controller-gha-rs-controller-.*"} |= `<EPHEMERALRUNNER_NAME_HERE>`
0.11.0
📗 Controller Logs
2025-06-14T12:01:23Z INFO EphemeralRunnerSet Created new ephemeral runner {"version": "0.11.0", "ephemeralrunnerset": {"name":"redacted-ubuntu-blue-r8m4f","namespace":"gha-runners"}, "runner": "redacted-ubuntu-blue-r8m4f-runner-ljvb4"}
2025-06-14T12:01:23Z INFO EphemeralRunner Adding finalizer {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:23Z INFO EphemeralRunner Successfully added finalizer {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:23Z INFO EphemeralRunner Adding runner registration finalizer {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:23Z INFO EphemeralRunner Successfully added runner registration finalizer {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:23Z INFO EphemeralRunner Creating new ephemeral runner registration and updating status with runner config {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:23Z INFO EphemeralRunner Creating ephemeral runner JIT config {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:24Z INFO EphemeralRunner Created ephemeral runner JIT config {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}, "runnerId": 10419516}
2025-06-14T12:01:24Z INFO EphemeralRunner Updating ephemeral runner status with runnerId and runnerJITConfig {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:24Z INFO EphemeralRunner Updated ephemeral runner status with runnerId and runnerJITConfig {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:24Z INFO EphemeralRunner Creating new ephemeral runner secret for jitconfig. {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:24Z INFO EphemeralRunner Creating new secret for ephemeral runner {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:24Z INFO EphemeralRunner Created new secret spec for ephemeral runner {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:24Z INFO EphemeralRunner Created ephemeral runner secret {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}, "secretName": "redacted-ubuntu-blue-r8m4f-runner-ljvb4"}
2025-06-14T12:01:24Z INFO EphemeralRunner Creating new EphemeralRunner pod. {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:24Z INFO EphemeralRunner Creating new pod for ephemeral runner {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:24Z INFO EphemeralRunner Created new pod spec for ephemeral runner {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:24Z INFO EphemeralRunner Created ephemeral runner pod {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}, "runnerScaleSetId": 263, "runnerName": "redacted-ubuntu-blue-r8m4f-runner-ljvb4", "runnerId": 10419516, "configUrl": "https://github.com/redacted", "podName": "redacted-ubuntu-blue-r8m4f-runner-ljvb4"}
2025-06-14T12:01:24Z INFO EphemeralRunner Waiting for runner container status to be available {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:24Z INFO EphemeralRunner Waiting for runner container status to be available {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:24Z INFO EphemeralRunner Ephemeral runner container is still running {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:24Z INFO EphemeralRunner Updating ephemeral runner status {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}, "statusPhase": "Pending", "statusReason": "", "statusMessage": "", "ready": false}
2025-06-14T12:01:24Z INFO EphemeralRunner Updated ephemeral runner status {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:24Z INFO EphemeralRunner Ephemeral runner container is still running {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:26Z INFO EphemeralRunner Ephemeral runner container is still running {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:26Z INFO EphemeralRunner Updating ephemeral runner status {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}, "statusPhase": "Running", "statusReason": "", "statusMessage": "", "ready": true}
2025-06-14T12:01:26Z INFO EphemeralRunner Updated ephemeral runner status {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:01:26Z INFO EphemeralRunner Ephemeral runner container is still running {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:03Z INFO EphemeralRunner Ephemeral runner container is still running {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Checking if runner exists in GitHub service {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}, "runnerId": 10419516}
2025-06-14T12:03:54Z INFO EphemeralRunner Runner does not exist in GitHub service {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}, "runnerId": 10419516}
2025-06-14T12:03:54Z INFO EphemeralRunner Ephemeral runner has finished since it does not exist in the service anymore {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Updating ephemeral runner status to Finished {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner EphemeralRunner status is marked as Finished {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Cleaning up resources after after ephemeral runner termination {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}, "phase": "Succeeded"}
2025-06-14T12:03:54Z INFO EphemeralRunner Cleaning up the runner pod {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Deleting the runner pod {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunnerSet Deleting finished ephemeral runner {"version": "0.11.0", "ephemeralrunnerset": {"name":"redacted-ubuntu-blue-r8m4f","namespace":"gha-runners"}, "name": "redacted-ubuntu-blue-r8m4f-runner-ljvb4"}
2025-06-14T12:03:54Z INFO EphemeralRunner Deleted the runner pod {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Cleaning up the runner jitconfig secret {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Deleting the jitconfig secret {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Deleted jitconfig secret {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner EphemeralRunner has already finished. Stopping reconciliation and waiting for EphemeralRunnerSet to clean it up {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}, "phase": "Succeeded"}
2025-06-14T12:03:54Z INFO EphemeralRunner Trying to clean up runner from the service {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Removing runner from the service {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}, "runnerId": 10419516}
2025-06-14T12:03:54Z INFO EphemeralRunner Removed runner from the service {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}, "runnerId": 10419516}
2025-06-14T12:03:54Z INFO EphemeralRunner Runner is cleaned up from the service, removing finalizer {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Removed finalizer from ephemeral runner {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Finalizing ephemeral runner {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Cleaning up the runner pod {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Pod contains deletion timestamp {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Cleaning up the runner jitconfig secret {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Runner jitconfig secret is deleted {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Removing finalizer {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
2025-06-14T12:03:54Z INFO EphemeralRunner Successfully removed finalizer after cleanup {"version": "0.11.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-r8m4f-runner-ljvb4","namespace":"gha-runners"}}
0.12.0
📕 Controller Logs
2025-06-18T14:37:01Z INFO EphemeralRunnerSet Created new ephemeral runner {"version": "0.12.0", "ephemeralrunnerset": {"name":"redacted-ubuntu-blue-pthw7","namespace":"gha-runners"}, "runner": "redacted-ubuntu-blue-pthw7-runner-r94hn"}
2025-06-18T14:37:01Z INFO EphemeralRunner Adding finalizer {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:01Z INFO EphemeralRunner Successfully added finalizer {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:01Z INFO EphemeralRunner Adding runner registration finalizer {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:01Z INFO EphemeralRunner Successfully added runner registration finalizer {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:01Z INFO EphemeralRunner Creating new ephemeral runner registration and updating status with runner config {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:01Z INFO EphemeralRunner Creating ephemeral runner JIT config {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:02Z INFO EphemeralRunner Created ephemeral runner JIT config {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}, "runnerId": 10485340}
2025-06-18T14:37:02Z INFO EphemeralRunner Updating ephemeral runner status with runnerId and runnerJITConfig {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:02Z INFO EphemeralRunner Updated ephemeral runner status with runnerId and runnerJITConfig {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:02Z INFO EphemeralRunner Creating new ephemeral runner secret for jitconfig. {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:02Z INFO EphemeralRunner Creating new secret for ephemeral runner {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:02Z INFO EphemeralRunner Created new secret spec for ephemeral runner {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:02Z INFO EphemeralRunner Created ephemeral runner secret {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}, "secretName": "redacted-ubuntu-blue-pthw7-runner-r94hn"}
2025-06-18T14:37:02Z INFO EphemeralRunner Creating new EphemeralRunner pod. {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:02Z INFO EphemeralRunner Creating new pod for ephemeral runner {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:02Z INFO EphemeralRunner Created new pod spec for ephemeral runner {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:02Z INFO EphemeralRunner Created ephemeral runner pod {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}, "runnerScaleSetId": 321, "runnerName": "redacted-ubuntu-blue-pthw7-runner-r94hn", "runnerId": 10485340, "configUrl": "https://github.com/redacted", "podName": "redacted-ubuntu-blue-pthw7-runner-r94hn"}
2025-06-18T14:37:02Z INFO EphemeralRunner Ephemeral runner container is still running {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:02Z INFO EphemeralRunner Updating ephemeral runner status {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}, "statusPhase": "Pending", "statusReason": "", "statusMessage": "", "ready": false}
2025-06-18T14:37:02Z INFO EphemeralRunner Updated ephemeral runner status {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:02Z INFO EphemeralRunner Ephemeral runner container is still running {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:03Z INFO EphemeralRunner Ephemeral runner container is still running {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:03Z INFO EphemeralRunner Updating ephemeral runner status {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}, "statusPhase": "Running", "statusReason": "", "statusMessage": "", "ready": true}
2025-06-18T14:37:03Z INFO EphemeralRunner Updated ephemeral runner status {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:37:03Z INFO EphemeralRunner Ephemeral runner container is still running {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:38:09Z INFO EphemeralRunner Ephemeral runner container is still running {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:38:11Z INFO EphemeralRunner Checking if runner exists in GitHub service {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}, "runnerId": 10485340}
2025-06-18T14:38:11Z INFO EphemeralRunner Runner exists in GitHub service {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}, "runnerId": 10485340}
2025-06-18T14:38:11Z INFO EphemeralRunner Ephemeral runner pod has finished, but the runner still exists in the service. Deleting the pod to restart it. {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:38:11Z INFO EphemeralRunner Deleting the ephemeral runner pod {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}, "podId": "d0689dc0-2c50-4f0d-898f-6502ec13d61e"}
2025-06-18T14:38:11Z INFO EphemeralRunner Updating ephemeral runner status to track the failure count {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:38:11Z INFO EphemeralRunner EphemeralRunner pod is deleted and status is updated with failure count {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}}
2025-06-18T14:38:11Z INFO EphemeralRunner Backing off the next reconciliation due to failure {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}, "lastFailure": "2025-06-18 14:38:11 +0000 UTC", "nextReconciliation": "2025-06-18T14:38:16Z", "requeueAfter": "4.360690333s"}
2025-06-18T14:38:12Z INFO EphemeralRunner Backing off the next reconciliation due to failure {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}, "lastFailure": "2025-06-18 14:38:11 +0000 UTC", "nextReconciliation": "2025-06-18T14:38:16Z", "requeueAfter": "3.312928408s"}
2025-06-18T14:38:13Z INFO EphemeralRunner Backing off the next reconciliation due to failure {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}, "lastFailure": "2025-06-18 14:38:11 +0000 UTC", "nextReconciliation": "2025-06-18T14:38:16Z", "requeueAfter": "2.396345065s"}
2025-06-18T14:38:13Z INFO EphemeralRunner Backing off the next reconciliation due to failure {"version": "0.12.0", "ephemeralrunner": {"name":"redacted-ubuntu-blue-pthw7-runner-r94hn","namespace":"gha-runners"}, "lastFailure": "2025-06-18 14:38:11 +0000 UTC", "nextReconciliation": "2025-06-18T14:38:16Z", "requeueAfter": "2.388365807s"}
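The fields in these "Backing off" lines relate by simple arithmetic: lastFailure is 14:38:11, nextReconciliation is 14:38:16, so the backoff after this first failure appears to be 5 seconds (inferred from this excerpt only, not taken from ARC's source), and requeueAfter is nextReconciliation minus the current time, which is why it shrinks from 4.36s to 2.39s across the four lines. A minimal sketch of that relationship in integer milliseconds; requeue_after_ms is a hypothetical helper for reading the logs, not the controller's code.

```shell
#!/bin/sh
# requeue_after_ms: hypothetical helper reproducing the arithmetic visible in
# the "Backing off the next reconciliation" log lines, in integer milliseconds.
#   $1: lastFailure, $2: backoff duration, $3: current time
# nextReconciliation = lastFailure + backoff
# requeueAfter       = nextReconciliation - now, clamped at 0 once it has passed
requeue_after_ms() {
  next=$(( $1 + $2 ))
  remaining=$(( next - $3 ))
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  echo "$remaining"
}
```

For the first line, counting milliseconds from 14:38:00, lastFailure is 11000 and the backoff 5000; a current time of about 11639 gives 4361 ms, matching the logged 4.360690333s. lastFailure is logged at whole-second precision, which presumably explains why even the first requeueAfter is under 5s. The excerpt's last entry is at 14:38:13, before nextReconciliation, and per the report nothing is logged for this runner after the window closes.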
One interesting thing about the 0.12.0 logs is that there is no line like:
2025-06-14T12:03:54Z INFO EphemeralRunnerSet Deleting finished ephemeral runner {"version": "0.11.0", "ephemeralrunnerset": {"name":"redacted-ubuntu-blue-r8m4f","namespace":"gha-runners"}, "name": "redacted-ubuntu-blue-r8m4f-runner-ljvb4"}
Both of these runners finished successfully, per the AutoscalingListener pod:
0.11.0
2025-06-14T12:03:53Z INFO listener-app.listener Job completed message received. {"RequestId": 0, "Result": "succeeded", "RunnerId": 10419516, "RunnerName": "redacted-ubuntu-blue-r8m4f-runner-ljvb4"}
0.12.0
2025-06-18T14:38:11Z INFO listener-app.listener Job completed message received. {"RequestId": 0, "Result": "succeeded", "RunnerId": 10485340, "RunnerName": "redacted-ubuntu-blue-pthw7-runner-r94hn"}
We are running into the same issue after upgrading to 0.12.0. In the previous version, there would be Failed ephemeral runners that didn't get cleaned up, but at least it was clear what the resolution should be and what to monitor.
The new release takes even that away. The AutoscalingRunnerSet stats end up in an invalid state, and we only managed to figure this out from the age of the EphemeralRunners. The runner pods do not exist.
We're experiencing a similar issue where Ephemeral Runners get stuck in the "Running" state without corresponding runner pods. To address this, I created a bash script that runs via CronJob to terminate such orphaned runners: https://gist.github.com/dx0x58/b2ae1982b5e9589677c1ddd9e3a6c24a
check-phantom-runners.sh --fix --namespace {{ arc_runners_namespace }} --age 150
The --age parameter sets the age threshold (in seconds, measured from an ephemeral runner's creation timestamp) above which orphaned ephemeral runners are terminated.
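The linked script's internals aren't reproduced here, so the following is not its code. As a hypothetical illustration of what an age threshold buys: the controller itself deletes a runner's pod when restarting it (see "Deleting the pod to restart it" in the 0.12.0 logs above), so a runner can be podless briefly without being a zombie, and only acting on runners older than some threshold avoids that race. The sketch assumes the caller has already converted creationTimestamp to epoch seconds; older_than is a hypothetical name, and whether the real flag uses "greater than" or "at least" is not known.

```shell
#!/bin/sh
# older_than: hypothetical helper illustrating an age threshold like --age.
# Reads "NAME CREATED_EPOCH" lines on stdin and prints the names of runners
# whose age (NOW_EPOCH - CREATED_EPOCH) is strictly greater than AGE_SECONDS.
#   $1: NOW_EPOCH, $2: AGE_SECONDS
older_than() {
  awk -v now="$1" -v age="$2" '(now - $2) > age { print $1 }'
}
```

In a cleanup job, this filter would be combined with the "no matching pod" check, so a runner is only removed when it is both podless and older than the threshold.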
We are experiencing the same issue. Much like @muawiakh, in the previous version we had issues with EphemeralRunners ending up in the "Failed" state; now they end up in the "Running" state, but without any corresponding pods.
Thanks for the zombie-runner-cleanup.yaml script; it appears to be working for us.
Hey everyone, I just wanted to let you all know that we identified the problem.
The check for whether the runner still exists within the service can sometimes return a false positive. Although this will be fixed on the back end, PR #4142 should also resolve the issue, since we don't need this check.
As long as the runner image is properly built (i.e., the entrypoint returns the runner's exit code), the check we are doing right now is not necessary. Therefore, we will remove it.
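To illustrate the "properly built" condition: a container's exit status is the exit status of its entrypoint process, so a wrapper script around the runner must pass the runner's status through unchanged. A minimal POSIX sh sketch, assuming a custom wrapper script; /home/runner/run.sh is assumed to be the runner command (it is the command used in ARC's default runner pod template, but verify it for your image), and both function names are hypothetical.

```shell
#!/bin/sh
# Anti-pattern: the function's status comes from its last command (echo), so a
# runner that exits non-zero still looks like a clean exit to Kubernetes.
broken_entrypoint() {
  "$@"
  echo "runner finished"
}

# Preserves the runner's status: capture it, do any post-run work, return it.
entrypoint() {
  "$@"
  status=$?
  echo "runner finished with exit code $status"
  return "$status"
}

# In a real image, the wrapper would end with one of (runner path assumed):
#   entrypoint /home/runner/run.sh; exit $?
#   exec /home/runner/run.sh   # simplest when no post-run work is needed
```

The exec form replaces the shell with the runner process, so the runner's exit code becomes the container's exit code with no wrapper logic in between.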
We are running into an issue similar to the one @kyrylomiro mentioned, except that in my case the pod is also still running even though the workflow has completed, which is causing long queue times for workflows.
Controller Version: 0.11.0
Status:
  Failures:
    0edacef2-b018-45e3-a863-d8ff765e6e63: true
  Job Repository Name: xxxx
  Job Workflow Ref: xxxx.yml@refs/pull/763/merge
  Phase: Running
  Ready: true
  Runner Id: 1207620
@shivansh-ptr that sounds like a separate type of problem and belongs in a separate issue
Hey @shivansh-ptr,
That is exactly the root of the problem. After the workflow finishes, we check whether the runner still exists. Because it does (in this case), we mark the ephemeral runner as failed, which creates this entry in Failures. It then enters a crash loop (since at that point the runner registration is invalid), which eventually drives the ephemeral runner to the failed state.
Hi, I'm using version 0.12.0 of both the controller and gha-runner-scale-set, and I'm experiencing something very similar, with the difference that my workflow never runs even once. From previous comments, my understanding is that the issue affects subsequent runs after at least one successful execution. In my case, the EphemeralRunner stays in Pending status, and if I describe it I get the same failure as shown above: "status": { "failures": {"<uuid>": "<timestamp>"}}
Some useful excerpts of the controller's log are:
EphemeralRunner Backing off the next reconciliation due to failure {"version": "0.12.0", "ephemeralrunner": {"name":"eks-cluster-dev-9d5w4-runner-2qjpx","namespace":"github-actions-runners"}, "lastFailure": "2025-06-24 14:49:09 +0000 UTC", "nextReconciliation": "2025-06-24T14:49:14Z", "requeueAfter": "4.490815905s"}
......
EphemeralRunner EphemeralRunner pod is deleted and status is updated with failure count {"version": "0.12.0", "ephemeralrunner": {"name":"eks-cluster-dev-9d5w4-runner-2qjpx","namespace":"github-actions-runners"}}
......
EphemeralRunner Updating ephemeral runner status to track the failure count {"version": "0.12.0", "ephemeralrunner": {"name":"eks-cluster-dev-9d5w4-runner-2qjpx","namespace":"github-actions-runners"}}
......
EphemeralRunner Ephemeral runner pod has finished, but the runner still exists in the service. Deleting the pod to restart it. {"version": "0.12.0", "ephemeralrunner": {"name":"eks-cluster-dev-9d5w4-runner-2qjpx","namespace":"github-actions-runners"}}
......
@nikola-jokic I wonder whether #4142 also fixes my situation, because from what was said earlier I believe it might only fix the scenario for certain runners (i.e., Runner Id N where N > 1, but not N = 1).
Thanks!
Any timeline on when the fix for this will be rolled out as part of a new release?
Hey everyone, just to let you all know, we are targeting Monday for the next patch release that will include this fix.
FYI, here is a similar but not identical bug we still see on 0.12.0 that others might also be experiencing (our stuck runners stay in the Running state indefinitely, with failures recorded in their status): https://github.com/actions/actions-runner-controller/issues/4148
Hey everyone, we decided to publish the new release today! 0.12.1 is out! 😄
@nikola-jokic Hi, I still face this issue even on version 0.12.1
Hi @nikola-jokic, I still face the same issue on version 0.12.1, as mentioned here. The workflow completes, but the pod is stuck in the Running state and an entry is created in Failures.
Can confirm that I've experienced the same issue on 0.12.1.
[RUNNER 2025-09-30 23:53:10Z ERR GitHubActionsService] POST request to https://run-actions-3-azure-eastus.actions.githubusercontent.com/1/acquirejob failed. HTTP Status: Conflict
[RUNNER 2025-09-30 23:53:10Z INFO Runner] Skipping message Job. Job message already acquired 'bb518439-0dcf-5bcf-b669-ee40b36f9a00'. job assignment is invalid: MissingKey