actions-runner-controller
RunnerDeployment pods in NotReady state after GHA workflow completion
Checks
- [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [X] I am using charts that are officially provided
Controller Version
0.27.6
Deployment Method
Helm
Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. Runners are deployed using `RunnerDeployment` and `HorizontalRunnerAutoscaler`.
2. The runners pick up and execute a workflow.
3. The workflow finishes successfully.
4. The pods that ran the workflow get stuck in the `NotReady` state.
Describe the bug
It's the same behavior described in https://github.com/actions/actions-runner-controller/issues/1515. After workflow completion the runners are in the `Running` state, but the pods stay in `NotReady`. I'm using the `RunnerDeployment` resource, Helm chart version 0.23.7, and ARC 0.27.6. The CRDs were also upgraded.
kubectl get pods | grep -i notready
runner-deployment-ksfjw-c5nnd 1/2 NotReady 0 4h11m
runner-deployment-ksfjw-ffzbm 1/2 NotReady 0 3h45m
runner-deployment-ksfjw-hc7b5 1/2 NotReady 0 4h11m
runner-deployment-ksfjw-khf6b 1/2 NotReady 0 4h11m
runner-deployment-ksfjw-rnws2 1/2 NotReady 0 3h45m
runner-deployment-ksfjw-w7bln 1/2 NotReady 0 3h51m
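The `grep` above matches on the printed STATUS column only. A minimal sketch of a more specific check, assuming the JSON shape returned by `kubectl get pods -o json` and a runner container named `runner` (as in the pod shown further down): it lists pods whose phase is still `Running` although the runner container has already terminated with exit code 0. The function name `find_stuck_runner_pods` is hypothetical and not part of ARC or kubectl.

```python
import json
import subprocess


def find_stuck_runner_pods(pod_list, runner_container="runner"):
    """Return names of pods still in phase Running whose runner container exited with code 0.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Running":
            continue
        for cs in status.get("containerStatuses", []):
            terminated = cs.get("state", {}).get("terminated")
            if (
                cs.get("name") == runner_container
                and terminated is not None
                and terminated.get("exitCode") == 0
            ):
                stuck.append(pod["metadata"]["name"])
                break
    return stuck


def load_pods(namespace="default"):
    """Fetch the pod list of a namespace through kubectl (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Example use against a live cluster (not executed here):
#   print(find_stuck_runner_pods(load_pods("default")))
```

Run against the namespace of the runners, this should report the same six pods as the `grep`, but it would not pick up pods that are `NotReady` for unrelated reasons.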
Describe the expected behavior
Pods should be terminated after execution.
Additional Context
Pod `yaml` output example:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    actions-runner-controller/token-expires-at: "2024-01-29T13:29:33-06:00"
    actions-runner/id: "1453"
    kubernetes.io/psp: privileged
    sync-time: "2024-01-29T18:29:33Z"
  creationTimestamp: "2024-01-29T18:29:33Z"
  finalizers:
  - actions.summerwind.dev/runner-pod
  labels:
    actions-runner: ""
    actions-runner-controller/inject-registration-token: "true"
    pod-template-hash: f8546db97
    runner-deployment-name: runner-deployment
    runner-template-hash: f7674645d
  name: runner-deployment-ksfjw-c5nnd
  ownerReferences:
  - apiVersion: actions.summerwind.dev/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Runner
    name: runner-deployment-ksfjw-c5nnd
    uid: 800c2f97-9ce8-4e14-8733-734254047e58
  resourceVersion: "434077453"
  uid: 273eba64-3fa3-4358-b3a2-b770ae4c8ab6
spec:
  containers:
  - env:
    - name: RUNNER_ORG
    - name: RUNNER_REPO
      value: my_repo
    - name: RUNNER_ENTERPRISE
    - name: RUNNER_LABELS
      value: label_1,label_2
    - name: RUNNER_GROUP
    - name: DOCKER_ENABLED
      value: "true"
    - name: DOCKERD_IN_RUNNER
      value: "false"
    - name: GITHUB_URL
      value: https://github.com/
    - name: RUNNER_WORKDIR
      value: /runner/_work
    - name: RUNNER_EPHEMERAL
      value: "true"
    - name: RUNNER_STATUS_UPDATE_HOOK
      value: "false"
    - name: GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT
      value: actions-runner-controller/v0.27.6
    - name: DOCKER_HOST
      value: unix:///run/docker.sock
    - name: RUNNER_NAME
      value: runner-deployment-ksfjw-c5nnd
    - name: RUNNER_TOKEN
      value: token
    image: summerwind/actions-runner:latest
    imagePullPolicy: IfNotPresent
    name: runner
    resources: {}
    securityContext: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /runner
      name: runner
    - mountPath: /runner/_work
      name: work
    - mountPath: /run
      name: var-run
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-rnt9m
      readOnly: true
  - image: docker:dind
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - timeout "${RUNNER_GRACEFUL_STOP_TIMEOUT:-15}" /bin/sh -c "echo 'Prestop
            hook started'; while [ -f /runner/.runner ]; do sleep 1; done; echo 'Waiting
            for dockerd to start'; while ! pgrep -x dockerd; do sleep 1; done; echo
            'Prestop hook stopped'" >/proc/1/fd/1 2>&1
    name: docker
    resources: {}
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/docker/certs.d
      name: certs
    - mountPath: /runner
      name: runner
    - mountPath: /run
      name: var-run
    - mountPath: /runner/_work
      name: work
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-rnt9m
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: aia-sdp-gnr-689773
  nodeSelector:
    node-type: gnr
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: runner
  - emptyDir: {}
    name: work
  - emptyDir:
      medium: Memory
      sizeLimit: 1M
    name: var-run
  - emptyDir: {}
    name: certs
  - name: default-token-rnt9m
    projected:
      defaultMode: 420
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-01-29T18:29:33Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-01-29T18:49:22Z"
    message: 'containers with unready status: [runner]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-01-29T18:49:22Z"
    message: 'containers with unready status: [runner]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-01-29T18:29:33Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://49fbad503a4e71112578e4d5b5d16e2c4b095145d11548cecc6a755f63218b51
    image: docker:dind
    imageID: docker-pullable://docker@sha256:1dfc375736e448806602211e09a9b1390eb110548dcb839eef374da357ca5f5d
    lastState: {}
    name: docker
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-01-29T18:29:35Z"
  - containerID: docker://5170c36d647c50340f579f2fe19afb5b1f80e27f6020686dcd588f89c9097a34
    image: summerwind/actions-runner:latest
    imageID: docker-pullable://summerwind/actions-runner@sha256:4b0eb7ec68aec459ce5d69585675f40a2dd13eb69646fa786ab9809aaf33b75e
    lastState: {}
    name: runner
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://5170c36d647c50340f579f2fe19afb5b1f80e27f6020686dcd588f89c9097a34
        exitCode: 0
        finishedAt: "2024-01-29T18:49:21Z"
        reason: Completed
        startedAt: "2024-01-29T18:29:35Z"
  hostIP: 10.23.152.242
  phase: Running
  podIP: 100.64.6.127
  podIPs:
  - ip: 100.64.6.127
  qosClass: BestEffort
  startTime: "2024-01-29T18:29:33Z"
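To make the `status` block above easier to read: the `runner` container has terminated with exit code 0, but the `docker` sidecar is still running. With `restartPolicy: Never`, Kubernetes reports a pod as `Succeeded` only once all of its containers have terminated successfully, which would explain why the phase stays `Running` and the READY column shows `1/2`. The sketch below is a simplified model of that rule, not the kubelet's implementation; it ignores init containers and waiting containers, and the function name `summarize_status` is hypothetical.

```python
def summarize_status(container_statuses):
    """Return (ready_column, phase) for a pod with restartPolicy Never.

    Simplified model: any container that has not terminated keeps the
    pod in phase Running; once all have terminated, the phase is
    Succeeded if every exit code is 0, otherwise Failed.
    """
    total = len(container_statuses)
    ready = sum(1 for cs in container_statuses if cs.get("ready"))
    ready_column = f"{ready}/{total}"

    terminated = [
        cs["state"]["terminated"]
        for cs in container_statuses
        if "terminated" in cs.get("state", {})
    ]
    if len(terminated) < total:
        return ready_column, "Running"
    if all(t.get("exitCode") == 0 for t in terminated):
        return ready_column, "Succeeded"
    return ready_column, "Failed"


# Container statuses reduced to the fields used above, taken from the pod in this report.
reported = [
    {"name": "docker", "ready": True, "state": {"running": {}}},
    {"name": "runner", "ready": False, "state": {"terminated": {"exitCode": 0}}},
]
print(summarize_status(reported))  # ('1/2', 'Running')
```

Under this model the pod would only leave `Running` once the `docker` container also stops, which matches what I'm observing.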
Controller Logs
kubectl get pods -n actions-runner-system
NAME READY STATUS RESTARTS AGE
actions-runner-controller-74988b64f9-st5rz 2/2 Running 4 (3d17h ago) 3d21h
Runner Pod Logs
kubectl logs runner-deployment-ksfjw-c5nnd
Defaulted container "runner" out of: runner, docker
2024-01-29 18:29:35.235 NOTICE --- Runner init started with pid 7
2024-01-29 18:29:35.245 DEBUG --- Github endpoint URL https://github.com/
2024-01-29 18:29:38.97 DEBUG --- Passing --ephemeral to config.sh to enable the ephemeral runner.
2024-01-29 18:29:38.102 DEBUG --- Configuring the runner.
--------------------------------------------------------------------------------
| ____ _ _ _ _ _ _ _ _ |
| / ___(_) |_| | | |_ _| |__ / \ ___| |_(_) ___ _ __ ___ |
| | | _| | __| |_| | | | | '_ \ / _ \ / __| __| |/ _ \| '_ \/ __| |
| | |_| | | |_| _ | |_| | |_) | / ___ \ (__| |_| | (_) | | | \__ \ |
| \____|_|\__|_| |_|\__,_|_.__/ /_/ \_\___|\__|_|\___/|_| |_|___/ |
| |
| Self-hosted runner registration |
| |
--------------------------------------------------------------------------------
# Authentication
√ Connected to GitHub
# Runner Registration
√ Runner successfully added
√ Runner connection is good
# Runner settings
√ Settings Saved.
2024-01-29 18:29:43.981 DEBUG --- Runner successfully configured.
{
"agentId": 1453,
"agentName": "runner-deployment-any-ksfjw-c5nnd",
"poolId": 1,
"poolName": "Default",
"ephemeral": true,
"serverUrl": "https://pipelinesghubeus26.actions.githubusercontent.com/4k72J58r6zbOt2ltvDDZQpRxMDTHuVQKd5NYXjBRmfUGlMtUVy/",
"gitHubUrl": "https://github.com/my_repo/runner-deployment",
"workFolder": "/runner/_work"
2024-01-29 18:29:43.993 DEBUG --- Docker enabled runner detected and Docker daemon wait is enabled
2024-01-29 18:29:43.997 DEBUG --- Waiting until Docker is available or the timeout of 120 seconds is reached
}CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2024-01-29 18:29:44.43 NOTICE --- WARNING LATEST TAG HAS BEEN DEPRECATED. SEE GITHUB ISSUE FOR DETAILS:
2024-01-29 18:29:44.46 NOTICE --- https://github.com/actions/actions-runner-controller/issues/2056
√ Connected to GitHub
Current runner version: '2.311.0'
2024-01-29 18:29:48Z: Listening for Jobs
Runner update in progress, do not shutdown runner.
Downloading 2.312.0 runner
Waiting for current job finish running.
Generate and execute update script.
Runner will exit shortly for update, should be back online within 10 seconds.
Runner update process finished.
Runner listener exit because of updating, re-launch runner after successful update
Update finished successfully.
Restarting runner...
√ Connected to GitHub
Current runner version: '2.312.0'
2024-01-29 18:30:27Z: Listening for Jobs
2024-01-29 18:30:29Z: Running job: my-job
2024-01-29 18:49:20Z: Job my-job completed with result: Succeeded
√ Removed .credentials
√ Removed .runner
Runner listener exit with 0 return code, stop the service, no retry needed.
Exiting runner...
2024-01-29 18:49:21.336 NOTICE --- Runner init exited. Exiting this process with code 0 so that the container and the pod is GC'ed Kubernetes soon.