actions-runner-controller
Runner not cleaned up after completion
Checks
- [X] I've already read https://github.com/actions-runner-controller/actions-runner-controller/blob/master/TROUBLESHOOTING.md and I'm sure my issue is not covered in the troubleshooting guide.
Controller Version
v0.26.0
Helm Chart Version
0.21.0
CertManager Version
v1.9.1
Deployment Method
Helm
cert-manager installation
resource "helm_release" "cert-manager" { name = "cert-manager" repository = "https://charts.jetstack.io" chart = "cert-manager" version = "1.9.1" create_namespace = true namespace = "cert-manager"
set { name = "installCRDs" value = "true" } }
Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions. It might also be a good idea to contract with any of the contributors and maintainers if your business is critical and you need priority support.)
- [X] I've read the release notes before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
- [X] My actions-runner-controller version (v0.x.y) does support the feature
- [X] I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue
Resource Definitions
apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  creationTimestamp: "2022-09-22T13:59:54Z"
  finalizers:
  - runner.actions.summerwind.dev
  generation: 1
  name: eks-runner-terraform-vsftp-security
  namespace: actions-runner-system
  resourceVersion: "1143184"
  uid: aa419171-bef9-4925-9c1c-e43f4d3296a3
spec:
  dockerdContainerResources: {}
  image: ""
  labels:
  - eks_runner
  - livetest
  - live
  repository: org/repo
  resources: {}
  serviceAccountName: eks-runner-livetest-live
status:
  phase: Running
  ready: false
  registration:
    expiresAt: "2022-09-22T14:59:54Z"
    labels:
    - eks_runner
    - livetest
    - live
    repository: org/repo
    token: XXXX
To Reproduce
1. Create the runner
2. Schedule a job
3. Observe that the runner finishes but the pod is stuck
Describe the bug
A runner pod gets stuck after the job is done: the runner container exits, but the pod keeps running in a NotReady state
Describe the expected behavior
The runner pod is terminated, and a new one starts
Controller Logs
2022-09-22T14:01:24Z DEBUG actions-runner-controller.runner Runner appears to have been registered and running. {"runner": "actions-runner-system/eks-runner-terraform-vsftp-security", "podCreationTimestamp": "2022-09-22 13:59:54 +0000 UTC"}
Runner Pod Logs
2022-09-22 13:59:56.763 DEBUG --- Github endpoint URL https://github.com/
2022-09-22 13:59:57.212 DEBUG --- Passing --ephemeral to config.sh to enable the ephemeral runner.
2022-09-22 13:59:57.215 DEBUG --- Configuring the runner.
--------------------------------------------------------------------------------
| ____ _ _ _ _ _ _ _ _ |
| / ___(_) |_| | | |_ _| |__ / \ ___| |_(_) ___ _ __ ___ |
| | | _| | __| |_| | | | | '_ \ / _ \ / __| __| |/ _ \| '_ \/ __| |
| | |_| | | |_| _ | |_| | |_) | / ___ \ (__| |_| | (_) | | | \__ \ |
| \____|_|\__|_| |_|\__,_|_.__/ /_/ \_\___|\__|_|\___/|_| |_|___/ |
| |
| Self-hosted runner registration |
| |
--------------------------------------------------------------------------------
# Authentication
√ Connected to GitHub
# Runner Registration
√ Runner successfully added
√ Runner connection is good
# Runner settings
√ Settings Saved.
2022-09-22 14:00:01.667 DEBUG --- Runner successfully configured.
{
"agentId": 142,
"agentName": "eks-runner-terraform-vsftp-security",
"poolId": 1,
"poolName": "Default",
"ephemeral": true,
"serverUrl": "https://pipelines.actions.githubusercontent.com/somelongid",
"gitHubUrl": "https://github.com/org/repo",
"workFolder": "/runner/_work"
2022-09-22 14:00:01.677 DEBUG --- Docker enabled runner detected and Docker daemon wait is enabled
2022-09-22 14:00:01.678 DEBUG --- Waiting until Docker is available or the timeout is reached
}CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
√ Connected to GitHub
Current runner version: '2.296.2'
2022-09-22 14:00:03Z: Listening for Jobs
2022-09-22 14:00:35Z: Running job: Check PR
2022-09-22 14:01:22Z: Job Check PR completed with result: Succeeded
√ Removed .credentials
√ Removed .runner
Runner listener exit with 0 return code, stop the service, no retry needed.
Exiting runner...
Generating RSA private key, 4096 bit long modulus (2 primes)
........++++
........................++++
e is 65537 (0x010001)
Generating RSA private key, 4096 bit long modulus (2 primes)
...................................................................++++
.............................++++
e is 65537 (0x010001)
Signature ok
subject=CN = docker:dind server
Getting CA Private Key
/certs/server/cert.pem: OK
Generating RSA private key, 4096 bit long modulus (2 primes)
..............................................++++
.....................................................................................++++
e is 65537 (0x010001)
Signature ok
subject=CN = docker:dind client
Getting CA Private Key
/certs/client/cert.pem: OK
time="2022-09-22T14:00:00.250350969Z" level=info msg="Starting up"
time="2022-09-22T14:00:00.252608955Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
time="2022-09-22T14:00:00.253606299Z" level=info msg="libcontainerd: started new containerd process" pid=63
time="2022-09-22T14:00:00.253642630Z" level=info msg="parsed scheme: \"unix\"" module=grpc
time="2022-09-22T14:00:00.253659650Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
time="2022-09-22T14:00:00.253701962Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
time="2022-09-22T14:00:00.253716374Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
time="2022-09-22T14:00:00Z" level=warning msg="containerd config version `1` has been deprecated and will be removed in containerd v2.0, please switch to version `2`, see https://github.com/containerd/containerd/blob/main/docs/PLUGINS.md#version-header"
time="2022-09-22T14:00:00.274983011Z" level=info msg="starting containerd" revision=9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6 version=v1.6.8
time="2022-09-22T14:00:00.308012401Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
time="2022-09-22T14:00:00.308683945Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.aufs\"..." type=io.containerd.snapshotter.v1
time="2022-09-22T14:00:00.324630578Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.aufs\"..." error="aufs is not supported (modprobe aufs failed: exit status 1 \"ip: can't find device 'aufs'\\nmodprobe: can't change directory to '/lib/modules': No such file or directory\\n\"): skip plugin" type=io.containerd.snapshotter.v1
time="2022-09-22T14:00:00.324676835Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." type=io.containerd.snapshotter.v1
time="2022-09-22T14:00:00.324969917Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs (xfs) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
time="2022-09-22T14:00:00.325002973Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.devmapper\"..." type=io.containerd.snapshotter.v1
time="2022-09-22T14:00:00.325024617Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
time="2022-09-22T14:00:00.325075702Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1
time="2022-09-22T14:00:00.325160792Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
time="2022-09-22T14:00:00.325436978Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
time="2022-09-22T14:00:00.325660384Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
time="2022-09-22T14:00:00.325691263Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
time="2022-09-22T14:00:00.325744240Z" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
time="2022-09-22T14:00:00.325763257Z" level=info msg="metadata content store policy set" policy=shared
time="2022-09-22T14:00:00.331575052Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
time="2022-09-22T14:00:00.331608182Z" level=info msg="loading plugin \"io.containerd.event.v1.exchange\"..." type=io.containerd.event.v1
time="2022-09-22T14:00:00.331623475Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
time="2022-09-22T14:00:00.331675318Z" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
time="2022-09-22T14:00:00.331706018Z" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
time="2022-09-22T14:00:00.331747642Z" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
time="2022-09-22T14:00:00.331769135Z" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
time="2022-09-22T14:00:00.332070527Z" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
time="2022-09-22T14:00:00.332090515Z" level=info msg="loading plugin \"io.containerd.service.v1.leases-service\"..." type=io.containerd.service.v1
time="2022-09-22T14:00:00.332110778Z" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
time="2022-09-22T14:00:00.332132322Z" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
time="2022-09-22T14:00:00.332151702Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
time="2022-09-22T14:00:00.332293526Z" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
time="2022-09-22T14:00:00.332425401Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
time="2022-09-22T14:00:00.332781793Z" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
time="2022-09-22T14:00:00.332817372Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
time="2022-09-22T14:00:00.332835663Z" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
time="2022-09-22T14:00:00.332879154Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
time="2022-09-22T14:00:00.332901754Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
time="2022-09-22T14:00:00.332934445Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
time="2022-09-22T14:00:00.332949122Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
time="2022-09-22T14:00:00.332964617Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
time="2022-09-22T14:00:00.332985504Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
time="2022-09-22T14:00:00.333000145Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
time="2022-09-22T14:00:00.333020447Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
time="2022-09-22T14:00:00.333042101Z" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
time="2022-09-22T14:00:00.333211925Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
time="2022-09-22T14:00:00.333238060Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
time="2022-09-22T14:00:00.333255541Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
time="2022-09-22T14:00:00.333275067Z" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
time="2022-09-22T14:00:00.333299976Z" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
time="2022-09-22T14:00:00.333322524Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
time="2022-09-22T14:00:00.333374814Z" level=error msg="failed to initialize a tracing processor \"otlp\"" error="no OpenTelemetry endpoint: skip plugin"
time="2022-09-22T14:00:00.337277294Z" level=info msg=serving... address=/var/run/docker/containerd/containerd-debug.sock
time="2022-09-22T14:00:00.337384211Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
time="2022-09-22T14:00:00.337461354Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock
time="2022-09-22T14:00:00.337500359Z" level=info msg="containerd successfully booted in 0.063798s"
time="2022-09-22T14:00:00.343443667Z" level=info msg="parsed scheme: \"unix\"" module=grpc
time="2022-09-22T14:00:00.343469171Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
time="2022-09-22T14:00:00.343490468Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
time="2022-09-22T14:00:00.343536439Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
time="2022-09-22T14:00:00.344576449Z" level=info msg="parsed scheme: \"unix\"" module=grpc
time="2022-09-22T14:00:00.344595783Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
time="2022-09-22T14:00:00.344671525Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
time="2022-09-22T14:00:00.344685028Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
time="2022-09-22T14:00:00.364893834Z" level=warning msg="Your kernel does not support cgroup blkio weight"
time="2022-09-22T14:00:00.364912852Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
time="2022-09-22T14:00:00.365109632Z" level=info msg="Loading containers: start."
time="2022-09-22T14:00:00.463692617Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
time="2022-09-22T14:00:00.542943451Z" level=info msg="Loading containers: done."
time="2022-09-22T14:00:00.555967430Z" level=info msg="Docker daemon" commit=e42327a graphdriver(s)=overlay2 version=20.10.18
time="2022-09-22T14:00:00.556092973Z" level=info msg="Daemon has completed initialization"
time="2022-09-22T14:00:00.635477377Z" level=info msg="API listen on /var/run/docker.sock"
time="2022-09-22T14:00:00.646423616Z" level=info msg="API listen on [::]:2376"
Additional Context
The pod status:
NAME                                  READY   STATUS     RESTARTS   AGE
eks-runner-terraform-vsftp-security   1/2     NotReady   0          2m8s
runner container status:
Containers:
runner:
Container ID: docker://9b258234e9e289b251c4f14889ca37f995955e5c3a55b3236f696b3300cdeca2
Image: summerwind/actions-runner:latest
Image ID: docker-pullable://summerwind/actions-runner@sha256:771a21d0c6f4ce2c403aa52fe2524b8a1a83dd70430ae6468cef9e9fa3095ea5
Port:
@ajardan Hey! This can't be investigated further with the provided information. It's working fine for me so this might be due to some edge-cases coming from your configuration or your environment. Can you provide full logs from ARC and full kubectl describe or kubectl get output for the pods and runners?
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
@mumoshu
I am running into a similar issue wherein the runner container exits with code 0 and then the pod sticks around until ARC comes and cleans it up. Previously, the pods would self-terminate when the runner container exited with 0, and now they don't.
We have changed a few things:
- We are now trying to install ARC through ArgoCD (Helm)
- We updated our runner image from a pinned SHA to :latest.
Where would you recommend we look for configuration errors that are preventing the pods from exhibiting the correct behavior?
The same here
Previously, the pods would self-terminate when the runner container hit the exit 0
@jrkarnes What do you mean by "self-terminate" here? The pod does not terminate on its own. It needs to be deleted by doing a K8s DELETE Pod API call. Usually, ARC should detect the terminated container(s) in a runner pod and react to it by calling the delete pod API. Perhaps it isn't working for you for whatever reason? 🤔 Unfortunately though, I can't debug further with the provided information. Please file a dedicated issue linking this one, and do share the complete controller logs for investigation. A one-line excerpt from the controller log doesn't help...(and that's what's provided in this bug report). "The same here" doesn't help either, because there's no way for me to see if it's actually the same issue, or it's just a completely different issue that looks similar.
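For anyone trying to provide that information, a rough sketch of the commands involved (the namespace and resource names below are taken from the original report, and the controller deployment name depends on your install, so adjust as needed):
# Full controller logs
kubectl -n actions-runner-system logs deploy/actions-runner-controller --all-containers > arc.log
# State of the runners and their pods
kubectl -n actions-runner-system get runners,pods -o wide
kubectl -n actions-runner-system describe runner eks-runner-terraform-vsftp-security
kubectl -n actions-runner-system describe pod eks-runner-terraform-vsftp-security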
In case it helps some of you folks here. We were experiencing the same issue on our side and it was due to missing a CRD upgrade after an upgrade from 0.20.xx to 0.26.0.
There was a change of behaviour after 0.21.xx that required a CRD upgrade in order for this functionality to keep working (clean up runners after they terminate). If you don't upgrade the CRD, you'll have to rely on the sync to clean up the pods (and you're also probably setting yourself up for a bunch of other problems).
@mumoshu describes the problem and the fix here https://github.com/actions-runner-controller/actions-runner-controller/issues/1291#issuecomment-1085243956 and here https://github.com/actions-runner-controller/actions-runner-controller/issues/1291#issuecomment-1085293928
edit: ping @ajardan @jrkarnes @brnpimentel
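A minimal sketch of that CRD upgrade, assuming the chart's crds/ directory layout (see charts/actions-runner-controller/docs/UPGRADING.md for the authoritative procedure):
# Helm does not upgrade CRDs on helm upgrade, so re-apply them from the chart source first
git clone https://github.com/actions/actions-runner-controller.git
kubectl replace -f actions-runner-controller/charts/actions-runner-controller/crds/
# Then upgrade the release itself
helm upgrade --install --namespace actions-runner-system actions-runner-controller actions-runner-controller/actions-runner-controller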
I am seeing the same issue with v0.26.0 of actions-runner-controller running on EKS 1.23. I tried adjusting the syncPeriod via the Helm chart values, and even confirmed that the pod YAML showed the setting, yet it seemed to have no effect. I also watched a Runner pod hang in NotReady for over 10 minutes.
Downgrading to v0.21.0 of actions-runner-controller and 0.16.1 of the Helm chart works: I see near-instant termination of the pod instead of NotReady.
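For what it's worth, a hedged sketch of those two adjustments (the syncPeriod value name is assumed from the chart's values.yaml; 0.16.1 is the chart version reported as working above):
# Shorten the reconcile interval via chart values
helm upgrade --install --namespace actions-runner-system actions-runner-controller actions-runner-controller/actions-runner-controller --set syncPeriod=1m
# Or pin the older chart that bundles controller v0.21.0
helm upgrade --install --namespace actions-runner-system actions-runner-controller actions-runner-controller/actions-runner-controller --version 0.16.1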
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
/remove stale
Hi,
we noticed the same problem with the latest version on EKS 1.23. Any update or progress on this?
I have the same setup as the first post (Helm chart with the same values), and the pod is not cleaned up:
√ Removed .credentials
√ Removed .runner
"Runner listener exit with 0 return code, stop the service, no retry needed."
"Exiting runner..."
Even though it was a clean install with installCRD=true, I replaced the CRDs with the recommended procedure and deleted the pods. The result was the same.
I am running on AKS version 1.23.12. It is only happening on Windows runners. The runners have an issue with version 2.299.1 and the latest one, 2.300.2. I have tried a downgrade to Helm chart 1.17.0, which has the 0.21.1 app version, and back again to 0.21.1, but the issue is the same on both versions. The Docker image for Windows runners worked as it should on a previous AKS v1.19 with summerwind/actions-runner-controller:v0.19.0. Any advice on how to debug this is highly appreciated.
The message from the controller is:
INFO actions-runner-controller.runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting
@mazilu88 I migrated my configuration to a RunnerDeployment, and everything seems to work well this way.
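For anyone trying the same migration, a rough sketch of the Runner from the original report expressed as a RunnerDeployment (field names per the ARC docs; values copied from the report, so adjust them to your setup):
kubectl apply -f - <<'EOF'
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: eks-runner-terraform-vsftp-security
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: org/repo
      labels:
        - eks_runner
        - livetest
        - live
      serviceAccountName: eks-runner-livetest-live
EOF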
Thank you for the reply.
I am using a RunnerDeployment (RD) and HorizontalRunnerAutoscaler (HRA), so sadly that is not the fix for me.
The solution for me was to configure the entrypoint as per the documentation: https://github.com/actions/actions-runner-controller/blob/3ede9b5a0159a5e0703ccae6eebfdc89defe2b8f/docs/configuring-windows-runners.md
In the initial setup, I had ENTRYPOINT ["pwsh", "-c", "./configure.ps1;"] and called ./run.cmd inside it.
I changed it to ENTRYPOINT ["pwsh", "-c", "./configure.ps1; ./run.cmd"].
I think this issue may be fixed in the latest version. This is what I ran on my K8s:
# Install cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
# Install chart
helm install --wait --create-namespace --namespace cert-manager cert-manager jetstack/cert-manager --version v1.3.0 --set installCRDs=true
# Install actions-runner-controller
helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
# Install chart
helm upgrade --install --namespace actions-runner-system --create-namespace --set=authSecret.create=true --set=authSecret.enabled=true --set=authSecret.github_token="token_with_repo_admin:org_goes_here" --wait actions-runner-controller actions-runner-controller/actions-runner-controller
And now the pods disappear very quickly after a job finishes.
actions-runner-controller app version: v0.27.0, chart version: 0.22.0
We are still seeing this in v0.27.0
selenium-e2e-jpl4t-8442l 1/1 Terminating 0 3h22m
selenium-e2e-jpl4t-9r89j 1/1 Terminating 0 3h21m
selenium-e2e-jpl4t-cwqzl 1/1 Terminating 0 3h19m
selenium-e2e-jpl4t-jnl6t 1/1 Terminating 0 3h21m
The node the pod belonged to no longer appears under kubectl get nodes, the pod is stuck terminating, it has a finalizer that is preventing it from being removed, and the only error message is:
kubectl logs actions-runner-controller-6b77bf7bf6-l2cpm | grep selenium-e2e-jpl4t-cwqzl | tail
2023-02-14T23:22:13Z INFO runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "gh-runner/selenium-e2e-jpl4t-cwqzl"}
2023-02-14T23:22:35Z INFO runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "gh-runner/selenium-e2e-jpl4t-cwqzl"}
2023-02-14T23:23:18Z INFO runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "gh-runner/selenium-e2e-jpl4t-cwqzl"}
2023-02-14T23:23:35Z INFO runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "gh-runner/selenium-e2e-jpl4t-cwqzl"}
2023-02-14T23:24:23Z INFO runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "gh-runner/selenium-e2e-jpl4t-cwqzl"}
2023-02-14T23:24:35Z INFO runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "gh-runner/selenium-e2e-jpl4t-cwqzl"}
2023-02-14T23:25:28Z INFO runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "gh-runner/selenium-e2e-jpl4t-cwqzl"}
2023-02-14T23:25:35Z INFO runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "gh-runner/selenium-e2e-jpl4t-cwqzl"}
2023-02-14T23:26:34Z INFO runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "gh-runner/selenium-e2e-jpl4t-cwqzl"}
2023-02-14T23:26:35Z INFO runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "gh-runner/selenium-e2e-jpl4t-cwqzl"}
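Not an ARC fix, but when the node is already gone and a pod is stuck Terminating on a finalizer, a common workaround (use with care) is to clear the finalizer and force-delete the pod; the namespace and pod name below are the ones from the listing above:
kubectl -n gh-runner patch pod selenium-e2e-jpl4t-cwqzl --type merge -p '{"metadata":{"finalizers":null}}'
kubectl -n gh-runner delete pod selenium-e2e-jpl4t-cwqzl --grace-period=0 --force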
We are still seeing this in v0.27.0
Then it must be some combination of other versioning issues, such as maybe K8s cluster version, or chart version or something. I have not seen any of this, and I've now set up two new clusters with no issues.
And now the pods disappear very quickly after a job finishes.
@emmahsax can you share in which k8s version you are using?
We are using Kubernetes 1.24 with AWS EKS.
@emmahsax Assuming you're using the horizontal autoscaler, take a look at RUNNER_GRACEFUL_STOP_TIMEOUT and terminationGracePeriodSeconds in the docs. It seems to have solved the issue for me.
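A sketch of those two settings on a RunnerDeployment, in case it helps (field and variable names are taken from the ARC docs on graceful termination; the timeout values are illustrative only):
kubectl apply -f - <<'EOF'
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeployment
spec:
  template:
    spec:
      repository: org/repo
      # Give the runner time to deregister and finish up before the pod is killed
      terminationGracePeriodSeconds: 110
      env:
        # Should be shorter than terminationGracePeriodSeconds
        - name: RUNNER_GRACEFUL_STOP_TIMEOUT
          value: "90"
EOF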
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.