actions-runner-controller
`service` containers not working on runners with `containerMode: kubernetes`
Controller Version
0.25.2
Helm Chart Version
0.20.2
CertManager Version
1.9.1
Deployment Method
Helm
cert-manager installation
Yes, I've followed https://github.com/actions-runner-controller/actions-runner-controller#installation and installed cert-manager from the official source: https://cert-manager.io/docs/installation/helm/
Checks
- [X] This isn't a question or user support case (for Q&A and community support, go to Discussions; it might also be a good idea to contact any of the contributors and maintainers if your business is critical and you need priority support)
- [X] I've read the release notes before submitting this issue and I'm sure it's not due to any recently introduced backward-incompatible changes
- [X] My actions-runner-controller version (v0.x.y) does support the feature
- [X] I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue
Resource Definitions
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: k8s-runner
  namespace: actions-runner-system
spec:
  replicas: 4
  organization: devx-ibp
  containerMode: kubernetes
  serviceAccountName: runner-service-account
  selector:
    matchLabels:
      app: k8s-runner
  serviceName: k8s-runner
  template:
    metadata:
      labels:
        app: k8s-runner
  workVolumeClaimTemplate:
    storageClassName: standard
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi
  labels:
    - k8s-runner
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: runner-role
  namespace: actions-runner-system
rules:
  - apiGroups: [ "" ]
    resources: [ "pods" ]
    verbs: [ "get", "list", "create", "delete" ]
  - apiGroups: [ "" ]
    resources: [ "pods/exec" ]
    verbs: [ "get", "create" ]
  - apiGroups: [ "" ]
    resources: [ "pods/log" ]
    verbs: [ "get", "list", "watch" ]
  - apiGroups: [ "batch" ]
    resources: [ "jobs" ]
    verbs: [ "get", "list", "create", "delete" ]
  - apiGroups: [ "" ]
    resources: [ "secrets" ]
    verbs: [ "get", "list", "create", "delete" ]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: runner-role-binding
  namespace: actions-runner-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: runner-role
subjects:
  - kind: ServiceAccount
    name: runner-service-account
    namespace: actions-runner-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: runner-service-account
  namespace: actions-runner-system
```
Storage Class:
```
Name:                  standard
IsDefaultClass:        Yes
Annotations:           kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"standard"},"provisioner":"rancher.io/local-path","reclaimPolicy":"Delete","volumeBindingMode":"WaitForFirstConsumer"}
                       storageclass.kubernetes.io/is-default-class=true
Provisioner:           rancher.io/local-path
Parameters:            <none>
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none>
```
To Reproduce
Execute the following workflow:
```yaml
name: Go
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
jobs:
  build:
    runs-on: [ self-hosted, k8s-runner ]
    services:
      redis:
        image: redis
        ports:
          - 6379/tcp
    container:
      image: golang:alpine
    steps:
      - uses: actions/checkout@v3
      - run: go build cmd/hello/main.go
      - run: ./main
```
Describe the bug
The 'Initialize containers' step fails:

```
##[debug]Evaluating condition for step: 'Initialize containers'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Initialize containers
##[debug]Register post job cleanup for stopping/deleting containers.
Run '/runner/k8s/index.js'
##[debug]/runner/externals/node16/bin/node /runner/k8s/index.js
##[debug]Using image 'golang:alpine' for job image
##[debug]Adding service 'redis' to pod definition
Error: Error: failed to create job pod: HttpError: HTTP request failed
Error: Process completed with exit code 1.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug]System.Exception: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug] ---> System.Exception: The hook script at '/runner/k8s/index.js' running command 'PrepareJob' did not execute successfully
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug]   --- End of inner exception stack trace ---
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.PrepareJobAsync(IExecutionContext context, List`1 containers)
##[debug]   at GitHub.Runner.Worker.ContainerOperationProvider.StartContainersAsync(IExecutionContext executionContext, Object data)
##[debug]   at GitHub.Runner.Worker.JobExtensionRunner.RunAsync()
##[debug]Finishing: Initialize containers
```
Describe the expected behavior
Hi,
I'm trying to use a service container in a job. I was expecting the service container to be created as an additional container in the pod executing this job, but it looks like it's not working. Is there anything I'm missing?

Controller Logs
https://gist.github.com/bquenin/ddbe50c71dadd6b136ab0b0b5bee6e63
Runner Pod Logs
https://gist.github.com/bquenin/ddbe50c71dadd6b136ab0b0b5bee6e63
I believe there might be a sanitization bug in the port mapping of containerMode: kubernetes.
Instead of 6379/tcp, does 6379 (tcp is the default) or 6379:6379/tcp work?
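For reference, a quick sketch of those variants in the workflow's `services` block (standard GitHub Actions port-mapping syntax; only the `ports` list differs from the job above):

```yaml
services:
  redis:
    image: redis
    ports:
      - 6379              # container port only; tcp is the default protocol
      # or, as a host:container mapping with an explicit protocol:
      # - 6379:6379/tcp
```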
I'm hitting the same issue. I built a new action runner image with RUNNER_CONTAINER_HOOKS_VERSION=0.2.0.
All of the following cases failed:
- with port setting: 6379
- with port setting: 6379:6379
- with port setting: 6379:6379/tcp
- without any port setting
Any updates here? :) Facing the same problem...
While the exact error is different from what is described here, @Brenner87 and I have also been unable to use GHA sidecar containers in containerMode: kubernetes; read more here: https://github.com/actions/actions-runner-controller/discussions/2227
For us it seems to be completely nuking the entrypoint command used to start the sidecar container, having nothing to do with ports.
I got this error too, and it turned out to be related to our OPA policy that requires resources on all containers.
It took me days to figure out the root cause, but it's really a tiny issue; the real problem is the error message. I updated this line to provide detailed error info:
https://github.com/actions/runner-container-hooks/blob/main/packages/k8s/src/hooks/prepare-job.ts#LL53C32-L53C42

```ts
// from
throw new Error(`failed to create job pod: ${err}`)
// to
throw new Error(`failed to create job pod: ${JSON.stringify(err)}`)
```

Then, instead of "HTTP Error", you'll get a log like this:

```
Error: Error: failed to create job pod: {"response":{"statusCode":403,"body":{"... is forbidden: failed quota: fuze-quota: must specify cpu for: job; memory for: job","reason":"Forbidden","..."statusCode":403,"name":"HttpError"}
```

You may get a different error, but I'm sure you'll know how to fix it :)
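For anyone hitting the same quota/policy failure: one way to satisfy a "must specify cpu/memory" constraint without patching the hooks is a LimitRange in the runner namespace, so job pods get default resources injected at admission time. This is a minimal sketch assuming the namespace from the issue; the name and values are illustrative, not something ARC ships:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: job-pod-defaults          # hypothetical name
  namespace: actions-runner-system
spec:
  limits:
    - type: Container
      defaultRequest:             # injected as requests when a container specifies none
        cpu: 100m
        memory: 256Mi
      default:                    # injected as limits when a container specifies none
        cpu: "1"
        memory: 1Gi
```

Whether this also satisfies an OPA policy depends on whether the policy is evaluated after the built-in admission defaulting has run.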
I have packaged a fixed version of the Docker image here: https://hub.docker.com/r/kacifer/actions-runner. Specify the image in the controller deployment or your runner spec: kacifer/actions-runner:0.0.2 (I'm not keeping this image up to date; you could easily package your own).
@kacifer have you considered a PR against https://github.com/actions/runner-container-hooks? Seems like it'd be worth it. I just ran into this error when I tried to have the worker pod use a service account that didn't exist. It would have been handy to get the full error message here.
@stephen-tatari yes, I could do that; glad to know someone else has the same problem 🤡
Any update on the core issue here? Is it possible to run a job that creates services with containerMode: kubernetes?
@kacifer could you elaborate on the solution you have found? I can't find any open PR to fix this.
Following the error messages OP received, it seems to me like a configuration issue. The thread #3073 led me to test service containers via localhost, which works fine (see the sketch below)!
Just posting this here in case anybody comes here because of the issue title; it does not appear to be a general problem.
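For anyone who wants to try the localhost route, a minimal sketch of such a job; the `redis` job image and the `redis-cli` check are illustrative assumptions, not taken from the thread:

```yaml
jobs:
  ping-redis:
    runs-on: [ self-hosted, k8s-runner ]
    services:
      redis:
        image: redis
        ports:
          - 6379:6379
    container:
      image: redis              # reused here only so redis-cli is on the PATH
    steps:
      # In containerMode: kubernetes the service container runs in the same
      # pod as the job container, so it should be reachable on localhost.
      - run: redis-cli -h localhost -p 6379 ping   # prints PONG when the service is up
```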