actions-runner-controller
Configure Worker Resources
What would you like added?
When running in kubernetes mode, a workflow pod is created for each job. Unfortunately, there doesn't seem to be a way to configure the CPU and memory resource requests and limits for these worker pods, despite setting resource requests/limits on the runner containers.
Why is this needed?
A tool such as yarn will use as much CPU and memory as it can to complete its task. As such, when worker pods are created without resource limits, they can starve the node and prevent other services or workers from running on it. It also means that clusters with cluster autoscaling cannot scale out to handle large spikes of workflow runs, since there are no resource requests for the scheduler and autoscaler to act on.
Additional context
Can't it be limited like this? Maybe I misunderstand:
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: my-runner
spec:
  replicas: 2
  template:
    spec:
      repository: myrepo
      labels:
        - my-runner
      containers:
        - name: runner
          resources:
            limits:
              cpu: "250m"
              hugepages-2Mi: 2Gi
              memory: 100Mi
            requests:
              cpu: "250m"
              memory: 100Mi
              hugepages-2Mi: 2Gi
          volumeMounts:
            - mountPath: /dev/hugepages
              name: hugepage
          env:
            - name: USER
              value: "runner"
      volumes:
        - name: hugepage
          emptyDir:
            medium: HugePages
@johnoloughlin I'm using runner scale sets via the Helm chart, and there doesn't seem to be a way to set the resources specifically for the worker pod that is created; they are only set on the runner pod.
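For context, the closest I've found is putting resources on the runner container itself in the scale set's Helm values. A minimal sketch of what I mean, assuming the chart's usual containerMode/template layout (the image, command, and sizes below are placeholders); these settings only land on the runner pod, not on the workflow pod the hook creates:

containerMode:
  type: kubernetes
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        resources:
          # applies to the runner pod only; the per-job workflow pod is unaffected
          requests:
            cpu: "500m"
            memory: 1Gi
          limits:
            cpu: "1"
            memory: 2Gi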
I was looking for something similar. I don't see a way to pass through resources like GPUs, because they need the limit set on the pod actually running the workflow.
For others here, I found this: https://github.com/actions/actions-runner-controller/discussions/3107#discussioncomment-7691417 which looks like it allows for what is needed.
Does anyone have an example for the hook extension?
@omri-shilton
---
<snip>
      env:
        - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
          value: /etc/config/runner-template.yaml
      volumeMounts:
        - mountPath: /home/runner/_work
          name: work
        - mountPath: /etc/config
          name: hook-template
  volumes:
    - name: hook-template
      configMap:
        name: runner-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: runner-config
  namespace: actions-runners
data:
  runner-template.yaml: |
    ---
    spec:
      containers:
        - name: $job
          resources:
            limits:
              cpu: 2
              memory: 8Gi
            requests:
              cpu: 2
              memory: 8Gi
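For the GPU case mentioned above, the same template should in principle be able to carry extended resources into the job pod. A rough sketch, assuming the NVIDIA device plugin is installed and the nodes expose nvidia.com/gpu (counts and sizes are placeholders):

runner-template.yaml: |
  ---
  spec:
    containers:
      - name: $job
        resources:
          limits:
            # extended resources only need a limit; the request defaults to the limit
            nvidia.com/gpu: 1
            cpu: 2
            memory: 8Gi
          requests:
            cpu: 2
            memory: 8Gi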
This still doesn't help us, unfortunately; we need a way to use the standard k8s scheduler and inter-pod affinity to ensure the runner and job pods are scheduled on the same node.
Ideally, there would be a shared label between the two pods that we could use to do this without needing a ReadWriteMany storage provider.
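To make the ask concrete: if ARC stamped a shared label on both pods (the runner-pod-name label below is hypothetical, it does not exist today as far as I know), the hook template could pin the job pod to its runner's node with plain inter-pod affinity, roughly like:

runner-template.yaml: |
  ---
  spec:
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                # hypothetical label; it would have to identify this specific runner pod
                runner-pod-name: my-runner-abc123

As far as I can tell the template is static (only the $job container name gets substituted), so there is no way to inject the current runner pod's name into it, which is exactly why a controller-provided shared label would help.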
we need a way to use the standard k8s scheduler and use inter-pod affinity to ensure the runner and the job pods are scheduled on the same node
We are in the same boat. How does anyone successfully use this architecture with something like Karpenter that is constantly scaling nodes up and down depending on demand, especially when assigning resources and things like pod affinity are difficult or impossible to wrangle?