
Configure Worker Resources

Open MartiUK opened this issue 1 year ago • 7 comments

What would you like added?

When running in kubernetes mode, a workflow pod is created for each job. Unfortunately, there doesn't seem to be a way to configure the CPU & memory resource requests and limits for these worker pods, even when resource requests/limits are set on the runner containers.
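
For reference, kubernetes mode is turned on through the containerMode block of the gha-runner-scale-set helm values; a sketch of ours (the storage class name is specific to our cluster):

containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    storageClassName: "standard"   # cluster-specific
    resources:
      requests:
        storage: 1Gi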

Why is this needed?

A tool such as yarn will use as much CPU and memory as it can to complete its task, so when workflow pods are created without resource limits, they can starve the node and prevent other services or workers from running on it. It also means that, because the pods declare no resource requests, cluster autoscaling has nothing to act on and cannot scale out to handle large spikes of workflow runs.

Additional context

(screenshot omitted)

MartiUK avatar Nov 28 '23 10:11 MartiUK

Can't it be limited like this? Maybe I misunderstand.

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: my-runner
spec:
  replicas: 2
  template:
    spec:
      repository: myrepo
      labels:
        - my-runner
      containers:
        - name: runner
          resources:
            limits:
              cpu: "250m"
              hugepages-2Mi: 2Gi
              memory: 100Mi
            requests:
              cpu: "250m"
              memory: 100Mi
              hugepages-2Mi: 2Gi
          volumeMounts:
            - mountPath: /dev/hugepages
              name: hugepage
          env:
            - name: USER
              value: "runner"
      volumes:
        - name: hugepage
          emptyDir:
            medium: HugePages

johnoloughlin avatar Nov 28 '23 17:11 johnoloughlin

@johnoloughlin I'm using runner scale sets via the helm chart; there doesn't seem to be a way to set the resources specifically for the worker pod that is created. They can only be set on the runner pod.
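
For reference, the chart does let me set resources on the runner container itself via template.spec (a sketch from my values file; the image and command are the chart defaults), but there is no equivalent for the workflow pod:

template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        resources:
          requests:
            cpu: "1"
            memory: 2Gi
          limits:
            cpu: "2"
            memory: 4Gi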

MartiUK avatar Dec 06 '23 13:12 MartiUK

I was looking for something similar. I don't see a way to pass through resources such as GPUs, because they need the limit set on the pod that actually runs the workflow.

MichaelHudgins avatar Dec 07 '23 21:12 MichaelHudgins

For others here, I found https://github.com/actions/actions-runner-controller/discussions/3107#discussioncomment-7691417, which looks to allow for what is needed.
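
For our GPU case it would look roughly like this in the hook template (a sketch; the resource name assumes the NVIDIA device plugin is installed):

spec:
  containers:
    - name: $job
      resources:
        limits:
          nvidia.com/gpu: 1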

MichaelHudgins avatar Dec 07 '23 21:12 MichaelHudgins

Does anyone have an example for the hook extension?

omri-shilton avatar Dec 25 '23 13:12 omri-shilton

@omri-shilton

---
<snip>
          env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /etc/config/runner-template.yaml
          volumeMounts:
          - mountPath: /home/runner/_work
            name: work
          - mountPath: /etc/config
            name: hook-template
        volumes:
        - name: hook-template
          configMap:
            name: runner-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: runner-config
  namespace: actions-runners
data:
  runner-template.yaml: |
    ---
    spec:
      containers:
      - name: $job
        resources:
          limits:
            cpu: 2
            memory: 8Gi
          requests:
            cpu: 2
            memory: 8Gi
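
(The path in ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE is read by the Kubernetes container hooks and merged into the workflow pod they create; the container entry named $job is the one applied to the job container, which is where the resources above end up.)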

This still doesn't help us, unfortunately; we need a way to use the standard k8s scheduler and inter-pod affinity to ensure the runner and the job pods are scheduled on the same node.

Ideally, there should be a shared label between the two pods that we can use to do this without needing to use ReadWriteMany storage providers.
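
To make that concrete, the workflow pod would then just need a podAffinity term like this (a sketch; the arc-runner-pod label is hypothetical and not something ARC sets today):

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            arc-runner-pod: my-runner-abc123   # hypothetical shared label
        topologyKey: kubernetes.io/hostname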

MartiUK avatar Mar 05 '24 16:03 MartiUK

we need a way to use the standard k8s scheduler and use inter-pod affinity to ensure the runner and the job pods are scheduled on the same node

We are in the same boat. How does anyone successfully use this architecture with something like Karpenter, which is constantly scaling nodes up and down with demand, when assigning resources and using things like pod affinity are difficult or impossible to wrangle?

tskinner-oppfi avatar Mar 27 '24 23:03 tskinner-oppfi