actions-runner-controller

resources specification is not working

Open klepiz opened this issue 1 year ago • 3 comments

Checks

  • [X] I've already read https://github.com/actions/actions-runner-controller/blob/master/TROUBLESHOOTING.md and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I'm not using a custom entrypoint in my runner image

Controller Version

v1.26.7

Helm Chart Version

0.23.4

CertManager Version

No response

Deployment Method

Helm

cert-manager installation

I followed the installation process as described in the documentation: https://github.com/actions/actions-runner-controller/blob/master/docs/installing-arc.md

Checks

  • [X] This isn't a question or user support case (For Q&A and community support, go to Discussions. It might also be a good idea to contract with any of the contributors and maintainers if your business is so critical that you need priority support)
  • [X] I've read the release notes before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
  • [X] My actions-runner-controller version (v0.x.y) does support the feature
  • [X] I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue
  • [X] I've migrated to the workflow job webhook event (if you are using webhook driven scaling)

Resource Definitions

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: runner-deployment
  namespace: my-runners
spec:
  replicas: 1
  template:
    spec:
      resources:
        limits:
          cpu: "2"
          memory: "5Gi"
        requests:
          cpu: "1"
          memory: "4Gi"
      dockerMTU: 1400
      env:
        - name: ARC_DOCKER_MTU_PROPAGATION
          value: "true"
      githubAPICredentialsFrom:
        secretRef:
          name: controller-manager-my-runners
      volumes:
        - name: docker-secret
          secret:
            secretName: docker-auth
            items:
              - key: .dockerconfigjson
                path: config.json

      organization: my-org
      labels:
        - testing-new-k8s
      containers:
        - name: runner
          resources:
            limits:
              cpu: "1"
              memory: "2Gi"
            requests:
              cpu: "1"
              memory: "2Gi"
          securityContext:
            privileged: true
          volumeMounts:
            - name: docker-secret
              mountPath: "/home/runner/.docker/"
              readOnly: true
        - name: docker
          resources:
            limits:
              cpu: "3"
              memory: "8Gi"
            requests:
              cpu: "2"
              memory: "5Gi"
          securityContext:
            privileged: true

---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: runner-deployment-autoscaler
  namespace: my-runners
spec:
  githubAPICredentialsFrom:
    secretRef:
      name: controller-manager-my-runners
  scaleTargetRef:
    name: runner-deployment
  minReplicas: 20
  maxReplicas: 30

To Reproduce

1. kubectl apply -f self-hosted-runner.yml -n my-runners
2. Once the runners are created, open a shell in the runner or docker container
3. Run lscpu and free -g: both show the node's total CPU and memory, ignoring the values I set in the resource definition file (self-hosted-runner.yml)
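
As a cross-check on step 3: lscpu and free read /proc and /sys, which inside a container generally describe the host, so they are not a reliable indicator of whether limits were applied. The following is a minimal sketch for reading the limits the kernel actually enforces, assuming the node uses cgroup v2 and that the container's cgroup is visible at /sys/fs/cgroup (both are assumptions about the cluster, not something shown in this report). The function names are illustrative.

```python
from pathlib import Path


def parse_cpu_max(text):
    """Parse cgroup v2 cpu.max ('<quota> <period>' or 'max <period>').

    Returns the CPU limit in cores, or None when no limit is set.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_memory_max(text):
    """Parse cgroup v2 memory.max (bytes, or 'max' when unlimited)."""
    value = text.strip()
    return None if value == "max" else int(value)


def read_limits(root="/sys/fs/cgroup"):
    """Read the CPU and memory limits of the current cgroup, if present."""
    base = Path(root)
    cpu = base / "cpu.max"
    mem = base / "memory.max"
    return {
        "cpu_cores": parse_cpu_max(cpu.read_text()) if cpu.exists() else None,
        "memory_bytes": parse_memory_max(mem.read_text()) if mem.exists() else None,
    }


if __name__ == "__main__":
    # Run inside the runner or docker container.
    print(read_limits())
```

If the limits from the manifest were applied, a container with a CPU limit of "1" and a memory limit of "2Gi" would be expected to report 1.0 cores and 2147483648 bytes here, even while lscpu and free still show the node totals.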

Describe the bug

My k8s cluster has 4 nodes, each with 20 CPUs and 64 GB of memory. I should be able to run at least 10 runners at the same time, but that is not the case: jobs get canceled out of nowhere on the "Initialize container" step

  930778163b2d: Verifying Checksum
  930778163b2d: Download complete
  fc551ec0b9d5: Verifying Checksum
  fc551ec0b9d5: Download complete
  Error: The operation was canceled.
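
For reference, the capacity expectation above can be checked with a rough calculation. This is only a sketch: it assumes the scheduler counts the container-level requests from the manifest (runner: 1 CPU / 2Gi, docker: 2 CPU / 5Gi), treats each node's 64 GB as 64 GiB, and ignores CPU and memory reserved for the system and other pods.

```python
def pods_per_node(node_cpu, node_mem_gib, pod_cpu_request, pod_mem_gib_request):
    """Number of pods that fit on one node, judged by resource requests only."""
    by_cpu = node_cpu // pod_cpu_request
    by_mem = node_mem_gib // pod_mem_gib_request
    return min(by_cpu, by_mem)


# Per-pod requests: runner container plus docker container.
pod_cpu = 1 + 2   # CPUs
pod_mem = 2 + 5   # GiB

per_node = pods_per_node(20, 64, pod_cpu, pod_mem)
total = per_node * 4  # 4 nodes in the cluster

print(per_node)  # 6 (CPU is the tighter constraint: 20 // 3)
print(total)     # 24
```

Under these assumptions the cluster has room for about 24 runner pods by requests, which covers the minReplicas of 20 but not the maxReplicas of 30.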

Describe the expected behavior

Each pod should show the CPU and memory that I specified in the resource definition file

Whole Controller Logs

2023-09-01T15:21:40Z	DEBUG	runner	Runner appears to have been registered and running.	{"runner": "***-runners/***-runner-deployment-hjwsp-vcqt2", "podCreationTimestamp": "2023-09-01 15:21:36 +0000 UTC"}
2023-09-01T15:21:40Z	DEBUG	runner	Runner appears to have been registered and running.	{"runner": "***-runners/***-runner-deployment-hjwsp-66568", "podCreationTimestamp": "2023-09-01 15:21:36 +0000 UTC"}
2023-09-01T15:21:40Z	DEBUG	runner	Runner appears to have been registered and running.	{"runner": "***-runners/***-runner-deployment-hjwsp-cbbbn", "podCreationTimestamp": "2023-09-01 15:21:36 +0000 UTC"}
2023-09-01T15:21:40Z	DEBUG	runner	Runner appears to have been registered and running.	{"runner": "***-runners/***-runner-deployment-hjwsp-m27s8", "podCreationTimestamp": "2023-09-01 15:21:37 +0000 UTC"}
2023-09-01T15:21:40Z	DEBUG	runner	Runner appears to have been registered and running.	{"runner": "***-runners/***-runner-deployment-hjwsp-fw98k", "podCreationTimestamp": "2023-09-01 15:21:36 +0000 UTC"}
2023-09-01T15:21:41Z	DEBUG	runner	Runner appears to have been registered and running.	{"runner": "***-runners/***-runner-deployment-hjwsp-t6xs2", "podCreationTimestamp": "2023-09-01 15:21:36 +0000 UTC"}
2023-09-01T15:22:35Z	DEBUG	horizontalrunnerautoscaler	Calculated desired replicas of 20	{"horizontalrunnerautoscaler": "***-runners/***-runner-deployment-autoscaler", "suggested": 20, "reserved": 0, "min": 20, "max": 30, "last_scale_up_time": "2023-09-01 15:21:24 +0000 UTC", "scale_down_delay_until": "2023-09-01T15:31:24Z"}
2023-09-01T15:23:38Z	DEBUG	horizontalrunnerautoscaler	Calculated desired replicas of 20	{"horizontalrunnerautoscaler": "***-runners/***-runner-deployment-autoscaler", "suggested": 20, "reserved": 0, "min": 20, "max": 30, "last_scale_up_time": "2023-09-01 15:21:24 +0000 UTC", "scale_down_delay_until": "2023-09-01T15:31:24Z"}
2023-09-01T15:23:47Z	INFO	runner	Removed finalizer	{"runner": "***-runners/***-runner-deployment-hjwsp-vcqt2"}
2023-09-01T15:23:47Z	DEBUG	runnerreplicaset	Created replica(s)	{"runnerreplicaset": "***-runners/***-runner-deployment-hjwsp", "lastSyncTime": "2023-09-01T15:21:34Z", "effectiveTime": "<nil>", "templateHashDesired": "56f59b8797", "replicasDesired": 20, "replicasPending": 2, "replicasRunning": 17, "replicasMaybeRunning": 19, "templateHashObserved": ["56f59b8797"], "created": 1}
2023-09-01T15:23:47Z	DEBUG	runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "***-runners/***-runner-deployment-hjwsp", "owner": "***-runners/***-runner-deployment-hjwsp-2fntb", "pods": null}
2023-09-01T15:23:47Z	DEBUG	runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "***-runners/***-runner-deployment-hjwsp", "owner": "***-runners/***-runner-deployment-hjwsp-2fntb", "pods": null}
2023-09-01T15:23:47Z	DEBUG	runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "***-runners/***-runner-deployment-hjwsp", "owner": "***-runners/***-runner-deployment-hjwsp-2fntb", "pods": null}
2023-09-01T15:23:47Z	INFO	runnerpod	Runner pod has been stopped with a successful status.	{"runnerpod": "***-runners/***-runner-deployment-hjwsp-vcqt2"}
2023-09-01T15:23:48Z	INFO	runner	Updated registration token	{"runner": "***-runner-deployment-hjwsp-2fntb", "repository": ""}


Note: I censored part of the pod names

Whole Runner Pod Logs

n/a

Additional Context

I also tried setting the resources on the individual containers, with no change in behavior, for example:

....
containers:
        - name: runner
          resources:
            limits:
              cpu: "1"
              memory: "2Gi"
            requests:
              cpu: "1"
              memory: "2Gi"
          securityContext:
            privileged: true
          volumeMounts:
            - name: docker-secret
              mountPath: "/home/runner/.docker/"
              readOnly: true
        - name: docker
          resources:
            limits:
              cpu: "3"
              memory: "8Gi"
            requests:
              cpu: "2"
              memory: "5Gi"
          securityContext:
            privileged: true
...

klepiz, Sep 01 '23 16:09