actions-runner-controller

Jobs are waiting too long for a runner to come online.

Open • julien-michaud opened this issue 6 months ago • 5 comments

Checks

  • [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I am using charts that are officially provided

Controller Version

0.9.3

Deployment Method

ArgoCD

Checks

  • [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

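- Install the controller and runner scale set with the values under Additional Context
- Start a workflow job that targets the `company-hosted` runner scale set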
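To make the second step concrete, here is a minimal workflow sketch. It assumes the scale set is registered under the `runnerScaleSetName` from the values under Additional Context (`company-hosted`), which is the label ARC scale sets are targeted by in `runs-on`. The workflow name and job name are hypothetical placeholders.

```yaml
# Hypothetical reproduction workflow; names are placeholders.
name: arc-queue-latency-repro
on:
  workflow_dispatch:
jobs:
  probe:
    # ARC runner scale sets are selected by their scale set name.
    runs-on: company-hosted
    steps:
      - name: Print the time the runner started the job (UTC)
        run: date -u +"%Y-%m-%dT%H:%M:%SZ"
```

Comparing the printed timestamp with the time the job was queued in the run view gives the wait time for that job.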

Describe the bug

Some jobs wait anywhere from 30 seconds to more than 90 seconds before a runner picks them up.

Describe the expected behavior

In my opinion, jobs should not have to wait this long to be picked up by a runner.

Additional Context

---
podLabels:
  finops.company.net/stage: prod
  finops.company.net/service_class: live
  finops.company.net/cluster: gke-live-labs-europe-west1

bufferReserveResourcesCronJob:
  create: true

gha-runner-scale-set-controller:
  resources:
    limits:
      memory: 300Mi
    requests:
      cpu: 100m
      memory: 300Mi
  flags:
    logFormat: "json"
  podLabels:
    finops.company.net/stage: prod
    finops.company.net/service_class: live
    finops.company.net/cluster: gke-live-labs-europe-west1
  podAnnotations:
    logs.company.com/datadog_source: "gha-runner-scale-set"

gha-runner-scale-set:
  runnerScaleSetName: "company-hosted"
  maxRunners: 200
  listenerTemplate:
    metadata:
      labels:
        finops.company.net/stage: prod
        finops.company.net/service_class: live
        finops.company.net/cluster: gke-live-labs-europe-west1
      annotations:
        logs.company.com/datadog_source: "gha-runner-scale-set"
        ad.datadoghq.com/listener.checks: |
          {
            "openmetrics": {
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8080/metrics",
                  "histogram_buckets_as_distributions": true,
                  "namespace": "actions-runner-system",
                  "metrics": [".*"],
                  "max_returned_metrics": 12000
                }
              ]
            }
          }
  template:
    metadata:
      labels:
        finops.company.net/stage: prod
        finops.company.net/service_class: live
        finops.company.net/cluster: gke-live-labs-europe-west1
      annotations:
        logs.company.com/datadog_source: "gha-runner-scale-set"
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node_pool
                operator: In
                values:
                - github-actions
      tolerations:
        - key: "github-actions"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app.kubernetes.io/component: runner
      containers:
        - name: runner
          image: europe-docker.pkg.dev/platform-replace/company-prod/devex/gha-runners:v0.1.13
          command: ["/home/runner/run.sh"]
          resources:
            requests:
              cpu: 4
  controllerServiceAccount:
    namespace: actions-runner-system
    name: actions-runner-controller-gha-rs-controller
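These values set `maxRunners` but no `minRunners`, so every job has to wait for a new runner pod to be created and started, and possibly for a new node in the `github-actions` node pool. One way to check whether that start-up time accounts for the 30 to 90 seconds is to keep a few idle runners warm and compare. A minimal sketch, assuming the umbrella chart forwards the `gha-runner-scale-set` block to the subchart as shown above; the count of 5 is an arbitrary example, not a recommendation:

```yaml
gha-runner-scale-set:
  runnerScaleSetName: "company-hosted"
  # Keep a small pool of idle runners registered so queued jobs can be
  # assigned without waiting for a new pod (5 is an arbitrary example).
  minRunners: 5
  maxRunners: 200
```

If jobs are picked up promptly with warm runners, the delay is more likely pod scheduling and image pull time than listener or controller latency.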

Controller Logs

https://gist.github.com/julien-michaud/585574678b5804eafdf30c913030543e

Listener logs:
https://gist.github.com/julien-michaud/27c8025ea0117243f0a85dde1e31bf9f

Runner Pod Logs

https://gist.github.com/julien-michaud/bd3a618f5e8e1d1de1dbb688619563a6

julien-michaud, Aug 12 '24 14:08