actions-runner-controller
Jobs are waiting too long for a runner to come online.
Checks
- [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [X] I am using charts that are officially provided
Controller Version
0.9.3
Deployment Method
ArgoCD
Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
- Install the controller with the values shown under Additional Context
- Start a job that targets the runner scale set
Describe the bug
Some jobs wait anywhere from 30 seconds to more than 90 seconds before being scheduled on a runner.
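For anyone who wants to quantify the delay, here is a rough sketch of how the wait could be computed per job. It assumes job records shaped like the objects returned by the GitHub REST API for workflow jobs, with `created_at` and `started_at` timestamps, and it treats the gap between the two as the time spent waiting for a runner. The helper names, the 30-second threshold, and the sample records are hypothetical and are not taken from the logs in this issue.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # Timestamps are assumed to be ISO 8601 with a trailing "Z" (UTC).
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def queue_wait_seconds(job: dict) -> float:
    # Assumption: started_at - created_at approximates the time the job
    # spent waiting for a runner to come online.
    delta = parse_ts(job["started_at"]) - parse_ts(job["created_at"])
    return delta.total_seconds()


def slow_jobs(jobs: list, threshold: float = 30.0) -> list:
    # Return (name, wait) pairs for jobs that waited at least `threshold` seconds.
    waits = [(job["name"], queue_wait_seconds(job)) for job in jobs]
    return [(name, wait) for name, wait in waits if wait >= threshold]


# Hypothetical sample records, for illustration only.
sample_jobs = [
    {"name": "build", "created_at": "2024-01-01T12:00:00Z", "started_at": "2024-01-01T12:00:05Z"},
    {"name": "test", "created_at": "2024-01-01T12:00:00Z", "started_at": "2024-01-01T12:01:35Z"},
]

print(slow_jobs(sample_jobs))
```

Feeding real job records into `slow_jobs` would show which jobs fall into the 30 to 90+ second range described above.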
Describe the expected behavior
Jobs should not have to wait that long, in my opinion.
Additional Context
---
podLabels:
  finops.company.net/stage: prod
  finops.company.net/service_class: live
  finops.company.net/cluster: gke-live-labs-europe-west1
bufferReserveResourcesCronJob:
  create: true
gha-runner-scale-set-controller:
  resources:
    limits:
      memory: 300Mi
    requests:
      cpu: 100m
      memory: 300Mi
  flags:
    logFormat: "json"
  podLabels:
    finops.company.net/stage: prod
    finops.company.net/service_class: live
    finops.company.net/cluster: gke-live-labs-europe-west1
  podAnnotations:
    logs.company.com/datadog_source: "gha-runner-scale-set"
gha-runner-scale-set:
  runnerScaleSetName: "company-hosted"
  maxRunners: 200
  listenerTemplate:
    metadata:
      labels:
        finops.company.net/stage: prod
        finops.company.net/service_class: live
        finops.company.net/cluster: gke-live-labs-europe-west1
      annotations:
        logs.company.com/datadog_source: "gha-runner-scale-set"
        ad.datadoghq.com/listener.checks: |
          {
            "openmetrics": {
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8080/metrics",
                  "histogram_buckets_as_distributions": true,
                  "namespace": "actions-runner-system",
                  "metrics": [".*"],
                  "max_returned_metrics": 12000
                }
              ]
            }
          }
  template:
    metadata:
      labels:
        finops.company.net/stage: prod
        finops.company.net/service_class: live
        finops.company.net/cluster: gke-live-labs-europe-west1
      annotations:
        logs.company.com/datadog_source: "gha-runner-scale-set"
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node_pool
                    operator: In
                    values:
                      - github-actions
      tolerations:
        - key: "github-actions"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app.kubernetes.io/component: runner
      containers:
        - name: runner
          image: europe-docker.pkg.dev/platform-replace/company-prod/devex/gha-runners:v0.1.13
          command: ["/home/runner/run.sh"]
          resources:
            requests:
              cpu: 4
  controllerServiceAccount:
    namespace: actions-runner-system
    name: actions-runner-controller-gha-rs-controller
Controller Logs
https://gist.github.com/julien-michaud/585574678b5804eafdf30c913030543e
Listener logs:
https://gist.github.com/julien-michaud/27c8025ea0117243f0a85dde1e31bf9f
Runner Pod Logs
https://gist.github.com/julien-michaud/bd3a618f5e8e1d1de1dbb688619563a6