actions-runner-controller

actions runner pods error

Open sravula84 opened this issue 11 months ago • 4 comments

Checks

  • [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I am using charts that are officially provided

Controller Version

actions-runner-controller-0.22.0

Deployment Method

Helm

Checks

  • [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

  1. Install the controller.
  2. Configure a RunnerDeployment with 10 replicas.
  3. Configure a HorizontalRunnerAutoscaler that scales between 10 and 50 replicas.
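
For illustration only, the setup described in these steps might look roughly like the manifests below. The report does not include the actual manifests, so this is a sketch under stated assumptions: the deployment name, namespace, organization, label, and resource values are taken from the controller logs further down, while the autoscaler name and the `PercentageRunnersBusy` metric with its thresholds are hypothetical placeholders.

```yaml
# Sketch only: reconstructed from the steps and logs in this report, not the reporter's actual manifests.
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: github-action-small            # seen in the controller logs
  namespace: actions-runner-systems    # seen in the controller logs
spec:
  replicas: 10                         # "replica 10" from the reproduction steps
  template:
    spec:
      organization: prosperllc         # RUNNER_ORG in the logged pod spec
      ephemeral: true                  # RUNNER_EPHEMERAL=true in the logged pod spec
      labels:
        - pspr-utils-linux-np-small    # RUNNER_LABELS in the logged pod spec
      resources:
        limits:
          cpu: "2"
          memory: 4Gi
        requests:
          cpu: 500m
          memory: 2Gi
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: github-action-small-autoscaler # hypothetical name
  namespace: actions-runner-systems
spec:
  scaleTargetRef:
    kind: RunnerDeployment
    name: github-action-small
  minReplicas: 10                      # "10 to 50 replica" from the reproduction steps
  maxReplicas: 50
  metrics:                             # assumed metric; the report does not say which one is used
    - type: PercentageRunnersBusy
      scaleUpThreshold: '0.75'
      scaleDownThreshold: '0.25'
      scaleUpFactor: '2'
      scaleDownFactor: '0.5'
```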

Describe the bug

Runner pods go into the Error state and report a Docker socket error.

Describe the expected behavior

It should spin up new runners based on the number of workflows triggered; in this case the runner autoscaler is configured with a maximum of 50.

Additional Context

COMPUTED VALUES:
actionsMetrics:
  port: 8443
  proxy:
    enabled: true
    image:
      repository: quay.io/brancz/kube-rbac-proxy
      tag: v0.13.1
  serviceAnnotations: {}
  serviceMonitor: false
  serviceMonitorLabels: {}
actionsMetricsServer:
  affinity: {}
  enabled: false
  fullnameOverride: ""
  imagePullSecrets: []
  ingress:
    annotations: {}
    enabled: false
    hosts:
    - extraPaths: []
      host: chart-example.local
      paths: []
    ingressClassName: ""
    tls: []
  logFormat: text
  nameOverride: ""
  nodeSelector: {}
  podAnnotations: {}
  podLabels: {}
  podSecurityContext: {}
  priorityClassName: ""
  replicaCount: 1
  resources: {}
  secret:
    create: false
    enabled: false
    github_webhook_secret_token: ""
    name: actions-metrics-server
  securityContext: {}
  service:
    annotations: {}
    ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
    type: ClusterIP
  serviceAccount:
    annotations: {}
    create: true
    name: ""
  tolerations: []
additionalVolumeMounts: []
additionalVolumes: []
admissionWebHooks: {}
affinity: {}
authSecret:
  annotations: {}
  create: true
  enabled: true
  github_token:
  name: controller-manager
certManagerEnabled: true
defaultScaleDownDelay: 10m
dockerRegistryMirror: ""
enableLeaderElection: true
env: {}
fullnameOverride: ""
githubWebhookServer:
  affinity: {}
  enabled: false
  fullnameOverride: ""
  imagePullSecrets: []
  ingress:
    annotations: {}
    enabled: false
    hosts:
    - extraPaths: []
      host: chart-example.local
      paths: []
    ingressClassName: ""
    tls: []
  logFormat: text
  nameOverride: ""
  nodeSelector: {}
  podAnnotations: {}
  podDisruptionBudget:
    enabled: false
  podLabels: {}
  podSecurityContext: {}
  priorityClassName: ""
  replicaCount: 1
  resources: {}
  secret:
    create: false
    enabled: false
    github_webhook_secret_token: ""
    name: github-webhook-server
  securityContext: {}
  service:
    annotations: {}
    ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
    type: ClusterIP
  serviceAccount:
    annotations: {}
    create: true
    name: ""
  tolerations: []
  useRunnerGroupsVisibility: false
image:
  actionsRunnerImagePullSecrets: []
  actionsRunnerRepositoryAndTag: summerwind/actions-runner:latest
  dindSidecarRepositoryAndTag: docker:24.0.7-dind-alpine3.18
  pullPolicy: IfNotPresent
  repository: summerwind/actions-runner-controller
imagePullSecrets: []
labels: {}
logFormat: text
metrics:
  port: 8443
  proxy:
    enabled: true
    image:
      repository: quay.io/brancz/kube-rbac-proxy
      tag: v0.13.1
  serviceAnnotations: {}
  serviceMonitor: false
  serviceMonitorLabels: {}
nameOverride: ""
nodeSelector: {}
podAnnotations: {}
podDisruptionBudget:
  enabled: false
podLabels: {}
podSecurityContext: {}
priorityClassName: ""
rbac: {}
replicaCount: 1
resources: {}
runner:
  statusUpdateHook:
    enabled: false
scope:
  singleNamespace: false
  watchNamespace: ""
securityContext: {}
service:
  annotations: {}
  port: 443
  type: ClusterIP
serviceAccount:
  annotations: {}
  create: true
  name: ""
syncPeriod: 1m
tolerations: []
webhookPort: 9443

Controller Logs

2024-03-19T22:21:12Z	DEBUG	controller-runtime.webhook.webhooks	wrote response	{"webhook": "/mutate-runner-set-pod", "code": 200, "reason": "", "UID": "33222d85-db32-4f05-b866-858ddb83a913", "allowed": true}
2024-03-19T22:21:12Z	INFO	runner	Created runner pod	{"runner": "actions-runner-systems/github-action-small-5r8nj-29dt6", "repository": ""}
2024-03-19T22:21:12Z	DEBUG	events	Created pod 'github-action-small-5r8nj-29dt6'	{"type": "Normal", "object": {"kind":"Runner","namespace":"actions-runner-systems","name":"github-action-small-5r8nj-29dt6","uid":"595fdda6-8727-496d-9a9e-ec221d8e9e5a","apiVersion":"actions.summerwind.dev/v1alpha1","resourceVersion":"67959395"}, "reason": "PodCreated"}
2024-03-19T22:21:12Z	DEBUG	runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "actions-runner-systems/github-action-small-5r8nj", "owner": "actions-runner-systems/github-action-small-5r8nj-29dt6", "pods": [{"kind":"Pod","apiVersion":"v1","metadata":{"name":"github-action-small-5r8nj-29dt6","namespace":"actions-runner-systems","uid":"c9d9e756-7f3c-450c-b5c3-578dc6eb462e","resourceVersion":"67959403","creationTimestamp":"2024-03-19T22:21:12Z","labels":{"actions-runner":"","actions-runner-controller/inject-registration-token":"true","pod-template-hash":"749c9b7998","runner-deployment-name":"github-action-small","runner-template-hash":"78ddc6dd8"},"annotations":{"actions-runner-controller/token-expires-at":"2024-03-19T16:10:51-07:00","sync-time":"2024-03-19T22:21:11Z"},"ownerReferences":[{"apiVersion":"actions.summerwind.dev/v1alpha1","kind":"Runner","name":"github-action-small-5r8nj-29dt6","uid":"595fdda6-8727-496d-9a9e-ec221d8e9e5a","controller":true,"blockOwnerDeletion":true}],"managedFields":[{"manager":"manager","operation":"Update","apiVersion":"v1","time":"2024-03-19T22:21:12Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:sync-time":{}},"f:labels":{".":{},"f:actions-runner":{},"f:actions-runner-controller/inject-registration-token":{},"f:pod-template-hash":{},"f:runner-deployment-name":{},"f:runner-template-hash":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"595fdda6-8727-496d-9a9e-ec221d8e9e5a\"}":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"docker\"}":{".":{},"f:env":{".":{},"k:{\"name\":\"DOCKER_TLS_CERTDIR\"}":{".":{},"f:name":{},"f:value":{}}},"f:image":{},"f:imagePullPolicy":{},"f:lifecycle":{".":{},"f:preStop":{".":{},"f:exec":{".":{},"f:command":{}}}},"f:name":{},"f:resources":{},"f:securityContext":{".":{},"f:privileged":{}},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{},"f:volumeMounts":{".":{},"k:{\"mountPath\":\"/certs/client\"}":{".":{},"f:mountPath":{},"f:name":
{}},"k:{\"mountPath\":\"/runner\"}":{".":{},"f:mountPath":{},"f:name":{}},"k:{\"mountPath\":\"/runner/_work\"}":{".":{},"f:mountPath":{},"f:name":{}}}},"k:{\"name\":\"runner\"}":{".":{},"f:env":{".":{},"k:{\"name\":\"DOCKERD_IN_RUNNER\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"DOCKER_CERT_PATH\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"DOCKER_ENABLED\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"DOCKER_HOST\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"DOCKER_TLS_VERIFY\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"GITHUB_URL\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_ENTERPRISE\"}":{".":{},"f:name":{}},"k:{\"name\":\"RUNNER_EPHEMERAL\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_GROUP\"}":{".":{},"f:name":{}},"k:{\"name\":\"RUNNER_LABELS\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_NAME\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_ORG\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_REPO\"}":{".":{},"f:name":{}},"k:{\"name\":\"RUNNER_STATUS_UPDATE_HOOK\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_TOKEN\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"RUNNER_WORKDIR\"}":{".":{},"f:name":{},"f:value":{}}},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{".":{},"f:limits":{".":{},"f:cpu":{},"f:memory":{}},"f:requests":{".":{},"f:cpu":{},"f:memory":{}}},"f:securityContext":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{},"f:volumeMounts":{".":{},"k:{\"mountPath\":\"/certs/client\"}":{".":{},"f:mountPath":{},"f:name":{},"f:readOnly":{}},"k:{\"mountPath\":\"/runner\"}":{".":{},"f:mountPath":{},"f:name":{}},"k:{\"mountPath\":\"/runner/_work\"}":{".":{},"f:mountPath":{},"f:name":{}}}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds
":{},"f:volumes":{".":{},"k:{\"name\":\"certs-client\"}":{".":{},"f:emptyDir":{},"f:name":{}},"k:{\"name\":\"runner\"}":{".":{},"f:emptyDir":{},"f:name":{}},"k:{\"name\":\"work\"}":{".":{},"f:emptyDir":{},"f:name":{}}}}}}]},"spec":{"volumes":[{"name":"runner","emptyDir":{}},{"name":"work","emptyDir":{}},{"name":"certs-client","emptyDir":{}},{"name":"kube-api-access-ldpcz","projected":{"sources":[{"serviceAccountToken":{"expirationSeconds":3607,"path":"token"}},{"configMap":{"name":"kube-root-ca.crt","items":[{"key":"ca.crt","path":"ca.crt"}]}},{"downwardAPI":{"items":[{"path":"namespace","fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}]}}],"defaultMode":420}}],"containers":[{"name":"runner","image":"summerwind/actions-runner:latest","env":[{"name":"RUNNER_ORG","value":"prosperllc"},{"name":"RUNNER_REPO"},{"name":"RUNNER_ENTERPRISE"},{"name":"RUNNER_LABELS","value":"pspr-utils-linux-np-small"},{"name":"RUNNER_GROUP"},{"name":"DOCKER_ENABLED","value":"true"},{"name":"DOCKERD_IN_RUNNER","value":"false"},{"name":"GITHUB_URL","value":"https://github.com/"},{"name":"RUNNER_WORKDIR","value":"/runner/_work"},{"name":"RUNNER_EPHEMERAL","value":"true"},{"name":"RUNNER_STATUS_UPDATE_HOOK","value":"false"},{"name":"GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT","value":"actions-runner-controller/v0.27.0"},{"name":"DOCKER_HOST","value":"tcp://localhost:2376"},{"name":"DOCKER_TLS_VERIFY","value":"1"},{"name":"DOCKER_CERT_PATH","value":"/certs/client"},{"name":"RUNNER_NAME","value":"github-action-small-5r8nj-29dt6"},{"name":"RUNNER_TOKEN","value":"AI5PDGHSSR55YFZJEKLE44DF7INXW"}],"resources":{"limits":{"cpu":"2","memory":"4Gi"},"requests":{"cpu":"500m","memory":"2Gi"}},"volumeMounts":[{"name":"runner","mountPath":"/runner"},{"name":"work","mountPath":"/runner/_work"},{"name":"certs-client","readOnly":true,"mountPath":"/certs/client"},{"name":"kube-api-access-ldpcz","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev
/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"Always","securityContext":{}},{"name":"docker","image":"docker:24.0.7-dind-alpine3.18","env":[{"name":"DOCKER_TLS_CERTDIR","value":"/certs"}],"resources":{},"volumeMounts":[{"name":"runner","mountPath":"/runner"},{"name":"certs-client","mountPath":"/certs/client"},{"name":"work","mountPath":"/runner/_work"},{"name":"kube-api-access-ldpcz","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"lifecycle":{"preStop":{"exec":{"command":["/bin/sh","-c","timeout \"${RUNNER_GRACEFUL_STOP_TIMEOUT:-15}\" /bin/sh -c \"echo 'Prestop hook started'; while [ -f /runner/.runner ]; do sleep 1; done; echo 'Waiting for dockerd to start'; while ! pgrep -x dockerd; do sleep 1; done; echo 'Prestop hook stopped'\" >/proc/1/fd/1 2>&1"]}}},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent","securityContext":{"privileged":true}}],"restartPolicy":"Never","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"default","serviceAccount":"default","nodeName":"gke-nonprod-us-west1-default-node-poo-504a47a4-8he7","securityContext":{},"schedulerName":"default-scheduler","tolerations":[{"key":"node.kubernetes.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":300},{"key":"node.kubernetes.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":300}],"priority":0,"enableServiceLinks":true,"preemptionPolicy":"PreemptLowerPriority"},"status":{"phase":"Pending","conditions":[{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2024-03-19T22:21:12Z"}],"qosClass":"Burstable"}}]}
2024-03-19T22:21:50Z	DEBUG	runner	Runner appears to have been registered and running.	{"runner": "actions-runner-systems/github-action-small-5r8nj-lhd8g", "podCreationTimestamp": "2024-03-19 22:21:11 +0000 UTC"}
2024-03-19T22:21:50Z	DEBUG	runner	Runner appears to have been registered and running.	{"runner": "actions-runner-systems/github-action-small-5r8nj-flq4v", "podCreationTimestamp": "2024-03-19 22:21:12 +0000 UTC"}

Runner Pod Logs

# Authentication


√ Connected to GitHub

# Runner Registration




√ Runner successfully added
√ Runner connection is good

# Runner settings


√ Settings Saved.

2024-03-19 22:13:18.165  DEBUG --- Runner successfully configured.
{
  "agentId": 88824,
  "agentName": "github-action-small-5r8nj-hv2gf",
  "poolId": 1,
  "poolName": "Default",
  "ephemeral": true,
  "serverUrl": "https://pipelinesghubeus21.actions.githubusercontent.com/tMTkzAKYleoidiHAI9FjPaHPkEkp2s7TIoUW3BW1740YmeFlFo/",
  "gitHubUrl": "https://github.com/prosperllc",
  "workFolder": "/runner/_work"
}
2024-03-19 22:13:18.174  DEBUG --- Docker enabled runner detected and Docker daemon wait is enabled
2024-03-19 22:13:18.177  DEBUG --- Waiting until Docker is available or the timeout of 120 seconds is reached
Failed to initialize: unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
Failed to initialize: unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
Failed to initialize: unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
Failed to initialize: unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?

sravula84, Mar 19 '24 22:03

We have two clusters configured with actions-runner-controller:

  1. The first cluster has never had any issues: 1.25.16-gke.1460000
  2. The second cluster always hits the issue above. Both clusters use the same Helm chart version; the only difference is the Kubernetes version: 1.27.9-gke.1092000

Is there a specific controller version or runner image we need to use, or are any other configuration changes required?

sravula84, Mar 20 '24 17:03

Any suggestions on the above case?

sravula84, Mar 25 '24 20:03

Hi team,

Any suggestions on the above issue?

sravula84, Mar 30 '24 06:03

Hi team, any suggestions on the above issue? I see that a few other users have raised similar issues.

Thanks, Sridhar

sravula84, Apr 01 '24 17:04