Windows runner sometimes gets stuck at start

Open davhdavh opened this issue 1 year ago • 1 comments

Describe the bug

I have configured my runners to use a local proxy, so that no runner can access anything inside the cluster.

However, roughly once in every 100 runner pod starts, I get this error:

2024-07-10 02:04:06Z: Runner connect error: No such host is known. (proxy.kube-public.svc.cluster.local:8888). Retrying until reconnected.

That is the only log line. From that point on the runner is completely stuck and never able to run any job.

To Reproduce

Steps to reproduce the behavior:

  1. Unknown

I am not sure whether any retry happens at all, but this error is caused by Windows starting the container before DNS is working. That's Windows and/or Calico for Windows for you, and only restarting the pod ever fixes it.

Expected behavior

The runner retries and, if it still cannot connect, eventually gives up and exits, so that a replacement runner gets scheduled.
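To make the expected behavior concrete, here is a minimal sketch of a bounded retry: try to resolve the proxy host a fixed number of times, then give up so the caller can exit with a non-zero code. It is written in Python purely for illustration and is not the runner's actual code. The function name `wait_for_dns`, the attempt count and the delay are all hypothetical. Only the host and port come from the log line in this issue.

```python
import socket
import time
from typing import Callable


def wait_for_dns(
    host: str,
    port: int,
    attempts: int = 10,          # hypothetical limit, not a runner setting
    delay_seconds: float = 3.0,  # hypothetical pause between attempts
    resolve: Callable[[str, int], object] = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `host` resolves, False after `attempts` failures."""
    for attempt in range(1, attempts + 1):
        try:
            resolve(host, port)
            return True
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            print(f"DNS attempt {attempt}/{attempts} for {host} failed: {exc}")
            if attempt < attempts:
                sleep(delay_seconds)  # no sleep after the final failure
    return False


# Example use in an entrypoint (host and port taken from the log line above):
#   ok = wait_for_dns("proxy.kube-public.svc.cluster.local", 8888)
#   raise SystemExit(0 if ok else 1)
```

Exiting with a non-zero code when the function returns False lets the container fail instead of hanging. Whether the controller then replaces the pod is an assumption here, not something this issue confirms.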

Runner Version and Platform

gha-runner-scale-set-controller:0.9.1

ENV GITHUB_RUNNER_VERSION=v2.317.0
...
CMD [ "powershell", "-c", "./config.cmd --name $env:RUNNER_NAME --url https://github.com/$env:RUNNER_REPO --token $env:RUNNER_TOKEN --labels $env:RUNNER_LABELS --unattended --replace --ephemeral; ./run.cmd"]

OS of the machine running the runner? Windows

What's not working?

Job Log Output

NA

Runner and Worker's Diagnostic Logs

2024-07-10 02:04:06Z: Runner connect error: No such host is known. (proxy.kube-public.svc.cluster.local:8888). Retrying until reconnected.

davhdavh, Jul 12 '24 10:07

Still have to fix this problem manually 👎

davhdavh, Mar 27 '25 12:03