Windows runner sometimes gets stuck at start
Describe the bug
I have configured my runners to use a local proxy (so that no runner can access anything inside the cluster).
However, roughly once in every 100 runner pod starts, I get this error:
2024-07-10 02:04:06Z: Runner connect error: No such host is known. (proxy.kube-public.svc.cluster.local:8888). Retrying until reconnected.
That is the only log line. From then on the runner is completely stuck and never picks up a job.
To Reproduce
Steps to reproduce the behavior:
- Unknown
I am not sure whether any retry happens at all, but the error is caused by Windows starting without working DNS. That's Windows and/or Calico for Windows for you, and only restarting the pod ever fixes it.
Expected behavior
The runner should retry and, if it still cannot connect, eventually give up and exit so that a replacement runner gets scheduled.
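For illustration, the behavior I would expect looks roughly like the sketch below: try to resolve the proxy host a bounded number of times, then give up. This is not the runner's actual code. The function name `wait_for_dns`, its parameters, and the defaults (30 attempts, 2 seconds apart) are all made up for this example, and it is written in Python only to show the logic. A Windows runner image would need Python installed, or the same loop ported to PowerShell.

```python
import socket
import sys
import time


def wait_for_dns(hostname, max_attempts=30, delay_seconds=2.0,
                 resolve=socket.getaddrinfo, sleep=time.sleep):
    """Return True once `hostname` resolves, False after `max_attempts` failures.

    `resolve` and `sleep` are injectable so the loop can be exercised
    without real DNS lookups or real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            resolve(hostname, None)  # raises socket.gaierror (an OSError) on failure
            return True
        except OSError as exc:
            print(f"attempt {attempt}/{max_attempts}: cannot resolve {hostname}: {exc}",
                  file=sys.stderr)
            if attempt < max_attempts:
                sleep(delay_seconds)  # no point sleeping after the final attempt
    return False
```

An entrypoint could call `sys.exit(0 if wait_for_dns("proxy.kube-public.svc.cluster.local") else 1)` before `./config.cmd`, so that the container exits with a non-zero code instead of hanging. Whether a failed pod then actually gets replaced depends on the controller and the pod's restart policy, which I have not verified.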
Runner Version and Platform
gha-runner-scale-set-controller:0.9.1
ENV GITHUB_RUNNER_VERSION=v2.317.0
...
CMD [ "powershell", "-c", "./config.cmd --name $env:RUNNER_NAME --url https://github.com/$env:RUNNER_REPO --token $env:RUNNER_TOKEN --labels $env:RUNNER_LABELS --unattended --replace --ephemeral; ./run.cmd"]
OS of the machine running the runner? Windows
What's not working?
Job Log Output
NA
Runner and Worker's Diagnostic Logs
2024-07-10 02:04:06Z: Runner connect error: No such host is known. (proxy.kube-public.svc.cluster.local:8888). Retrying until reconnected.
Still have to fix this problem manually 👎