
Runner Container Spinning Up Faster than Docker Daemon can be Ready.

Open rxa313 opened this issue 3 years ago • 1 comments

ARC version: v0.24.1
Chart version: 0.18.0

Note: Using RunnerSet configuration.

I'm noticing an issue with some runners after they restart following a job. The runners are ephemeral, and sporadically (not every time a runner starts) the runner container comes up before the Docker daemon is ready. When that happens and a job lands on that runner and tries to use Docker, it fails with an "is docker daemon running?" error in the GitHub Actions log. The runner logs print "is docker daemon running?" a handful of times and then report that the runner is listening for jobs, but the docker container is still showing red, so I assume the daemon never became ready in that case.

I tried a few settings to make the runner container wait for the daemon to be up before it starts, but I still see this issue from time to time.

What I'm using in my runner container:

  containers:
      - name: runner
        imagePullPolicy: IfNotPresent
        env:
        - name: NODE_EXTRA_CA_CERTS
          value: /usr/local/share/ca-certificates/<root.crt>
        - name: STARTUP_DELAY_IN_SECONDS
          value: "2"
        - name: DISABLE_WAIT_FOR_DOCKER
          value: "false"

Here's the start of the logs of the runner container of one of my runners:

2022-09-14 18:01:48.735  NOTICE --- Delaying startup by 2 seconds
2022-09-14 18:01:50.738  DEBUG --- Github endpoint URL https://github.com/
2022-09-14 18:01:51.344  DEBUG --- Passing --ephemeral to config.sh to enable the ephemeral runner.
2022-09-14 18:01:51.348  DEBUG --- Configuring the runner.

And after the runner is successfully configured (before the connected-to-GitHub check):

2022-09-14 18:11:09.189  DEBUG --- Docker enabled runner detected and Docker daemon wait is enabled
2022-09-14 18:11:09.191  DEBUG --- Waiting until Docker is available or the timeout is reached

Any additional recommendations for avoiding the "is docker daemon running?" issue?

Thanks!

rxa313, Sep 14 '22 18:09

@rxa313 Hey! Does your dockerd take more than 2 minutes to start? We recently discovered https://github.com/actions-runner-controller/actions-runner-controller/issues/1830, where the runner container doesn't fail when the Docker wait fails. We'll be updating it to fail in that case. If dockerd does take more than 2 minutes, how often does that happen for you? Do you need to tweak the Docker wait timeout? It's currently hard-coded to 2 minutes, so if you need that, we'd have to update the entrypoint to accept another environment variable for the timeout.

mumoshu, Sep 23 '22 01:09
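
For readers following along, the idea of a configurable Docker wait that fails the container on timeout might look roughly like the sketch below. This is not the actual ARC entrypoint code. The function name wait_for_docker and the DOCKER_WAIT_TIMEOUT_SECONDS variable mentioned afterwards are hypothetical, chosen only for illustration; the 120-second default simply mirrors the 2-minute value mentioned in the comment above.

```shell
#!/usr/bin/env bash
# Minimal sketch, NOT the real ARC entrypoint.
# wait_for_docker is a hypothetical helper name used for illustration only.

wait_for_docker() {
  # $1: timeout in seconds (default 120, matching the 2 minutes discussed above)
  # $2: probe command that succeeds once the daemon answers (default: docker info)
  local timeout="${1:-120}"
  local probe="${2:-docker info}"
  local waited=0

  # Intentionally unquoted so "docker info" splits into command + argument.
  until $probe >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Docker daemon not ready after ${timeout}s" >&2
      return 1   # non-zero so the caller can fail the container
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

An entrypoint could then call something like wait_for_docker "${DOCKER_WAIT_TIMEOUT_SECONDS:-120}" || exit 1, where DOCKER_WAIT_TIMEOUT_SECONDS is an assumed environment variable name. Exiting non-zero lets Kubernetes restart the container instead of leaving a runner that listens for jobs without a working daemon.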

@mumoshu

My error happens so sporadically that I'd have to monitor it to give you a real frequency. I looked at the issue you linked and it's the same thing I'm experiencing. I think a longer timeout might help in my case, maybe something like 10-30 seconds, just to give the daemon more time to spin up. I haven't heard any complaints from my users since I added those parameters, but I'm keeping an eye on it because we want optimal stability. Being able to customize the timeout would help mitigate this issue further.

rxa313, Sep 28 '22 17:09