Random "Is the docker daemon running?" with Docker-in-Docker Feature

Open metaskills opened this issue 2 years ago • 15 comments

When using the Docker-in-Docker feature, it has a 20% chance to fail. I created a demo repo to showcase this. It uses concurrent jobs to highlight the issue well, but the problem is not limited to this workflow style. I am seeing this random failure behavior all over certain projects. HELP PLEASE!

  • https://github.com/customink/dnd-demo (repo)
  • https://github.com/customink/dnd-demo/actions/runs/3981703221 (failures)

If this issue is within the CLI, then I have created an issue there in that project to track it as well:

  • https://github.com/devcontainers/cli/issues/383

metaskills avatar Jan 22 '23 22:01 metaskills

@Chuxel Any thoughts on this?

metaskills avatar Jan 23 '23 21:01 metaskills

Hmmm. If you add cat /tmp/dockerd.log to your exec, that would output the startup logs. Since Docker is started in the background, my bet is that things sometimes move fast enough that the exec happens before it is fully up. Otherwise there would be errors in that file that could point to the underlying issue.

Adding a sleep statement in the exec might also verify whether this is a race condition.
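
A minimal sketch of that check, assuming a POSIX shell inside the dev container. The helper names wait_for and show_dockerd_log are hypothetical and not part of the Feature; only docker info and the /tmp/dockerd.log path come from this thread. Polling replaces a fixed sleep, so a slow start and a daemon that never starts can be told apart.

```shell
# Hypothetical helper (not part of the Feature): retry a command until it
# succeeds or the attempts run out. Returns 0 on success, 1 on give-up.
wait_for() {
  wf_attempts="$1"
  wf_delay="$2"
  shift 2
  wf_i=1
  while [ "$wf_i" -le "$wf_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$wf_delay"
    wf_i=$((wf_i + 1))
  done
  return 1
}

# Print the daemon startup log if it exists; say so clearly when it does not.
# The default path is the one mentioned in this thread.
show_dockerd_log() {
  wf_log="${1:-/tmp/dockerd.log}"
  if [ -f "$wf_log" ]; then
    cat "$wf_log"
  else
    echo "no log at $wf_log (dockerd may never have been started)"
  fi
}

# Example use inside runCmd (attempt count and delay are arbitrary choices):
#   wait_for 30 1 docker info || { show_dockerd_log; exit 1; }
```

If docker info succeeds after a few attempts, a race is the likely cause. If it never succeeds and the log is missing, the daemon was probably never launched.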

Chuxel avatar Jan 23 '23 21:01 Chuxel

Seems Docker does not start at all. Also, when this happens there is no amount of waiting I can do in the devcontainer. Docker will just not work. I tried waiting for several minutes.

cat: /tmp/dockerd.log: No such file or directory

metaskills avatar Jan 23 '23 22:01 metaskills

Looking at https://github.com/customink/dnd-demo/actions/runs/3990966349/jobs/6845376920#step:3:689, this issue sounds quite similar to https://github.com/devcontainers/features/issues/372

Looks like this issue mostly occurs in Action runners & not in a Codespace.

samruddhikhandale avatar Jan 23 '23 22:01 samruddhikhandale

@metaskills The other issue I pointed to also uses the runs-on: ubuntu-latest image in its workflow. Can we change the image and see if that helps?

samruddhikhandale avatar Jan 23 '23 22:01 samruddhikhandale

Sure. I'll change it to a few other things and even see if the version of the CI helps. Will report back shortly.

metaskills avatar Jan 23 '23 23:01 metaskills

So I tested ubuntu-20.04 and after about 50 runs I've had no failures. So that is good news and gives me something to work with while we sort this out. I'll read that other issue too.

metaskills avatar Jan 24 '23 01:01 metaskills

Very strange. I'd also be curious if running the /usr/local/share/docker-init.sh script again during your exec fixes it. We could layer in some retry if it does. But it's super odd that it's not consistent... almost like it's an issue with certain Actions hosts. @samruddhikhandale - Might be worth reaching out to the actions folks to see if anything has been happening?

Chuxel avatar Jan 24 '23 15:01 Chuxel

if running the /usr/local/share/docker-init.sh script again during your exec

Do you mean in my runCmd?

metaskills avatar Jan 24 '23 16:01 metaskills

We've had internal reports of this as well with Debian 11.

joshaber avatar Jan 24 '23 16:01 joshaber

if running the /usr/local/share/docker-init.sh script again during your exec

Do you mean in my runCmd?

Yes, sorry. (Under the hood it's devcontainer exec.)

Chuxel avatar Jan 24 '23 17:01 Chuxel

Very strange. I'd also be curious if running the /usr/local/share/docker-init.sh script again during your exec fixes it. We could layer in some retry if it does. But it's super odd that it's not consistent... almost like it's an issue with certain Actions hosts. @samruddhikhandale - Might be worth reaching out to the actions folks to see if anything has been happening?

Created https://github.com/actions/runner-images/issues/6980

samruddhikhandale avatar Jan 24 '23 18:01 samruddhikhandale

if running the /usr/local/share/docker-init.sh script again during your exec

@Chuxel Tried that... did not help. The message is still the same when I do this.

          runCmd: |
            /usr/local/share/docker-init.sh
            docker info

metaskills avatar Jan 24 '23 20:01 metaskills

The user "runner" used on runner-images is a member of a "docker" group, so you shouldn't expect such problems. 
However to understand the nature of the problem, could you please run the docker-in-docker task without using "devcontainers/[email protected]" action?
We would like to make sure that the root cause is not the action itself.

Originally posted by @Alexey-Ayupov in https://github.com/actions/runner-images/issues/6980#issuecomment-1403631708

@metaskills Would you be interested in testing this hypothesis? Thanks!

samruddhikhandale avatar Jan 25 '23 19:01 samruddhikhandale

Thanks, I'm subscribed to that issue too so I replied there.

metaskills avatar Jan 25 '23 21:01 metaskills