container-storage-setup

Cloud-Init triggered install/start of docker will always hang

Open james-masson opened this issue 10 years ago • 7 comments

If cloud-init is used to trigger an install of docker, then docker startup, and provisioning in general, will just hang. This is irrespective of the actual install/config method, i.e. Puppet, Ansible, etc.

Expanding on that:

  • In our method, we use cloud-init to trigger Puppet or Ansible for the initial configuration run.
  • This succeeds until the docker daemon is requested to start.
  • The docker daemon service requires docker-storage-setup to have already run.
  • The docker-storage-setup systemd config requires cloud-final.service (i.e. cloud-init) to have finished - https://github.com/projectatomic/docker-storage-setup/blob/master/docker-storage-setup.service#L3
  • The daemon start process just hangs at this point, because of this circular dependency.
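
The ordering described in the list above can be checked on an affected machine. The following is a minimal sketch: the helper name `has_ordering_on` is hypothetical, and it assumes the `After=unit1 unit2 ...` output format of `systemctl show -p After`.

```shell
# Hypothetical helper: succeeds if the unit named in $1 appears in the
# After= list read from stdin (the output of `systemctl show -p After UNIT`).
has_ordering_on() {
    # Split "After=a.service b.service" into one token per line,
    # then look for an exact, literal match.
    tr ' =' '\n\n' | grep -qxF -- "$1"
}

# Example use on a live system (not run here):
#   systemctl show -p After docker-storage-setup.service \
#       | has_ordering_on cloud-final.service && echo "ordered after cloud-final"
#   systemctl list-jobs    # shows which jobs are still waiting
```

If the check succeeds while cloud-init is still running the provisioning step, the start job for docker-storage-setup stays queued behind cloud-final.service, which matches the hang described here.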

The result is that provisioning cannot complete until cloud-init has failed/died - or the box has been rebooted.

Commenting out After=cloud-final.service makes this problem go away.
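
One way to apply that change without touching the packaged file is to put an edited copy of the unit under /etc/systemd/system, which systemd prefers over the vendor copy. As far as I know, a drop-in file can only add ordering dependencies, not remove them, so the whole unit has to be overridden. A minimal sketch, assuming the packaged unit lives at /usr/lib/systemd/system/docker-storage-setup.service (this path may differ by distribution); the function name is hypothetical:

```shell
# Hypothetical helper: copy a unit file from $1 to $2, commenting out the
# After=cloud-final.service line and leaving everything else unchanged.
comment_out_cloud_final() {
    src=$1
    dst=$2
    # '&' in the replacement stands for the matched text, so the line
    # becomes "#After=cloud-final.service".
    sed 's/^After=cloud-final\.service/#&/' "$src" > "$dst"
}

# Example use (assumed paths, not run here):
#   comment_out_cloud_final \
#       /usr/lib/systemd/system/docker-storage-setup.service \
#       /etc/systemd/system/docker-storage-setup.service
#   systemctl daemon-reload
```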

I understand the thinking behind the requirement - basically, ensure the box has had its storage configured by cloud-init, to be used by docker-storage-setup - but the existing restriction makes more sophisticated hands-off provisioning awkward.

I'm not sure what to suggest as a solution here, sorry!

james-masson avatar Sep 10 '15 13:09 james-masson

Thanks for the excellent issue report. Unfortunately, the After=cloud-final.service was part of the initial commit, and there's no comment, so we are left to try to retroactively determine its reason for existence.

Offhand...I think we might have been waiting for cloud-init to do the growpart bit. But we do that internally now.

I am thinking it'd be safe to just remove that After=...but let's spend a bit of time to try to consider the repercussions.

cgwalters avatar Sep 10 '15 14:09 cgwalters

The typical way of solving this is to pass the --no-block flag to systemctl, e.g.,

systemctl --no-block start docker

This will enqueue the start request and return immediately. Of course, docker won't start until its dependencies are satisfied, so if you require docker to be running while cloud-init is still processing you're out of luck.

This is tricky if you're using the built-in "service" abstractions in ansible/puppet/etc, which may not have any facility for using the --no-block flag.

larsks avatar Sep 29 '15 17:09 larsks

--no-block can only help if you don't do anything with docker in your scripts. If you want to actually configure or run something, then you are out of luck. Trying now:

runcmd:
...
- [ systemctl, enable, docker.service ]
- [ systemctl, start, docker-storage-setup.service, --ignore-dependencies ]
- [ systemctl, start, docker.service, --ignore-dependencies ]
...

P.S. Tested and works for me. I think docker-storage-setup can be skipped, as it seems useless without setting configuration, but I am leaving it above for completeness.

P.P.S. @smoser from #cloud-init also suggested a possible workaround: create a systemd service using bootcmd: []. That service would run the things that need to interact with docker after docker has launched (provided you create it properly). And systemd should pick up any service created with a boothook without needing to enable it with a runcmd.
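
One possible shape for such a service, as a rough sketch rather than what @smoser actually had in mind: a oneshot unit ordered after docker.service, written out by a small helper. The unit name `post-docker.service`, the helper name, and the script path are all hypothetical.

```shell
# Hypothetical helper: write a oneshot unit into directory $1 that runs the
# script $2 only after docker.service has started.
write_post_docker_unit() {
    unit_dir=$1    # e.g. /etc/systemd/system
    script=$2      # e.g. /usr/local/bin/post-docker.sh (assumed to exist)
    cat > "$unit_dir/post-docker.service" <<EOF
[Unit]
Description=Provisioning steps that need a running docker daemon
Requires=docker.service
After=docker.service

[Service]
Type=oneshot
ExecStart=$script
EOF
}

# Example use from cloud-init (not run here):
#   write_post_docker_unit /etc/systemd/system /usr/local/bin/post-docker.sh
#   systemctl daemon-reload
#   systemctl --no-block start post-docker.service
```

Starting the unit with --no-block only queues the job, so cloud-init can finish; the script then runs once docker.service is up.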

akostadinov avatar Oct 22 '15 20:10 akostadinov

@akostadinov

Does the bootcmd: approach you mentioned really help in the case below? In cloud-init we start the docker systemd service, and right now we use --no-block.

But --no-block brings another issue: in cloud-init we also need to run docker run *** to install something (for example, nsenter). The docker service is scheduled to start after cloud-init, but cloud-init also has scripts that need the docker service (to start a container), so those scripts fail because docker has not started at that point.

Do you have any good suggestions for that?

HackToday avatar Mar 09 '16 08:03 HackToday

@HackToday, using --ignore-dependencies worked for me, but to be honest I eventually switched to an Ansible setup. If you need a machine restart, for example, cloud-init becomes a no-go. E.g. updating the system on an RPM-based system does not support reboot yet.

akostadinov avatar Mar 09 '16 08:03 akostadinov

Actually, now that docker-storage-setup.service no longer depends on cloud-final.service, this should no longer be an issue (see #161). This works for me on Fedora 25:

runcmd:
  - systemctl start docker
  - docker pull busybox
  - docker run -d busybox sleep 999

If you want to change the default d-s-s configuration, you can still do so in the bootcmd section.

jlebon avatar Nov 24 '16 17:11 jlebon

Even better, since cloud-final.service has an After on multi-user.target, you don't even have to do systemctl start docker first in the above.
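
If a script relies on docker already being up at that point, a small guard can make the assumption explicit. A minimal sketch; the helper name `wait_for` and the retry counts are hypothetical:

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts; succeeds as soon as the command succeeds.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example use (not run here): give the daemon up to 30 seconds to answer.
#   wait_for 30 1 docker info && docker pull busybox
```

This only helps once the dependency on cloud-final.service is gone; with the old unit file, waiting inside cloud-init would never succeed.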

jlebon avatar Nov 24 '16 17:11 jlebon