Cloud-init aborted on WSL

Open thielj opened this issue 1 year ago • 10 comments

Bug report

I have previously reported this against WSL as I think this is an issue that's not caused by cloud-init itself, so I'm linking this here for reference only.

Summary

When initializing Ubuntu-2404 in WSL, some kind of 'watchdog' kicks in after 10000ms and aborts cloud-init about 15s later, usually during the final phase.

Cloud-init works for quick tasks, but when adding multiple packages, apt-get is simply terminated and not re-run later.

For details and logs, see https://github.com/microsoft/WSL/issues/11602

thielj avatar May 25 '24 16:05 thielj

Thank you for filing this bug and improving cloud-init. CC: @CarlosNihelton in case there is a default wsl.conf setting we may need to set up if we see "complex" user-data.

blackboxsw avatar May 28 '24 02:05 blackboxsw

Very interesting report. My initial thought is that we need to deal with the WSL side of things first: as far as I understand after viewing the logs in the other bug report, there is no logic in cloud-init itself that would prevent that shutdown.

Thanks for the good insights @thielj . I'll investigate this further.

CarlosNihelton avatar May 28 '24 12:05 CarlosNihelton

I suggested to WSL that they check cloud-init status before shutting down, as documented here: https://cloudinit.readthedocs.io/en/latest/howto/status.html
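
The check suggested here could look roughly like the following when scripted inside the distro. This is a minimal sketch, assuming the plain `cloud-init status` output contains a line such as `status: running`. `cloud_init_busy` is a hypothetical helper name, and treating every other state as idle is a simplification.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the given `cloud-init status`
# text reports a run in progress, and fails (exit 1) otherwise.
# Simplification: any state other than "running" is treated as idle.
cloud_init_busy() {
    case "$1" in
        *"status: running"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use before shutting a distro down (left commented out here):
# cloud_init_busy "$(cloud-init status 2>/dev/null)" && echo "cloud-init still running"
```

A caller that wants to block rather than poll can use `cloud-init status --wait` instead, as described on the linked status page.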

@blackboxsw Not sure what your definition of complex user data is. Mine surely doesn't look complex, but some of the packages pull in a whole lot of other stuff. Once a distro has aged a bit, package_upgrade: true on a slow network might not complete.

thielj avatar May 28 '24 13:05 thielj

@CarlosNihelton I've described a workaround over on the WSL issue, which showed an additional problem with /etc/resolv.conf disappearing and package upgrades failing. I've added a workaround for that, too.

https://github.com/microsoft/WSL/issues/11602#issuecomment-2140755462

Which makes me wonder: Is using cloud-init on WSL actually a supported and tested scenario, or more like a proof of concept?

thielj avatar May 30 '24 19:05 thielj

What's been your experience with the two-step installation workflow described here: https://canonical-ubuntu-wsl.readthedocs-hosted.com/en/latest/tutorials/cloud-init/#enjoy ?

Have you been as impacted by the timeouts and surprising behaviors of WSL?

CarlosNihelton avatar May 31 '24 01:05 CarlosNihelton

@CarlosNihelton Damn, I should have seen this document before - it looks almost identical to the workarounds I came up with.

I'll need to check whether the --no-launch and --root flags fix the issue with the disappearing /etc/resolv.conf when I'm back on my dev machine. Are these even in the official WSL docs?

Those arbitrarily chosen timeouts are pretty annoying, particularly when the distro is powered off and the VHD is left in a corrupted state.

I understand how the various timers are intended to benefit someone who runs a single app for a couple minutes and needs to reclaim all the memory ASAP.

For a developer machine with 64GB of RAM, dedicating some of that to WSL is a non-issue though. Considering the various workarounds I've seen - just to keep a distro up and running - others must be feeling the same. Chances are I'll just be going back to using a second physical machine or full VMs.

Can't they just put a penguin 🐧 icon in the bottom right corner that allows me to control this and other WSL settings? Like PowerToys does for keeping the machine awake when necessary.

thielj avatar May 31 '24 09:05 thielj

> Are these even in the official WSL docs?

Cloud-init support is not an upstream WSL feature, but rather specific to Ubuntu (for now, at least, as nothing in the implementation prevents other distros from benefiting from it), so those are Ubuntu docs rather than MS docs.

CarlosNihelton avatar May 31 '24 11:05 CarlosNihelton

@CarlosNihelton Setting this to bug: external since the issue does not seem to lie with cloud-init.

a-dubs avatar Jun 24 '24 21:06 a-dubs

It might help to (a) emphasize that running ubuntu2404.exe run cloud-init status --wait immediately after installation has completed is not just about watching cloud-init do its job - it's rather existential for cloud-init to complete anything more than a quick useradd. Even an idle shell would be enough for the WSL 10000ms watchdog timer to consider the distro to be in use.

And (b), may I suggest linking these instructions from the official cloud-init docs / the WSL source, and maybe try to get them added or linked to Microsoft's WSL documentation as well?
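
As a rough illustration of point (a), the blocking wait can be wrapped in a small script run inside the distro. This is only a sketch: `wait_for_cloud_init` and the `CLOUD_INIT_BIN` override are hypothetical names introduced here so the function can be exercised without cloud-init installed. The only real command it relies on is `cloud-init status --wait`.

```shell
#!/bin/sh
# Hypothetical wrapper: block until cloud-init has finished, then report the outcome.
# CLOUD_INIT_BIN is an assumed override for testing; it defaults to the real binary.
wait_for_cloud_init() {
    bin="${CLOUD_INIT_BIN:-cloud-init}"
    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "cloud-init not found; nothing to wait for"
        return 0
    fi
    # `status --wait` blocks until cloud-init has finished all of its stages.
    if "$bin" status --wait >/dev/null 2>&1; then
        echo "cloud-init finished"
        return 0
    else
        rc=$?
        echo "cloud-init exited with status $rc"
        return "$rc"
    fi
}
```

While the function blocks, the session that started it stays open, which by the reasoning in (a) should be enough for WSL to consider the distro in use.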

thielj avatar Jun 24 '24 22:06 thielj

> It might help to (a) emphasize that running ubuntu2404.exe run cloud-init status --wait immediately after installation has completed is not just about watching cloud-init do its job - it's rather existential for cloud-init to complete anything more than a quick useradd. Even an idle shell would be enough for the WSL 10000ms watchdog timer to consider the distro to be in use.

I investigated this and came to the conclusion that adding such a call to the Ubuntu distro launcher executable itself is enough (provided the distro instance being launched has cloud-init, of course). Pretty soon, users won't have to worry about this.

CarlosNihelton avatar Jun 25 '24 21:06 CarlosNihelton

Hey @thielj, you might like the latest release of Ubuntu 24.04 LTS. The distro launcher now "protects" cloud-init runs, so its use is more transparent. We also updated the documentation (which is now in a more discoverable place).

https://documentation.ubuntu.com/wsl/en/latest/tutorials/cloud-init/

CarlosNihelton avatar Oct 07 '24 12:10 CarlosNihelton

I think we can safely close this issue.

CarlosNihelton avatar Oct 07 '24 12:10 CarlosNihelton

Thanks, @CarlosNihelton, closing accordingly.

aciba90 avatar Oct 07 '24 13:10 aciba90