Cloud-init aborted on WSL
Bug report
I have previously reported this against WSL as I think this is an issue that's not caused by cloud-init itself, so I'm linking this here for reference only.
Summary
When initializing Ubuntu-2404 in WSL, some kind of 'watchdog' kicks in after 10000ms and aborts cloud-init about 15s later, usually during the final phase.
Cloud-init works for quick tasks, but when adding multiple packages, apt-get is simply terminated and not re-run later.
For details and logs, see https://github.com/microsoft/WSL/issues/11602
Thank you for filing this bug and improving cloud-init. CC: @CarlosNihelton in case there is a default wsl.conf setting we may need to set up if we see "complex" user-data.
Very interesting report. My initial thought is that we need to deal with the WSL side of things first: as far as I can tell after viewing the logs in the other bug report, there is no logic in cloud-init itself that would prevent that shutdown.
Thanks for the good insights, @thielj. I'll investigate this further.
I suggested to WSL that they check cloud-init status before shutting down, as documented here: https://cloudinit.readthedocs.io/en/latest/howto/status.html
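To illustrate the suggestion above, here is a minimal Python sketch of a "check before shutting down" guard. It assumes that `cloud-init status` prints a line of the form `status: <state>`; the helper names (`parse_status`, `safe_to_shut_down`, `query_cloud_init`) and the set of states treated as finished are illustrative assumptions, not part of cloud-init or WSL.

```python
import subprocess

# States after which cloud-init is assumed to have no more work to do.
# This set is an assumption; verify it against your cloud-init version.
FINISHED_STATES = {"done", "error", "disabled"}


def parse_status(output: str) -> str:
    """Return the value of the 'status:' line, or 'unknown' if absent."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            return value.strip()
    return "unknown"


def safe_to_shut_down(output: str) -> bool:
    """True only when the reported state is one of the finished states."""
    return parse_status(output) in FINISHED_STATES


def query_cloud_init() -> str:
    """Run `cloud-init status` inside the distro and return its stdout."""
    result = subprocess.run(
        ["cloud-init", "status"],
        capture_output=True,
        text=True,
        check=False,  # non-zero exit codes are expected for some states
    )
    return result.stdout
```

A caller would skip the shutdown while `safe_to_shut_down(query_cloud_init())` is false. Treating an unknown state as "not safe" is a deliberately conservative choice.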
@blackboxsw Not sure what your definition of complex user data is. Mine surely doesn't look complex, but some of the packages pull in a whole lot of other stuff. Once a distro has aged a bit, package_upgrade: true on a slow network might not complete.
@CarlosNihelton I've described a workaround over on the WSL issue, which showed an additional problem with /etc/resolv.conf disappearing and package upgrades failing. I've added a workaround for that, too.
https://github.com/microsoft/WSL/issues/11602#issuecomment-2140755462
Which makes me wonder: Is using cloud-init on WSL actually a supported and tested scenario, or more like a proof of concept?
What's been your experience with the two-step installation workflow described here: https://canonical-ubuntu-wsl.readthedocs-hosted.com/en/latest/tutorials/cloud-init/#enjoy ?
Have you been as impacted by the timeouts and surprising behaviors of WSL?
@CarlosNihelton Damn, I should have seen this document before - it looks almost identical to the workarounds I came up with.
I'll need to check whether the --no-launch and --root flags fix the issue with the disappearing /etc/resolv.conf when I'm back on my dev machine. Are these even in the official WSL docs?
Those arbitrarily chosen timeouts are pretty annoying, particularly if the distro is powered off and the VHD is left in a corrupted state.
I understand how the various timers are intended to benefit someone who runs a single app for a couple minutes and needs to reclaim all the memory ASAP.
For a developer machine with 64GB of RAM, dedicating some of that to WSL is a non-issue though. Considering the various workarounds I've seen - just to keep a distro up and running - others must be feeling the same. Chances are I'll just be going back to using a second physical machine or full VMs.
Can't they just put a penguin 🐧 icon in the bottom right corner that allows me to control this and other WSL settings? Like PowerToys does for keeping the machine awake when necessary.
Are these even in the official WSL docs?
Cloud-init support is not an upstream WSL feature but is specific to Ubuntu (for now, at least, as nothing in the implementation prevents other distros from benefiting from it), so those are Ubuntu docs rather than MS docs.
@CarlosNihelton Setting this to bug: external since the issue does not seem to lie with cloud-init.
It might help to (a) emphasize that running ubuntu2404.exe run cloud-init status --wait immediately after installation has completed is not just about watching cloud-init do its job - it's rather existential for cloud-init to complete anything more than a quick useradd. Even an idle shell would be enough for the WSL 10000ms watchdog timer to consider the distro to be in use.
And (b), may I suggest linking these instructions from the official cloud-init docs / the WSL source, and maybe try to get them added or linked to Microsoft's WSL documentation as well?
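As a rough sketch of point (a), the snippet below wraps the command quoted above so that a provisioning script can block until cloud-init has finished. The launcher name `ubuntu2404.exe` and the `run cloud-init status --wait` arguments come from this thread; the function names are hypothetical, and the script is assumed to run on the Windows host with the launcher on `PATH`.

```python
import subprocess
from typing import List


def build_wait_command(launcher: str = "ubuntu2404.exe") -> List[str]:
    """Build the argv that keeps the distro busy until cloud-init finishes."""
    return [launcher, "run", "cloud-init", "status", "--wait"]


def wait_for_cloud_init(launcher: str = "ubuntu2404.exe") -> int:
    """Block until cloud-init reports completion; return the exit code."""
    completed = subprocess.run(build_wait_command(launcher), check=False)
    return completed.returncode
```

Calling `wait_for_cloud_init()` right after installation keeps a process attached to the distro, which, per the observation above, should be enough for the WSL idle timer to consider it in use.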
It might help to (a) emphasize that running ubuntu2404.exe run cloud-init status --wait immediately after installation has completed is not just about watching cloud-init do its job - it's rather existential for cloud-init to complete anything more than a quick useradd. Even an idle shell would be enough for the WSL 10000ms watchdog timer to consider the distro to be in use.
I investigated this and concluded that adding such a call to the Ubuntu distro launcher executable itself is enough (provided the distro instance being launched has cloud-init, of course). Users won't have to worry about this pretty soon.
Hey @thielj, you might like the latest release of Ubuntu 24.04 LTS. The distro launcher now "protects" cloud-init runs, so its use is more transparent. We also updated the documentation (which is now in a more discoverable place).
https://documentation.ubuntu.com/wsl/en/latest/tutorials/cloud-init/
I think we can safely close this issue.
Thanks, @CarlosNihelton, closing accordingly.