
systemctl initializing

Open CoachYT1 opened this issue 1 year ago • 23 comments

Describe the issue: After updating to the latest ArchWSL, systemctl is not working. systemctl status shows initializing.

To Reproduce: Update to the latest ArchWSL and make a clean installation.

Expected behavior: systemctl should start normally.

Screenshots: image

Environment:

  • Windows build number: 10.0.22631.3155
  • Security Software: Malwarebytes Premium
  • WSL version 1/2: WSL 2
  • ArchWSL version: 24.3.11.0
  • ArchWSL Installer type: zip
  • Launcher version: 23072600

CoachYT1 avatar Mar 14 '24 18:03 CoachYT1

This might be related to the Systemd announcement that they are dropping support for cgroups v1 "in a release after 2023" (ref). It's currently working in my Arch WSL environment but I explicitly disabled cgroups v1 support inside of WSL.

You can try this yourself and see if it helps:

  • wsl --shutdown to terminate all running WSL instances
  • Add a %USERPROFILE%\.wslconfig file (or edit it if it already exists) and make sure that it contains:
[wsl2]
kernelCommandLine = cgroup_no_v1=all
  • Wait 10 seconds or so, then restart your Arch WSL.
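
To confirm that the setting actually reached the kernel after the restart, something like the following could be run inside the Arch WSL instance. This is only a minimal sketch: `has_cgroup_no_v1` is a hypothetical helper name, and it only checks that `cgroup_no_v1=all` appears in the string it is given, nothing more.

```shell
#!/bin/sh
# Sketch: check whether cgroup_no_v1=all made it onto the kernel command line.
# has_cgroup_no_v1 is a made-up helper name, not part of WSL or systemd.

# Returns 0 if the given command-line string contains the exact
# parameter cgroup_no_v1=all as a whitespace-separated word.
has_cgroup_no_v1() {
    case " $1 " in
        *" cgroup_no_v1=all "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Inside WSL, /proc/cmdline holds the command line the kernel booted with.
if [ -r /proc/cmdline ]; then
    if has_cgroup_no_v1 "$(cat /proc/cmdline)"; then
        echo "cgroup_no_v1=all is present on the kernel command line"
    else
        echo "cgroup_no_v1=all was not found on the kernel command line"
    fi
fi
```

If the parameter is missing, the most likely explanation is that WSL was not fully shut down before the restart, or that the file is not at `%USERPROFILE%\.wslconfig`.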

9numbernine9 avatar Mar 19 '24 17:03 9numbernine9

image

Same

CoachYT1 avatar Mar 19 '24 18:03 CoachYT1

I have the same problem, and it does not work for me either 😰 The only thing that changed is that Tainted: cgroupsv1 is gone.

xuangeyouneihan avatar Mar 21 '24 08:03 xuangeyouneihan

Well, I found this and modified .wslconfig according to it, and then it worked. But when I renamed .wslconfig to .wslconfig1 (without editing it to re-enable cgroups v1), Systemd somehow still worked. Then I renamed .wslconfig1 back (again without editing it, so cgroups v1 stayed disabled), backed up the original ext4.vhdx, unregistered ArchWSL, and re-installed it with a new ext4.vhdx, and Systemd stopped working again. Finally I deleted .wslconfig and replaced the new ext4.vhdx with the old one, and Systemd works. So why did it work with my old ext4.vhdx, and why didn't it work with a new ext4.vhdx?

xuangeyouneihan avatar Mar 25 '24 15:03 xuangeyouneihan

I'm running into this issue as well when setting up an ArchWSL instance on a brand new Windows 10 installation (despite my earlier comments about potential workaround/solutions).

Trying to narrow this down a bit further, I started going back through ArchWSL releases:

* [24.3.11.0](https://github.com/yuk7/ArchWSL/releases/tag/24.3.11.0) ❌

* [24.2.24.0](https://github.com/yuk7/ArchWSL/releases/tag/24.2.24.0) ❌

* [22.10.16.0](https://github.com/yuk7/ArchWSL/releases/tag/22.10.16.0) ✔️

What's odd is that it works fine with the last 2022 release, and not only that, I can bring all the packages up to date with pacman -Syu and everything still works fine. I don't know a lot about how WSL distributions are created, but could it be something that changed in the initial configuration/bootstrapping process between those releases?

C:\> wsl --version
WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.60
MSRDC version: 1.2.5105
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.4170

9numbernine9 avatar Mar 27 '24 16:03 9numbernine9

Does wayland-0 exist in /run/user/$UID with version 22.10.16.0? I found that wayland-0 is missing in version 24.3.11.0 when Systemd is accidentally enabled, see #357

xuangeyouneihan avatar Mar 27 '24 17:03 xuangeyouneihan

No, it doesn't.

9numbernine9 avatar Mar 27 '24 17:03 9numbernine9

I manually built a rootfs with docker, and everything works well. I think the problem is only in the repo's release. My build script (built with a China pacman mirror): create-rootfs.sh user-dbus-wayland-x11 user-systemctl-status system-systemctl-status

rayae avatar Mar 31 '24 12:03 rayae

None of these solutions work on my side. Only rolling back to version 22.10.16.0 works.

I'm using version 24.3.31.0 on Windows 11.

wsl --version

WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.60
MSRDC version: 1.2.5105
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22631.3374

mrcaidev avatar Mar 31 '24 14:03 mrcaidev

Does Systemd work normally in v24.3.31.0 released yesterday?

xuangeyouneihan avatar Apr 01 '24 01:04 xuangeyouneihan

@xuangeyouneihan Sorry, nothing has changed on that front in that release

yuk7 avatar Apr 01 '24 01:04 yuk7

Hope this will be fixed soon 😂 BTW, do you have any idea on what caused this issue?

xuangeyouneihan avatar Apr 01 '24 02:04 xuangeyouneihan

v24.3.31.0 still does not work in my environment.

WSL Version: 2.2.1.0
Kernel Version: 5.15.150.1-2
WSLg Version: 1.0.60
MSRDC Version: 1.2.5105
Direct3D Version: 1.611.1-81528511
DXCore Version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows Version: 10.0.22635.3420

WH-2099 avatar Apr 01 '24 07:04 WH-2099

After testing combinations of Arch.exe, wsldl.exe, and rootfs.tar.gz, I now suspect that the problem is mainly related to rootfs.tar.gz, and most likely to systemd-firstboot.service. I'm continuing to troubleshoot the problem.

WH-2099 avatar Apr 05 '24 04:04 WH-2099

I think I found the immediate cause and a temporary solution, but the deeper root cause is still up for debate. The direct cause is that the systemd boot process gets stuck on systemd-firstboot.service.

The treatment is simple:

  1. systemctl list-jobs | grep 'systemd-fisrtboot.service' Get the job-id corresponding to systemd-firstboot.service (its status should be running).
  2. systemctl cancel <job-id> cancel the job

After that systemd will run normally, even if you restart wsl.
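
The two steps above could be scripted roughly as follows. This is a sketch, not a tested fix: `running_job_ids` is a hypothetical helper, and it assumes that `systemctl list-jobs --no-legend` prints one job per line with the columns JOB, UNIT, TYPE, STATE. Note that the unit name is spelled systemd-firstboot.service.

```shell
#!/bin/sh
# Sketch: find the running job for a unit so that it can be cancelled.
# running_job_ids is a made-up helper name for this illustration.

# Reads `systemctl list-jobs --no-legend` output on stdin and prints the
# job ID of every job for unit $1 whose state is "running".
# Assumed column order: JOB UNIT TYPE STATE.
running_job_ids() {
    awk -v unit="$1" '$2 == unit && $4 == "running" { print $1 }'
}

# Usage on an affected system (as root):
#   for id in $(systemctl list-jobs --no-legend | running_job_ids systemd-firstboot.service); do
#       systemctl cancel "$id"
#   done
```

Matching on the exact unit name instead of a free-text grep avoids silently matching nothing when the pattern is mistyped.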


Based on my testing and extrapolation, there are two known issues:

  1. systemd-fisrtboot.service is not executing properly (don't really know much about this, but from the timeline I suspect it's related to wslg)
  2. The systemd compatibility layer in the WSL2 kernel has some problems in determining the first boot. a. Neither systemd.firstboot=false nor systemd.condition-first-boot=false prevented systemd-firstboot.service from booting by rewriting the kernel command line arguments. In fact, based on the results of systemd-analyze condition 'ConditonFirstBoot=true', the kernel doesn't seem to be handling the relevant parameters correctly.

Also, according to the official systemd documentation, I recommend removing /etc/machine-id from rootfs.tar.gz in the distribution.

For operating system images which are created once and used on multiple machines, for example for containers or in the cloud, /etc/machine-id should be either missing or an empty file in the generic file system image (the difference between the two options is described under "First Boot Semantics" below). An ID will be generated during boot and saved to this file if possible.
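
To make the "missing versus empty" distinction concrete, here is a small sketch of how I read the "First Boot Semantics" rules in the machine-id man page: a missing file or one containing "uninitialized" counts as a first boot, while an empty file does not. `first_boot_guess` is a hypothetical helper, it does not model the kernel command-line override, and it is not how systemd itself implements the check.

```shell
#!/bin/sh
# Sketch: guess whether systemd would treat the next boot of a root
# filesystem as a "first boot", based only on etc/machine-id.
# first_boot_guess is a made-up helper; $1 is the root directory to inspect
# (empty or omitted means the running system).
first_boot_guess() {
    id_file="${1:-}/etc/machine-id"
    if [ ! -e "$id_file" ]; then
        echo "first-boot (machine-id missing)"
    elif [ ! -s "$id_file" ]; then
        echo "not-first-boot (machine-id empty)"
    elif [ "$(head -n 1 "$id_file")" = "uninitialized" ]; then
        echo "first-boot (machine-id uninitialized)"
    else
        echo "not-first-boot (machine-id set)"
    fi
}

# Usage, for example on an extracted rootfs.tar.gz:
#   first_boot_guess /path/to/extracted-rootfs
```

Under this reading, shipping an empty /etc/machine-id rather than removing the file would be the variant that should not trigger systemd-firstboot.service.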


The information I refer to is as follows: https://www.freedesktop.org/software/systemd/man/latest/systemd-firstboot.html https://www.freedesktop.org/software/systemd/man/latest/machine-id.html https://www.freedesktop.org/software/systemd/man/latest/systemd.special.html https://www.freedesktop.org/software/systemd/man/latest/kernel-command-line.html https://learn.microsoft.com/en-us/windows/wsl/systemd

WH-2099 avatar Apr 05 '24 06:04 WH-2099

In my case, systemd-networkd-wait-online.service was also blocking the systemd boot process (screenshot attached).

CoachYT1 avatar Apr 05 '24 06:04 CoachYT1

Spelling error in the grep command: it should be 'firstboot' instead of 'fisrtboot'.

This is how I fix this issue:

  1. Cancel running jobs like systemd-firstboot.service
  2. Disable systemd-networkd-wait-online.service
sudo systemctl list-jobs | grep running
sudo systemctl cancel <job-number>
sudo systemctl disable systemd-networkd-wait-online

image

In my testing, removing /etc/machine-id from rootfs.tar.gz did not fix this issue.

wswind avatar Apr 06 '24 07:04 wswind

@rayae Thank you for your script, it's very useful.

CnsMaple avatar Apr 15 '24 01:04 CnsMaple

This is how I fix this issue:

  1. Cancel running jobs like systemd-firstboot.service
  2. Disable systemd-networkd-wait-online.service
sudo systemctl list-jobs | grep running
sudo systemctl cancel <job-number>
sudo systemctl disable systemd-networkd-wait-online

This fixed my problem. I'm using v24.4.28.0.

mrcaidev avatar May 17 '24 15:05 mrcaidev

This might be related to the Systemd announcement that they are dropping support for cgroups v1 "in a release after 2023" (ref). It's currently working in my Arch WSL environment but I explicitly disabled cgroups v1 support inside of WSL.

You can try this yourself and see if it helps:

* `wsl --shutdown` to terminate all running WSL instances

* Add a `%USERPROFILE%\.wslconfig` file (or edit it if it already exists) and make sure that it contains:
[wsl2]
kernelCommandLine = cgroup_no_v1=all
* Wait 10 seconds or so, then restart your Arch WSL.

I had an issue with a very long wsl boot and systemd not starting right away (with the infamous Failed to connect to bus: No such file or directory). I had to wait 30s and manually run sudo systemctl start user@1000 every time to get systemd back. Your solution worked for me: it's now back to what it was before, fast and working again. Thanks!

This is how I fix this issue:

  1. Cancel running jobs like systemd-firstboot.service
  2. Disable systemd-networkd-wait-online.service
sudo systemctl list-jobs | grep running
sudo systemctl cancel <job-number>
sudo systemctl disable systemd-networkd-wait-online

I also had to do this to get Docker working again. Thanks!

shanoor avatar Jul 10 '24 12:07 shanoor

thx

WH-2099 avatar Jul 19 '24 08:07 WH-2099

I'm using v24.4.28.0.

Modify ExecStart in /usr/lib/systemd/system/systemd-networkd-wait-online.service.

The new ExecStart should be: ExecStart=/usr/lib/systemd/systemd-networkd-wait-online -i eth0 --any --timeout=10

Restart WSL: wsl --shutdown

Check again: systemctl status
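
An alternative to editing the packaged unit file, which a package upgrade may overwrite, is a systemd drop-in. The sketch below writes one with the same ExecStart line as above; `write_wait_online_override` and the `override.conf` file name are my own choices, and I have not verified this on ArchWSL.

```shell
#!/bin/sh
# Sketch: override ExecStart of systemd-networkd-wait-online.service with a
# drop-in instead of editing the file under /usr/lib.
# write_wait_online_override is a made-up helper; $1 is the directory that
# holds unit overrides (defaults to /etc/systemd/system).
write_wait_online_override() {
    dir="${1:-/etc/systemd/system}/systemd-networkd-wait-online.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/override.conf" <<'EOF'
[Service]
# The empty ExecStart= clears the packaged command before the new one is set.
ExecStart=
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online -i eth0 --any --timeout=10
EOF
}

# Usage (as root), then restart WSL with `wsl --shutdown` as described above:
#   write_wait_online_override && systemctl daemon-reload
```

The interface name eth0 and the 10 second timeout are taken from the comment above and may need adjusting for other setups.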

l3n4QAQ avatar Aug 07 '24 06:08 l3n4QAQ

Proper workaround here: https://github.com/microsoft/WSL/issues/11857

kloon15 avatar Aug 12 '24 10:08 kloon15