Running wsl.exe from within a distribution appears to hang forever [WSLInterop fails]
Hello,
Thanks for your work on genie; it is very cool!
Windows version (build number): Version 21H2 (OS Build 19044.1889)
Linux distribution: Ubuntu 20.04 LTS, installed with wsl --install -d ubuntu-20.04. However, this issue seems to break WSLInterop for every distribution on WSL2, I think because all distributions share the kernel-side binfmt_misc configuration.
Kernel version: 5.4.72-microsoft-standard-WSL2
Genie version: genie 2.4
Describe the bug
I installed genie 2.4 on Ubuntu 20.04 using the Debian package from the wsl-transdebian repository.
Before the bottle is started, WSLInterop works fine, and I can run wsl.exe without any problems, both from the Ubuntu 20.04 distribution and from a separate Debian distribution:
vangelis@MAGELLAN2:~$ wsl.exe --help
Copyright (c) Microsoft Corporation. All rights reserved.
Usage: wsl.exe [Argument] [Options...] [CommandLine]
[...]
As soon as I start the bottle, running wsl.exe hangs for a long time until it eventually fails:
vangelis@MAGELLAN2:~$ genie -s
Waiting for systemd....!
vangelis@MAGELLAN2:~$ genie -b
inside
vangelis@MAGELLAN2:~$ /mnt/c/Windows/System32/wsl.exe -d Ubuntu-20.04
[...the command appears to hang for a long time, and sometimes fails with...]
An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
This suggests that we fell into some sort of infinite loop and ran out of a resource.
I have also confirmed this by killing wsl.exe from Windows, which terminates a large number of wsl.exe processes:
C:\>taskkill /f /im wsl.exe
SUCCESS: The process "wsl.exe" with PID 30828 has been terminated.
SUCCESS: The process "wsl.exe" with PID 30808 has been terminated.
SUCCESS: The process "wsl.exe" with PID 41816 has been terminated.
[... a lot more processes killed...]
I have confirmed that running wsl.exe fails both in Ubuntu 20.04 (from inside and from outside the bottle) and in Debian: WSLInterop seems to break for all distributions.
Confirm that you are running inside the bottle:
The output of genie -b is "inside" (see above).
To Reproduce
See the steps above.
I have confirmed that wsl.exe hangs only after the bottle is started, because systemd then changes the WSLInterop configuration using /usr/lib/binfmt.d/WSLInterop.conf, a file that genie installs.
This is the configuration before starting the bottle:
$ cat /proc/sys/fs/binfmt_misc/WSLInterop
enabled
interpreter /tools/init
flags: F
offset 0
magic 4d5a
After starting the bottle, the kernel configuration has been replaced by the one from /usr/lib/binfmt.d/WSLInterop.conf:
$ cat /proc/sys/fs/binfmt_misc/WSLInterop
enabled
interpreter /init
flags: PF
offset 0
magic 4d5a
Note the difference in the P flag.
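To make the comparison explicit, here is a minimal Python sketch that parses the two /proc/sys/fs/binfmt_misc/WSLInterop snapshots above and lists the fields that changed. The helpers parse_binfmt_status and diff_handlers are hypothetical, and the parser assumes exactly the line format shown (a status line, then "interpreter", "flags:", "offset" and "magic" lines). On these snapshots it reports that, besides the added P flag, the interpreter also changes from /tools/init to /init.

```python
"""Sketch: compare two snapshots of /proc/sys/fs/binfmt_misc/WSLInterop."""


def parse_binfmt_status(text: str) -> dict:
    """Parse the text of a /proc/sys/fs/binfmt_misc/<name> entry into a dict."""
    info: dict = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in ("enabled", "disabled"):
            info["status"] = line
        elif line.startswith("flags:"):
            # The kernel prints "flags: " followed by the flag letters (possibly none).
            info["flags"] = line[len("flags:"):].strip()
        else:
            # Remaining lines look like "interpreter /init", "offset 0", "magic 4d5a".
            key, _, value = line.partition(" ")
            info[key] = value.strip()
    return info


def diff_handlers(before: dict, after: dict) -> dict:
    """Return {field: (old, new)} for every field whose value differs."""
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(set(before) | set(after))
        if before.get(key) != after.get(key)
    }


# Snapshots copied from the output above.
BEFORE = """enabled
interpreter /tools/init
flags: F
offset 0
magic 4d5a
"""

AFTER = """enabled
interpreter /init
flags: PF
offset 0
magic 4d5a
"""

if __name__ == "__main__":
    changes = diff_handlers(parse_binfmt_status(BEFORE), parse_binfmt_status(AFTER))
    for field, (old, new) in changes.items():
        print(f"{field}: {old!r} -> {new!r}")
```

On a live system, the same functions could be fed Path("/proc/sys/fs/binfmt_misc/WSLInterop").read_text() (from pathlib) before and after genie -s.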
systemd appears to apply this configuration from /usr/lib/binfmt.d/WSLInterop.conf through this service:
$ systemctl status systemd-binfmt
● systemd-binfmt.service - Set Up Additional Binary Formats
Loaded: loaded (/lib/systemd/system/systemd-binfmt.service; static; vendor preset: enabled)
Active: active (exited) since Mon 2022-08-15 11:02:57 EEST; 14min ago
Expected behavior
More context follows below, but I have confirmed that I can solve this problem, so that wsl.exe works from anywhere, in either of two ways:
- preventing systemd from configuring binfmt_misc: systemctl mask systemd-binfmt, or
- removing /usr/lib/binfmt.d/WSLInterop.conf altogether.
Screenshots
I don't have any screenshots of this.
Additional context
I have tried to dig deeper; my findings are below. I look forward to your feedback.
As I understand it, the file /usr/lib/binfmt.d/WSLInterop.conf became part of genie because of issue https://github.com/arkane-systems/genie/issues/142. Note that the original version of this file registered the binfmt handler with just the F flag:
https://github.com/arkane-systems/genie/issues/142#issuecomment-856311929
Please note that binfmt created by systemd is not wsl-specific, thus it is not preconfigured to support windows executables like the one we unmounted. But we can easily bring the Windows interoperability back by adding config manually, which is done by creating e.g. /etc/binfmt.d/99-WSLInterop.conf with below contents:
:WSLInterop:M::MZ::/init:F
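The line above is a binfmt_misc registration string whose fields are :name:type:offset:magic:mask:interpreter:flags. The minimal Python sketch below (parse_binfmt_rule and format_binfmt_rule are hypothetical helpers, not part of genie or systemd) splits it into those fields to show how it maps onto the kernel entry: type M means matching on magic bytes, the magic MZ is what /proc later reports as 4d5a, and the last field holds the F or PF flags discussed in this issue. It assumes the first character is the delimiter and does not decode backslash escapes in the magic or mask fields.

```python
"""Sketch: split a binfmt_misc registration rule into its named fields."""

# Field order of a binfmt_misc registration string.
FIELDS = ("name", "type", "offset", "magic", "mask", "interpreter", "flags")


def parse_binfmt_rule(rule: str) -> dict:
    """Parse a rule such as ':WSLInterop:M::MZ::/init:F' into a dict."""
    rule = rule.strip()
    delimiter = rule[0]  # usually ':'
    parts = rule[1:].split(delimiter)
    if len(parts) == len(FIELDS) - 1:
        parts.append("")  # the trailing flags field may be omitted
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {rule!r}")
    return dict(zip(FIELDS, parts))


def format_binfmt_rule(fields: dict, delimiter: str = ":") -> str:
    """Rebuild a registration rule from its fields."""
    return delimiter + delimiter.join(fields[name] for name in FIELDS)


if __name__ == "__main__":
    rule = parse_binfmt_rule(":WSLInterop:M::MZ::/init:F")
    print(rule)
    # The magic "MZ" is reported by the kernel as "magic 4d5a".
    print(rule["magic"].encode("ascii").hex())
    # The same rule with the flags changed from F to PF.
    print(format_binfmt_rule({**rule, "flags": "PF"}))
```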
However, @esgie's assertion does not seem to hold. WSL registers the WSLInterop handler itself when it first starts the distribution, and no matter how many times the binfmt_misc filesystem is unmounted or re-mounted, the kernel configuration stays the same, so we don't need to touch it at all.
Steps to show this:
- Mask the service, or remove the offending file:
$ sudo systemctl mask systemd-binfmt
$ sudo mv /usr/lib/binfmt.d{,.disabled}
- Confirm the configuration before starting the bottle:
$ cat /proc/sys/fs/binfmt_misc/WSLInterop
enabled
interpreter /tools/init
flags: F
offset 0
magic 4d5a
vangelis@MAGELLAN2:~$ genie -r
stopped
- Confirm the same configuration after starting the bottle:
vangelis@MAGELLAN2:~$ genie -s
Waiting for systemd....!
vangelis@MAGELLAN2:~$ genie -b
inside
vangelis@MAGELLAN2:~$ cat /proc/sys/fs/binfmt_misc/WSLInterop
enabled
interpreter /tools/init
flags: F
offset 0
magic 4d5a
- Confirm that wsl.exe just works:
vangelis@MAGELLAN2:~$ /mnt/c/Windows/System32/wsl.exe -d Ubuntu-20.04 hostname
MAGELLAN2
I have also confirmed that leaving the file in place but removing the problematic P flag works too, since systemd-binfmt.service then becomes essentially a no-op.
However, commit https://github.com/arkane-systems/genie/commit/ebca3e323322e28bf4d24dfc881fb26e08737e59 by @cerebrate changed the flags to PF, following discussion in https://github.com/arkane-systems/genie/issues/267:
https://github.com/arkane-systems/genie/issues/267#issue-1226010903
The missing P is the problem. Changing F to PF in WSLInterop.conf would solve this issue, but I can't confirm whether other Windows version have the same situation.
This seems strange, because @NyaMisty was apparently seeing the reverse of what I see: PF worked and F failed.
I found this relevant discussion:
https://github.com/microsoft/WSL/issues/8162
As I understand it, @benhillis recently modified the binfmt interpreter (/init?) to support the P flag and to require it at registration:
https://github.com/microsoft/WSL/issues/8162#issuecomment-1080854675
So it may be that @NyaMisty is running a more recent WSL2 version than I am, and genie needs to support both configurations.
Given this context, and since asking systemd to configure binfmt explicitly seems unnecessary, I propose that genie not ship /usr/lib/binfmt.d/WSLInterop.conf at all.
I look forward to your feedback and would be happy to follow up with a PR if you agree with these conclusions.
Thanks, Vangelis.
I confirm that I have read the ENTIRE supplied readme file and checked for relevant information on the repository wiki before raising this issue, and that if the solution to this issue is found in either location, it will be closed without further comment:
- [X] Yes.
I just realized this is the same problem that @ShadowEO describes here: https://github.com/arkane-systems/genie/issues/287
The conclusion there also seems to be that some WSL2 versions need the F binfmt flag and others need PF.
I think the following would work:
- Mask systemd-binfmt.service, so it does not attempt to configure binfmt at all and, more importantly, does not destroy the configuration when the bottle is stopped.
- Do not ship /usr/lib/binfmt.d/WSLInterop.conf at all; instead, reuse the version-specific configuration that WSL2 applies at startup.
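Since different WSL2 builds apparently register WSLInterop with different flags, it can help to see exactly what a given build registered at startup. The following Python sketch is a hypothetical diagnostic aid (status_to_rule and hex_to_escapes are not part of genie or WSL), not a replacement for masking systemd-binfmt: it converts a /proc/sys/fs/binfmt_misc/WSLInterop entry back into an equivalent registration rule. It assumes the kernel status format shown earlier and writes the magic bytes as backslash-x hex escapes, which binfmt_misc accepts in the magic field.

```python
"""Hypothetical sketch: render a live binfmt_misc entry as a binfmt.d-style rule."""


def hex_to_escapes(hex_string: str) -> str:
    """Turn kernel hex output such as '4d5a' into '\\x4d\\x5a'."""
    return "".join("\\x" + hex_string[i:i + 2] for i in range(0, len(hex_string), 2))


def status_to_rule(name: str, status_text: str) -> str:
    """Build ':name:M:offset:magic:mask:interpreter:flags' from /proc output.

    Only magic-based (type M) entries are handled by this sketch.
    """
    fields = {}
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("flags:"):
            fields["flags"] = line[len("flags:"):].strip()
        elif " " in line:
            # Lines such as "interpreter /tools/init", "offset 0", "magic 4d5a".
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
    if "magic" not in fields:
        raise ValueError("no magic line found; extension-based entries are not handled")
    magic = hex_to_escapes(fields["magic"])
    mask = hex_to_escapes(fields.get("mask", ""))
    return ":".join(
        ["", name, "M", fields.get("offset", ""), magic, mask,
         fields["interpreter"], fields.get("flags", "")]
    )


# The registration WSL2 applied at startup on this system (copied from above).
WSL_STARTUP_STATUS = """enabled
interpreter /tools/init
flags: F
offset 0
magic 4d5a
"""

if __name__ == "__main__":
    print(status_to_rule("WSLInterop", WSL_STARTUP_STATUS))
```

For the startup registration shown earlier, this yields a rule with interpreter /tools/init and flag F, which differs from genie's shipped :WSLInterop:M::MZ::/init:PF.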
Fixed in 2.5, shipping as soon as a psutil issue is resolved.