LXD is missing support for `binfmt_misc` namespace
On a Noble host (kernel >= 6.7), creating a privileged Noble container shows that the container's systemd-binfmt.service cannot start:
root@v2:~# uname -a
Linux v2 6.8.0-35-generic #35-Ubuntu SMP PREEMPT_DYNAMIC Mon May 20 15:51:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
root@v2:~# snap list lxd
Name Version Rev Tracking Publisher Notes
lxd git-45f8fa5 28915 latest/edge canonical✓ -
root@v2:~# lxc launch ubuntu-minimal-daily:24.04 p1 -c security.privileged=true
root@v2:~# lxc exec p1 -- journalctl -o cat -u systemd-binfmt.service
Starting systemd-binfmt.service - Set Up Additional Binary Formats...
Failed to flush binfmt_misc rules, ignoring: Permission denied
/usr/lib/binfmt.d/python3.12.conf:1: Failed to delete rule 'python3.12', ignoring: Permission denied
/usr/lib/binfmt.d/python3.12.conf:1: Failed to add binary format 'python3.12': Permission denied
systemd-binfmt.service: Main process exited, code=exited, status=1/FAILURE
systemd-binfmt.service: Failed with result 'exit-code'.
Failed to start systemd-binfmt.service - Set Up Additional Binary Formats.
root@v2:~# lxc exec p1 -- cat /proc/sys/fs/binfmt_misc/status
cat: /proc/sys/fs/binfmt_misc/status: Permission denied
root@v2:~# lxc exec p1 -- ls -l /proc/sys/fs/binfmt_misc/status
-rw-r--r-- 1 root root 0 Jun 14 17:55 /proc/sys/fs/binfmt_misc/status
This should be cherry-pick'able from Incus: https://github.com/lxc/incus/pull/474
@simondeziel want to cherry-pick this one?
I've cherry-picked https://github.com/lxc/incus/pull/474. Here is my feature branch on my fork: https://github.com/kadinsayani/lxd/tree/13607-lxd-missing-support-for-binfmt_misc-namespace. I've tried testing on my end and here is the result I'm getting:
kadinsayani@devbox:~/canonical/lxd$ uname -a
Linux devbox 6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 2 20:41:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
kadinsayani@devbox:~/canonical/lxd$ snap list lxd
Name Version Rev Tracking Publisher Notes
lxd git-78e006e 30157 latest/edge canonical✓ -
kadinsayani@devbox:~/canonical/lxd$ lxc launch ubuntu-minimal-daily:24.04 p1 -c security.privileged=true
Launching p1
kadinsayani@devbox:~/canonical/lxd$ lxc exec p1 -- journalctl -o cat -u systemd-binfmt.service
Starting systemd-binfmt.service - Set Up Additional Binary Formats...
systemd-binfmt.service: Main process exited, code=exited, status=1/FAILURE
systemd-binfmt.service: Failed with result 'exit-code'.
Failed to start systemd-binfmt.service - Set Up Additional Binary Formats.
kadinsayani@devbox:~/canonical/lxd$ lxc exec p1 -- cat /proc/sys/fs/binfmt_misc/status
cat: /proc/sys/fs/binfmt_misc/status: Permission denied
kadinsayani@devbox:~/canonical/lxd$ lxc exec p1 -- ls -l /proc/sys/fs/binfmt_misc/status
-rw-r--r-- 1 root root 0 Aug 31 03:24 /proc/sys/fs/binfmt_misc/status
Please note that I'm side-loading my debug binaries into the lxd snap.
@simondeziel, when you have a chance, can you please test on your end? In the meantime, I'll continue debugging.
In the security.privileged=true case, binfmt_misc virtualization won't work, because the binfmt_misc file system's superblock is associated with a user namespace (ref https://github.com/torvalds/linux/blob/de5cb0dcb74c294ec527eddfe5094acfdb21ff21/fs/binfmt_misc.c#L982 ) and a privileged container does not get a user namespace of its own.
But for unprivileged containers (our main case) it should work pretty well.
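One way to tell which case a container falls into is to look at its UID map: a process in the initial user namespace sees the identity mapping `0 0 4294967295`, while an unprivileged container sees a shifted range. Below is a minimal sketch of that check, assuming a POSIX shell. The helper name `uses_initial_userns` is hypothetical and not part of LXD, and the check is only a heuristic.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD): succeeds when the given uid_map
# contains the identity mapping of the initial user namespace, which
# suggests the process has no user namespace of its own.
uses_initial_userns() {
    map_file="${1:-/proc/self/uid_map}"
    inside="" outside="" length=""
    # Fields: <first ID inside ns> <first ID outside ns> <range length>.
    # Only the first line is inspected; `read` strips the leading padding.
    read -r inside outside length < "$map_file" || [ -n "$inside" ] || return 2
    [ "$inside" = "0" ] && [ "$outside" = "0" ] && [ "$length" = "4294967295" ]
}
```

To inspect a container by hand, `lxc exec p1 -- cat /proc/self/uid_map` prints the same file the helper reads.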
@mihalicyn thanks, I now realize where my confusion came from; thanks for setting it straight. I've updated the bug description to use a regular (unprivileged) container. Sorry for the confusion, Kadin!
No worries Simon! Thanks for updating the description.
I think we should be good to close this issue now 👍