nspawn, apparmor, incus: /sys/full broken due to apparmor rule
systemd version the issue has been seen with
xg6f0c5pchmc2jq84s4np19j1rnn90mn-systemd-256.6
Used distribution
NixOS 24.11pre693653.5785b6bb5eaa (Vicuna)
Linux kernel version used
6.8.0-47-generic
CPU architectures issue was seen on
x86_64
Component
systemd-nspawn
Expected behaviour you didn't see
Starting an nspawn container inside an Incus container with security.privileged=true and security.nesting=true should just work.
Unexpected behaviour you saw
When starting an nspawn container, nspawn tries to mkdir /sys/full.
This fails with permission denied because of an AppArmor rule, as the following trace shows:
Oct 19 13:25:16 r3-website container dokuwiki[711]: [pid 723] mkdir("/sys/full", 0755) = -1 EACCES (Permission denied)
Oct 19 13:25:16 r3-website container dokuwiki[711]: [pid 720] <... getsockopt resumed>[0], [4]) = 0
Oct 19 13:25:16 r3-website container dokuwiki[711]: [pid 720] setsockopt(15, SOL_NETLINK, NETLINK_EXT_ACK, [1], 4 <unfinished ...>
Oct 19 13:25:16 r3-website container dokuwiki[711]: [pid 723] openat(AT_FDCWD, "/sys/full", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH <unfinished ...>
This isn't caught at all as nspawn-mount.c just assumes the mkdir will work: https://github.com/systemd/systemd/blob/main/src/nspawn/nspawn-mount.c#L474-L478
This leads to a mysterious error: Failed to mount sysfs (type sysfs) on /sys/full (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC ""): No such file or directory
(this could be caught at the mkdir step)
Additionally, /sys/full may not be the right choice of directory for re-mounting sysfs (?). A different path would avoid this issue altogether, because it would not match the AppArmor rule.
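To illustrate the kind of check I mean for the mkdir step, here is a minimal sketch. It is not the actual nspawn code: mkdir_checked is a hypothetical helper name, and I am assuming an already existing directory should count as success.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not a systemd function): create a directory and
 * report failures instead of ignoring them. An already existing path
 * (EEXIST) is treated as success; that is an assumption of this sketch. */
static int mkdir_checked(const char *path, mode_t mode) {
        if (mkdir(path, mode) < 0 && errno != EEXIST) {
                int saved = errno; /* fprintf() may overwrite errno */

                fprintf(stderr, "Failed to create %s: %s\n", path, strerror(saved));
                return -saved;
        }

        return 0;
}
```

With a check like this, the /sys/full case above would report the EACCES from mkdir directly, rather than the later "No such file or directory" from the mount.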
Steps to reproduce the problem
For a full reproduction you need Ubuntu with Incus (and Incus must have AppArmor enabled). Then run the following commands:
- incus launch images:nixos/unstable repronspawn -c security.privileged=true -c security.nesting=true
- incus exec repronspawn bash
- systemd-nspawn --keep-unit -M dokuwiki -D /tmp --private-network --network-veth --notify-ready=yes --kill-signal=SIGRTMIN+3 --bind-ro=/nix/store --bind-ro=/nix/var/nix/db --bind-ro=/nix/var/nix/daemon-socket --link-journal=try-guest
This should trigger, among other errors, the following one: Failed to mount sysfs (type sysfs) on /sys/full (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC ""): No such file or directory
Additional program output to the terminal or log subsystem illustrating the issue
SYSTEMD_LOG_LEVEL=debug systemd-nspawn --keep-unit -M dokuwiki -D /tmp --private-network --network-veth --notify-ready=yes --kill-signal=SIGRTMIN+3 --bind-ro=/nix/store --bind-ro=/nix/var/nix/db --bind-ro=/nix/var/nix/daemon-socket --link-journal=try-guest
Setting RLIMIT_CPU to infinity.
Setting RLIMIT_FSIZE to infinity.
Setting RLIMIT_DATA to infinity.
Setting RLIMIT_STACK to 8388608:infinity.
Setting RLIMIT_CORE to 0:infinity.
Setting RLIMIT_RSS to infinity.
Setting RLIMIT_NPROC to infinity.
Setting RLIMIT_NOFILE to 1024:524288.
Setting RLIMIT_MEMLOCK to 8388608.
Setting RLIMIT_AS to infinity.
Setting RLIMIT_LOCKS to infinity.
Setting RLIMIT_SIGPENDING to 15440.
Setting RLIMIT_MSGQUEUE to 819200.
Setting RLIMIT_NICE to 0.
Setting RLIMIT_RTPRIO to 0.
Setting RLIMIT_RTTIME to infinity.
Resolved versioned directory pattern '/tmp' to file '/tmp' as version 'n/a'.
Found cgroup2 on /sys/fs/cgroup/, full unified hierarchy
░ Spawning container dokuwiki on /tmp.
░ Press Ctrl-] three times within 1s to kill container.
Mounting tmpfs (tmpfs) on /run/systemd/nspawn/unix-export/dokuwiki (MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_NOSYMFOLLOW "size=4M,nr_inodes=64,mode=0755")...
Changing mount flags /run/systemd/nspawn/unix-export/dokuwiki (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_NOSYMFOLLOW|MS_BIND "")...
Outer child is initializing.
Changing mount propagation / (MS_REC|MS_SLAVE "")
Bind-mounting /tmp on /tmp (MS_BIND|MS_REC "")...
Changing mount propagation /tmp (MS_REC|MS_PRIVATE "")
Using legacy hierarchy for container.
Mounting tmpfs (tmpfs) on /tmp/tmp (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=01777,size=10%,nr_inodes=400k,uid=0,gid=0")...
Mounting tmpfs (tmpfs) on /tmp/sys (MS_NOSUID|MS_NODEV|MS_NOEXEC "mode=0555,size=4m,nr_inodes=1k,uid=0,gid=0")...
Mounting tmpfs (tmpfs) on /tmp/dev (MS_NOSUID|MS_STRICTATIME "mode=0755,size=4m,nr_inodes=64k,uid=0,gid=0")...
Mounting tmpfs (tmpfs) on /tmp/dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=01777,size=10%,nr_inodes=400k,uid=0,gid=0")...
Mounting tmpfs (tmpfs) on /tmp/run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=0755,size=20%,nr_inodes=800k,uid=0,gid=0")...
Bind-mounting /tmp/run/host on /tmp/run/host (MS_BIND "")...
Bind-mounting /etc/os-release on /tmp/run/host/os-release (MS_BIND "")...
Changing mount flags /tmp/run/host/os-release (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Changing mount propagation /tmp/run/host/os-release (MS_PRIVATE "")
Bind-mounting /run/systemd/nspawn/unix-export/dokuwiki on /tmp/run/host/unix-export (MS_BIND "")...
Changing mount flags /tmp/run/host/unix-export (MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_NOSYMFOLLOW|MS_BIND "")...
Mounting devpts (devpts) on /tmp/dev/pts (MS_NOSUID|MS_NOEXEC "newinstance,ptmxmode=0666,mode=620,gid=3")...
Bind-mounting /run/systemd/nspawn/propagate/dokuwiki on /tmp/run/host/incoming (MS_BIND "")...
Changing mount flags /tmp/run/host/incoming (MS_RDONLY|MS_REMOUNT|MS_BIND "")...
Bind-mounting /nix/store on /tmp/nix/store (MS_BIND|MS_REC "")...
Bind-mounting /nix/var/nix/daemon-socket on /tmp/nix/var/nix/daemon-socket (MS_BIND|MS_REC "")...
Bind-mounting /nix/var/nix/db on /tmp/nix/var/nix/db (MS_BIND|MS_REC "")...
Failed to remove '/tmp/etc/localtime', ignoring: No such file or directory
Changing mount propagation /run/host/incoming (MS_SLAVE "")
Inner child is initializing.
(sd-namespace) succeeded.
Init process invoked as PID 373
Mounting proc (proc) on /proc (MS_NOSUID|MS_NODEV|MS_NOEXEC "")...
Bind-mounting /proc/sys on /proc/sys (MS_BIND "")...
Bind-mounting /proc/sys/net on /proc/sys/net (MS_BIND "")...
Changing mount flags /proc/sys (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Bind-mounting /run/systemd/inaccessible/reg on /proc/kallsyms (MS_BIND "")...
Failed to mount /run/systemd/inaccessible/reg (type n/a) on /proc/kallsyms (MS_BIND ""): No such file or directory
Changing mount flags /proc/kallsyms (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Failed to mount n/a (type n/a) on /proc/kallsyms (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND ""): Invalid argument
Bind-mounting /run/systemd/inaccessible/reg on /proc/kcore (MS_BIND "")...
Failed to mount /run/systemd/inaccessible/reg (type n/a) on /proc/kcore (MS_BIND ""): No such file or directory
Changing mount flags /proc/kcore (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Failed to mount n/a (type n/a) on /proc/kcore (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND ""): Invalid argument
Bind-mounting /run/systemd/inaccessible/reg on /proc/keys (MS_BIND "")...
Failed to mount /run/systemd/inaccessible/reg (type n/a) on /proc/keys (MS_BIND ""): No such file or directory
Changing mount flags /proc/keys (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Failed to mount n/a (type n/a) on /proc/keys (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND ""): Invalid argument
Bind-mounting /run/systemd/inaccessible/reg on /proc/sysrq-trigger (MS_BIND "")...
Failed to mount /run/systemd/inaccessible/reg (type n/a) on /proc/sysrq-trigger (MS_BIND ""): No such file or directory
Changing mount flags /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Failed to mount n/a (type n/a) on /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND ""): Invalid argument
Bind-mounting /run/systemd/inaccessible/reg on /proc/timer_list (MS_BIND "")...
Failed to mount /run/systemd/inaccessible/reg (type n/a) on /proc/timer_list (MS_BIND ""): No such file or directory
Changing mount flags /proc/timer_list (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Failed to mount n/a (type n/a) on /proc/timer_list (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND ""): Invalid argument
Bind-mounting /proc/acpi on /proc/acpi (MS_BIND "")...
Changing mount flags /proc/acpi (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Bind-mounting /proc/apm on /proc/apm (MS_BIND "")...
Failed to mount /proc/apm (type n/a) on /proc/apm (MS_BIND ""): No such file or directory
Changing mount flags /proc/apm (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Failed to mount n/a (type n/a) on /proc/apm (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND ""): No such file or directory
Bind-mounting /proc/asound on /proc/asound (MS_BIND "")...
Changing mount flags /proc/asound (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Bind-mounting /proc/bus on /proc/bus (MS_BIND "")...
Changing mount flags /proc/bus (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Bind-mounting /proc/fs on /proc/fs (MS_BIND "")...
Changing mount flags /proc/fs (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Bind-mounting /proc/irq on /proc/irq (MS_BIND "")...
Changing mount flags /proc/irq (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Bind-mounting /proc/scsi on /proc/scsi (MS_BIND "")...
Changing mount flags /proc/scsi (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Mounting mqueue (mqueue) on /dev/mqueue (MS_NOSUID|MS_NODEV|MS_NOEXEC "")...
Changing mount flags /run/host (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Mounting sysfs (sysfs) on /sys/full (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC "")...
Failed to mount sysfs (type sysfs) on /sys/full (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC ""): No such file or directory
Bus n/a: changing state UNSET → OPENING
sd-bus: starting bus by connecting to /run/dbus/system_bus_socket...
Bus n/a: changing state OPENING → AUTHENTICATING
Bus n/a: changing state AUTHENTICATING → HELLO
Sent message type=method_call sender=n/a destination=org.freedesktop.DBus path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=Hello cookie=1 reply_cookie=0 signature=n/a error-name=n/a error-message=n/a
Got message type=method_return sender=org.freedesktop.DBus destination=:1.12 path=n/a interface=n/a member=n/a cookie=1 reply_cookie=1 signature=s error-name=n/a error-message=n/a
Bus n/a: changing state HELLO → RUNNING
Sent message type=method_call sender=n/a destination=org.freedesktop.machine1 path=/org/freedesktop/machine1 interface=org.freedesktop.machine1.Manager member=RegisterMachineWithNetwork cookie=2 reply_cookie=0 signature=sayssusai error-name=n/a error-message=n/a
Got message type=error sender=:1.7 destination=:1.12 path=n/a interface=n/a member=n/a cookie=19 reply_cookie=2 signature=s error-name=System.Error.ENXIO error-message=Failed to determine unit of process 373 : No such device or address
Failed to register machine: Failed to determine unit of process 373 : No such device or address
Bus n/a: changing state RUNNING → CLOSED
Incus issue: https://github.com/lxc/incus/issues/1321
Hmm, why did you file this as a bug against systemd? This seems to be an AppArmor policy issue. What does that have to do with us upstream? We do not maintain AppArmor policies, that's a downstream thing.
I think it's a bug to assume the mkdir will not fail, and it would be better to catch that error here: https://github.com/systemd/systemd/blob/main/src/nspawn/nspawn-mount.c#L474-L478
Additionally, as I said, maybe another path would make more sense (say a folder in /tmp/ like /tmp/sys_full), but I'm not sure whether that is a good idea.
Besides, isn't the very point of nspawn not to write outside its tree? It should probably not read /sys at all.