nspawn, apparmor, incus: /sys/full broken due to apparmor rule

Open mkg20001 opened this issue 1 year ago • 1 comments

systemd version the issue has been seen with

xg6f0c5pchmc2jq84s4np19j1rnn90mn-systemd-256.6

Used distribution

Nixos 24.11pre693653.5785b6bb5eaa (Vicuna)

Linux kernel version used

6.8.0-47-generic

CPU architectures issue was seen on

x86_64

Component

systemd-nspawn

Expected behaviour you didn't see

Starting an nspawn container inside an Incus container with security.privileged=true and security.nesting=true just works as expected.

Unexpected behaviour you saw

When starting an nspawn container nspawn tries to mkdir /sys/full

This results in permission denied due to the apparmor rule:

Oct 19 13:25:16 r3-website container dokuwiki[711]: [pid   723] mkdir("/sys/full", 0755)    = -1 EACCES (Permission denied)
Oct 19 13:25:16 r3-website container dokuwiki[711]: [pid   720] <... getsockopt resumed>[0], [4]) = 0
Oct 19 13:25:16 r3-website container dokuwiki[711]: [pid   720] setsockopt(15, SOL_NETLINK, NETLINK_EXT_ACK, [1], 4 <unfinished ...>
Oct 19 13:25:16 r3-website container dokuwiki[711]: [pid   723] openat(AT_FDCWD, "/sys/full", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH <unfinished ...>

This isn't caught at all as nspawn-mount.c just assumes the mkdir will work: https://github.com/systemd/systemd/blob/main/src/nspawn/nspawn-mount.c#L474-L478

This leads to a mysterious error: Failed to mount sysfs (type sysfs) on /sys/full (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC ""): No such file or directory

(this could be caught at the mkdir step)
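As a rough illustration of what catching this at the mkdir step could look like, here is a minimal standalone sketch. It is not the actual systemd code: the helper name make_mount_point() and its error message are hypothetical, and it uses plain libc calls rather than systemd's internal helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not part of systemd): create a mount point and
 * report a failure immediately, instead of ignoring it and letting the
 * later mount() fail with a misleading ENOENT. Returns 0 on success or
 * a negative errno value on failure. */
int make_mount_point(const char *path, mode_t mode) {
        assert(path);

        /* EEXIST is tolerated here on the assumption that the existing
         * entry is a usable directory; a stricter version could stat() it. */
        if (mkdir(path, mode) < 0 && errno != EEXIST) {
                int saved = errno; /* fprintf() may overwrite errno */

                fprintf(stderr, "Failed to create mount point %s: %s\n",
                        path, strerror(saved));
                return -saved;
        }

        return 0;
}
```

With a check like this, the AppArmor denial would surface as "Permission denied" on /sys/full at the point where it happens, rather than as "No such file or directory" from the subsequent mount.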

Additionally, /sys/full may not be the right choice of directory for re-mounting sysfs (?). A different path would avoid this issue altogether, because it would not match the AppArmor rule.

Steps to reproduce the problem

For a full reproduction you need Ubuntu with Incus (and Incus must have AppArmor enabled). Then run the following commands:

  • incus launch images:nixos/unstable repronspawn -c security.privileged=true -c security.nesting=true
  • incus exec repronspawn bash
  • systemd-nspawn --keep-unit -M dokuwiki -D /tmp --private-network --network-veth --notify-ready=yes --kill-signal=SIGRTMIN+3 --bind-ro=/nix/store --bind-ro=/nix/var/nix/db --bind-ro=/nix/var/nix/daemon-socket --link-journal=try-guest

This should trigger, among others, the Failed to mount sysfs (type sysfs) on /sys/full (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC ""): No such file or directory error

Additional program output to the terminal or log subsystem illustrating the issue

SYSTEMD_LOG_LEVEL=debug  systemd-nspawn --keep-unit -M dokuwiki -D /tmp --private-network --network-veth --notify-ready=yes --kill-signal=SIGRTMIN+3 --bind-ro=/nix/store --bind-ro=/nix/var/nix/db --bind-ro=/nix/var/nix/daemon-socket --link-journal=try-guest

Setting RLIMIT_CPU to infinity.
Setting RLIMIT_FSIZE to infinity.
Setting RLIMIT_DATA to infinity.
Setting RLIMIT_STACK to 8388608:infinity.
Setting RLIMIT_CORE to 0:infinity.
Setting RLIMIT_RSS to infinity.
Setting RLIMIT_NPROC to infinity.
Setting RLIMIT_NOFILE to 1024:524288.
Setting RLIMIT_MEMLOCK to 8388608.
Setting RLIMIT_AS to infinity.
Setting RLIMIT_LOCKS to infinity.
Setting RLIMIT_SIGPENDING to 15440.
Setting RLIMIT_MSGQUEUE to 819200.
Setting RLIMIT_NICE to 0.
Setting RLIMIT_RTPRIO to 0.
Setting RLIMIT_RTTIME to infinity.
Resolved versioned directory pattern '/tmp' to file '/tmp' as version 'n/a'.
Found cgroup2 on /sys/fs/cgroup/, full unified hierarchy
░ Spawning container dokuwiki on /tmp.
░ Press Ctrl-] three times within 1s to kill container.
Mounting tmpfs (tmpfs) on /run/systemd/nspawn/unix-export/dokuwiki (MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_NOSYMFOLLOW "size=4M,nr_inodes=64,mode=0755")...
Changing mount flags /run/systemd/nspawn/unix-export/dokuwiki (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_NOSYMFOLLOW|MS_BIND "")...
Outer child is initializing.
Changing mount propagation / (MS_REC|MS_SLAVE "")
Bind-mounting /tmp on /tmp (MS_BIND|MS_REC "")...
Changing mount propagation /tmp (MS_REC|MS_PRIVATE "")
Using legacy hierarchy for container.
Mounting tmpfs (tmpfs) on /tmp/tmp (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=01777,size=10%,nr_inodes=400k,uid=0,gid=0")...
Mounting tmpfs (tmpfs) on /tmp/sys (MS_NOSUID|MS_NODEV|MS_NOEXEC "mode=0555,size=4m,nr_inodes=1k,uid=0,gid=0")...
Mounting tmpfs (tmpfs) on /tmp/dev (MS_NOSUID|MS_STRICTATIME "mode=0755,size=4m,nr_inodes=64k,uid=0,gid=0")...
Mounting tmpfs (tmpfs) on /tmp/dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=01777,size=10%,nr_inodes=400k,uid=0,gid=0")...
Mounting tmpfs (tmpfs) on /tmp/run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=0755,size=20%,nr_inodes=800k,uid=0,gid=0")...
Bind-mounting /tmp/run/host on /tmp/run/host (MS_BIND "")...
Bind-mounting /etc/os-release on /tmp/run/host/os-release (MS_BIND "")...
Changing mount flags /tmp/run/host/os-release (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Changing mount propagation /tmp/run/host/os-release (MS_PRIVATE "")
Bind-mounting /run/systemd/nspawn/unix-export/dokuwiki on /tmp/run/host/unix-export (MS_BIND "")...
Changing mount flags /tmp/run/host/unix-export (MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_NOSYMFOLLOW|MS_BIND "")...
Mounting devpts (devpts) on /tmp/dev/pts (MS_NOSUID|MS_NOEXEC "newinstance,ptmxmode=0666,mode=620,gid=3")...
Bind-mounting /run/systemd/nspawn/propagate/dokuwiki on /tmp/run/host/incoming (MS_BIND "")...
Changing mount flags /tmp/run/host/incoming (MS_RDONLY|MS_REMOUNT|MS_BIND "")...
Bind-mounting /nix/store on /tmp/nix/store (MS_BIND|MS_REC "")...
Bind-mounting /nix/var/nix/daemon-socket on /tmp/nix/var/nix/daemon-socket (MS_BIND|MS_REC "")...
Bind-mounting /nix/var/nix/db on /tmp/nix/var/nix/db (MS_BIND|MS_REC "")...
Failed to remove '/tmp/etc/localtime', ignoring: No such file or directory
Changing mount propagation /run/host/incoming (MS_SLAVE "")
Inner child is initializing.
(sd-namespace) succeeded.
Init process invoked as PID 373
Mounting proc (proc) on /proc (MS_NOSUID|MS_NODEV|MS_NOEXEC "")...
Bind-mounting /proc/sys on /proc/sys (MS_BIND "")...
Bind-mounting /proc/sys/net on /proc/sys/net (MS_BIND "")...
Changing mount flags /proc/sys (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Bind-mounting /run/systemd/inaccessible/reg on /proc/kallsyms (MS_BIND "")...
Failed to mount /run/systemd/inaccessible/reg (type n/a) on /proc/kallsyms (MS_BIND ""): No such file or directory
Changing mount flags /proc/kallsyms (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Failed to mount n/a (type n/a) on /proc/kallsyms (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND ""): Invalid argument
Bind-mounting /run/systemd/inaccessible/reg on /proc/kcore (MS_BIND "")...
Failed to mount /run/systemd/inaccessible/reg (type n/a) on /proc/kcore (MS_BIND ""): No such file or directory
Changing mount flags /proc/kcore (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Failed to mount n/a (type n/a) on /proc/kcore (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND ""): Invalid argument
Bind-mounting /run/systemd/inaccessible/reg on /proc/keys (MS_BIND "")...
Failed to mount /run/systemd/inaccessible/reg (type n/a) on /proc/keys (MS_BIND ""): No such file or directory
Changing mount flags /proc/keys (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Failed to mount n/a (type n/a) on /proc/keys (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND ""): Invalid argument
Bind-mounting /run/systemd/inaccessible/reg on /proc/sysrq-trigger (MS_BIND "")...
Failed to mount /run/systemd/inaccessible/reg (type n/a) on /proc/sysrq-trigger (MS_BIND ""): No such file or directory
Changing mount flags /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Failed to mount n/a (type n/a) on /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND ""): Invalid argument
Bind-mounting /run/systemd/inaccessible/reg on /proc/timer_list (MS_BIND "")...
Failed to mount /run/systemd/inaccessible/reg (type n/a) on /proc/timer_list (MS_BIND ""): No such file or directory
Changing mount flags /proc/timer_list (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Failed to mount n/a (type n/a) on /proc/timer_list (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND ""): Invalid argument
Bind-mounting /proc/acpi on /proc/acpi (MS_BIND "")...
Changing mount flags /proc/acpi (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Bind-mounting /proc/apm on /proc/apm (MS_BIND "")...
Failed to mount /proc/apm (type n/a) on /proc/apm (MS_BIND ""): No such file or directory
Changing mount flags /proc/apm (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Failed to mount n/a (type n/a) on /proc/apm (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND ""): No such file or directory
Bind-mounting /proc/asound on /proc/asound (MS_BIND "")...
Changing mount flags /proc/asound (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Bind-mounting /proc/bus on /proc/bus (MS_BIND "")...
Changing mount flags /proc/bus (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Bind-mounting /proc/fs on /proc/fs (MS_BIND "")...
Changing mount flags /proc/fs (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Bind-mounting /proc/irq on /proc/irq (MS_BIND "")...
Changing mount flags /proc/irq (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Bind-mounting /proc/scsi on /proc/scsi (MS_BIND "")...
Changing mount flags /proc/scsi (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Mounting mqueue (mqueue) on /dev/mqueue (MS_NOSUID|MS_NODEV|MS_NOEXEC "")...
Changing mount flags /run/host (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND "")...
Mounting sysfs (sysfs) on /sys/full (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC "")...
Failed to mount sysfs (type sysfs) on /sys/full (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC ""): No such file or directory
Bus n/a: changing state UNSET → OPENING
sd-bus: starting bus by connecting to /run/dbus/system_bus_socket...
Bus n/a: changing state OPENING → AUTHENTICATING
Bus n/a: changing state AUTHENTICATING → HELLO
Sent message type=method_call sender=n/a destination=org.freedesktop.DBus path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=Hello cookie=1 reply_cookie=0 signature=n/a error-name=n/a error-message=n/a
Got message type=method_return sender=org.freedesktop.DBus destination=:1.12 path=n/a interface=n/a member=n/a  cookie=1 reply_cookie=1 signature=s error-name=n/a error-message=n/a
Bus n/a: changing state HELLO → RUNNING
Sent message type=method_call sender=n/a destination=org.freedesktop.machine1 path=/org/freedesktop/machine1 interface=org.freedesktop.machine1.Manager member=RegisterMachineWithNetwork cookie=2 reply_cookie=0 signature=sayssusai error-name=n/a error-message=n/a
Got message type=error sender=:1.7 destination=:1.12 path=n/a interface=n/a member=n/a  cookie=19 reply_cookie=2 signature=s error-name=System.Error.ENXIO error-message=Failed to determine unit of process 373 : No such device or address
Failed to register machine: Failed to determine unit of process 373 : No such device or address
Bus n/a: changing state RUNNING → CLOSED

mkg20001 Oct 19 '24 13:10

Incus issue: https://github.com/lxc/incus/issues/1321

mkg20001 Oct 19 '24 13:10

Hmm, why did you file this as a bug against systemd? This seems to be an AppArmor policy issue. What does that have to do with us upstream? We do not maintain AppArmor policies; that's a downstream thing.

poettering Oct 21 '24 07:10

I think it's a bug to assume the mkdir will not fail, and it would be better to catch that error here: https://github.com/systemd/systemd/blob/main/src/nspawn/nspawn-mount.c#L474-L478

Additionally, as I said, maybe another path would make more sense (say a folder in /tmp/ like /tmp/sys_full), but I'm not sure whether that is a good idea.

mkg20001 Oct 22 '24 13:10

Besides, isn't the very point of nspawn not to write outside its tree? It should probably not read /sys at all.

sipaktli Oct 21 '25 20:10