runc
runc copied to clipboard
libcontainer: fix a bug when setting shared rootfs propagation mode
So far when the input mount flags contain MS_SHARED, the flag has not been applied to the container rootfs. That's because we call rootfsParentMountPrivate() after applying the original mount flags. Though we actually need to mount the container rootfs with MS_PRIVATE, to avoid failure from pivot_root() in the Linux kernel.
Thus if the mount flags contain MS_SHARED, we need a special case handling. First do pivotRoot() (or msMoveRoot, chroot) with the rootfs with a mount flag MS_PRIVATE. Then after pivotRoot(), again mount the rootfs with MS_SHARED.
With this fix, validation/linux_rootfs_propagation.t of runtime-tools works well with the shared mode finally.
Fixes https://github.com/opencontainers/runc/issues/1755
/cc @alban
Signed-off-by: Dongsu Park [email protected]
@rhvgoyal PTAL
Can we have integration test (in this repo)?
@dongsupark Are you still interested in this PR?
@lifubang Thanks for the ping. But I have completely lost the context. First I have to try to remember what I did. ;-)
Please see https://github.com/opencontainers/runc/discussions/3948
The root directory mount propagation was set to shared when this commit applied, like crun.
[root@fedora38 runc]# podman info --format {{.Host.OCIRuntime.Version}}
runc version 1.1.0+dev
commit: v1.1.0-647-ga5777e87-dirty
spec: 1.1.0-rc.3
go: go1.20.5
libseccomp: 2.5.3
[root@fedora38 runc]# mkdir ~/hoge && mount -t tmpfs tmpfs ~/hoge
[root@fedora38 runc]# mount | grep "/hoge "
tmpfs on /root/hoge type tmpfs (rw,relatime,seclabel,inode64)
[root@fedora38 runc]# podman run -itd --privileged --name testcon --volume ~/hoge:/hoge:z,shared fedora-minimal:34 /bin/bash
1cb7167f2d47f18440c9f1b792a49bbc5c2f52def79dc9c746c9536a6e24f229
[root@fedora38 runc]# cat /var/lib/containers/storage/overlay-containers/1cb7167f2d47f18440c9f1b792a49bbc5c2f52def79dc9c746c9536a6e24f229/userdata/config.json | jq .linux.rootfsPropagation
"shared"
[root@fedora38 runc]# podman exec testcon mkdir /test
[root@fedora38 runc]# podman exec testcon mount -t tmpfs tmpfs /test
[root@fedora38 runc]# podman exec testcon findmnt -o "TARGET,PROPAGATION"
TARGET PROPAGATION
/ shared
|-/test shared
|-/sys private
| `-/sys/fs/cgroup private
|-/proc private
|-/dev private
| |-/dev/console private
| |-/dev/mqueue private
| |-/dev/shm private
| `-/dev/pts private
|-/hoge shared
|-/run/.containerenv private
|-/etc/hosts private
|-/run/secrets private
|-/etc/hostname private
`-/etc/resolv.conf private
In other cases, however, the mount propagation was set differently from crun.
For example, the following Steps to reproduce shows that the mount propagation for
the root directory was shared for crun, but shared,slave for runc.
Since the .linux.rootfsPropagation field in config.json is shared,
it seems correct to be shared like crun.
Steps to reproduce:
- runc
[test@fedora38 ~]$ podman info --format {{.Host.OCIRuntime.Version}}
runc version 1.1.0+dev
commit: v1.1.0-647-ga5777e87-dirty
spec: 1.1.0-rc.3
go: go1.20.5
libseccomp: 2.5.3
[test@fedora38 ~]$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
[test@fedora38 ~]$ mkdir ~/{vol1,vol2}
[test@fedora38 ~]$ podman --storage-driver vfs --root /tmp/hoge run -itd --name testcon --volume ~/vol1:/myvol1:z,shared --volume ~/vol2:/myvol2:z fedora-minimal:34 /bin/bash
... snip ...
f4c1e806d6d475aec845ee0b5e29df5e36badeec00749b8d96be931ce43699a6
[test@fedora38 ~]$ cat /tmp/hoge/vfs-containers/f4c1e806d6d475aec845ee0b5e29df5e36badeec00749b8d96be931ce43699a6/userdata/config.json | jq .linux.rootfsPropagation
"shared"
[test@fedora38 ~]$ podman --root /tmp/hoge --storage-driver vfs exec testcon findmnt -o "TARGET,PROPAGATION" | grep -e "\/ " -e "\/myvol1 " -e "\/myvol2 "
/ shared,slave
|-/myvol1 shared,slave
|-/myvol2 private
- crun
[test@fedora38 ~]$ podman info --format {{.Host.OCIRuntime.Version}}
crun version 1.8.5
commit: b6f80f766c9a89eb7b1440c0a70ab287434b17ed
rundir: /run/user/1000/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
[test@fedora38 ~]$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
[test@fedora38 ~]$ mkdir ~/{vol1,vol2}
[test@fedora38 ~]$ podman --storage-driver vfs --root /tmp/hoge run -itd --name testcon --volume ~/vol1:/myvol1:z,shared --volume ~/vol2:/myvol2:z fedora-minimal:34 /bin/bash
... snip ...
1d73a122ae6e4f7fe455b88677e91b7e20fd421ff141a0cb620b08f3d5cd2f48
[test@fedora38 ~]$ cat /tmp/hoge/vfs-containers/1d73a122ae6e4f7fe455b88677e91b7e20fd421ff141a0cb620b08f3d5cd2f48/userdata/config.json | jq .linux.rootfsPropagation
"shared"
[test@fedora38 ~]$ podman --root /tmp/hoge --storage-driver vfs exec testcon findmnt -o "TARGET,PROPAGATION" | grep -e "\/ " -e "\/myvol1 " -e "\/myvol2 "
/ shared
|-/myvol1 shared,slave
|-/myvol2 private
Rebased the PR, addressed review comments, and added an integration test.
In other cases, however, the mount propagation was set differently from crun. For example, the following Steps to reproduce shows that the mount propagation for the root directory was
sharedfor crun, butshared,slavefor runc.
Yes, I am seeing that as well. I could not figure it out.
That issue however seems to happen already in main as well as v1.1.8 of runc.
I confirmed that the mount propagation also contains slave in runc main branch.
Therefore, at least, it doesn't seem to be caused by rootfsParentMountShared().
[test@fedora38 ~]$ podman info --format {{.Host.OCIRuntime.Version}}
runc version 1.1.0+dev
commit: v1.1.0-656-g14c7ab7f
spec: 1.1.0
go: go1.20.5
libseccomp: 2.5.3
[test@fedora38 ~]$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
[test@fedora38 ~]$ mkdir ~/{vol1,vol2}
[test@fedora38 ~]$ podman --storage-driver vfs --root /tmp/hoge run -itd --name testcon --volume ~/vol1:/myvol1:z,shared --volume ~/vol2:/myvol2:z fedora-minimal:34 /bin/bash
... snip ...
15fdd0d1fce19b8755254cf913eacb8d7355fe6a9739db219c063964a47a711d
[test@fedora38 ~]$ cat /tmp/hoge/vfs-containers/15fdd0d1fce19b8755254cf913eacb8d7355fe6a9739db219c063964a47a711d/userdata/config.json | jq .linux.rootfsPropagation
"shared"
[test@fedora38 ~]$ podman --root /tmp/hoge --storage-driver vfs exec testcon findmnt -o "TARGET,PROPAGATION" | grep -e "\/ " -e "\/myvol1 " -e "\/myvol2 "
/ private,slave
|-/myvol1 shared,slave
|-/myvol2 private