rshared submounts are not cleanly unmounted on the host
Description
After a bidirectional (rshared) mount is set up for a container, any submounts created under it inside the container are propagated to the host, but these mounts are not unmounted when the container is removed.
Steps to reproduce the issue
- I create a pod and a container using crictl; sandbox.json is:
{
  "metadata": {
    "name": "looper-sandbox",
    "namespace": "default",
    "attempt": 1,
    "uid": "c6318cce-89b2-4f02-a702-7ba243a2fbd1"
  },
  "log_directory": "/home/guo.gh/",
  "linux": {
    "security_context": {
      "privileged": true,
      "namespace_options": {
        "network": 2,
        "pid": 2,
        "ipc": 2
      }
    }
  }
}
- container.json is as follows; it bind-mounts the host path /root/test to the container path /test as a bidirectional (rshared) mount, which means submounts created under it will be propagated to the host:
{
  "metadata": {
    "name": "looper"
  },
  "log_path": "loop.log",
  "image": {
    "image": "busybox:latest"
  },
  "command": [
    "/bin/sh",
    "-c",
    "i=0; while true; do t=$(date); echo $t -- $i; i=$(expr $i + 1); sleep 1; done"
  ],
  "mounts": [
    {
      "container_path": "/test",
      "host_path": "/root/test",
      "readonly": false,
      "propagation": 2
    }
  ],
  "linux": {
    "security_context": {
      "privileged": true,
      "namespace_options": {
        "network": 2,
        "pid": 1,
        "ipc": 1
      }
    }
  }
}
- Start the sandbox and the container:
pod=`sudo crictl runp sandbox.json`
cnt=`sudo crictl create $pod container.json sandbox.json`
sudo crictl start $cnt
- Exec into the container and bind-mount /root/test/a to /root/test/b:
crictl exec -it $cnt /bin/sh
mkdir /root/test/a
mkdir /root/test/b
mount --bind /root/test/a /root/test/b
- A corresponding mount now appears on the host:
mount | grep test
- Remove the container and sandbox. The mount that was propagated to the host still exists and is never unmounted, and these mount entries accumulate as I repeatedly start and remove containers, leaking mounts on the host. My runc version is 1.0.8, and I think runc should make sure that all mounts propagated from inside the container are unmounted.
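Even if runc does not clean these up, the leaked submounts can at least be found on the host. A minimal detection sketch (the `leaked_mounts` helper is hypothetical, not part of runc or crictl; /root/test is the host path from the repro above): it reads mountinfo-formatted lines on stdin and prints every mount point strictly under the given prefix, deepest first, so they could then be unmounted in a safe order.

```shell
#!/bin/sh
# Hypothetical helper: field 5 of /proc/self/mountinfo is the mount point.
# Printing matches in reverse-sorted order unmounts nested mounts first.
leaked_mounts() {
  prefix=$1
  awk -v p="$prefix/" 'index($5, p) == 1 {print $5}' | sort -r
}

# On a real host you would feed it the live table:
#   leaked_mounts /root/test < /proc/self/mountinfo | while read -r mp; do
#     umount "$mp"
#   done
```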
Describe the results you received and expected
Mounts propagated to the host should be cleanly unmounted when the container is removed.
What version of runc are you using?
runc version 1.1.8
commit: v1.1.8-10-g85d13e5c
spec: 1.0.2-dev
go: go1.18.5
libseccomp: 2.5.2
Host OS information
NAME="Alibaba Cloud Linux"
VERSION="3 (Soaring Falcon)"
ID="alinux"
ID_LIKE="rhel fedora centos anolis"
VERSION_ID="3"
UPDATE_ID="10"
PLATFORM_ID="platform:al8"
PRETTY_NAME="Alibaba Cloud Linux 3 (Soaring Falcon)"
ANSI_COLOR="0;31"
HOME_URL="https://www.aliyun.com/"
Host kernel information
Linux k39c03413.sqa.eu95 5.10.134-007.ali5000.al8.x86_64 #1 SMP Fri Mar 3 18:41:24 CST 2023 x86_64 x86_64 x86_64 GNU/Linux
I found a similar issue in Docker (docker issue), where the conclusion was that the runtime should not unmount these propagated submounts, and that the container process should do it itself:
This is one of many pitfalls of shared mount propagation, and as such I would strongly suggest not using it unless it's really necessary.
Unlike regular unmounts, the mount namespace being destroyed (when the container dies) does not trigger unmounts to propagate to the host. In addition, runc has no way of knowing what mounts have been propagated (much less whether the user would actually want us to unmount anything). Also, in most usecases (within Docker/containerd/Kubernetes), runc goes away after the container has been configured so there is no runc program that could even do the unmount if we wanted to.
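Following that suggestion, a workaround is to have the container entrypoint unmount its own submounts before exiting. A hedged sketch (the paths are from the repro above; the actual `mount --bind` line is commented out because it needs a privileged container, and the real workload loop is elided):

```shell
#!/bin/sh
# Sketch only: the container cleans up its own submounts on shutdown,
# as the quoted comment suggests. /root/test/b is the bind target from
# the repro; umount errors are ignored in case the mount is absent.
cleanup() {
  umount /root/test/b 2>/dev/null || true
  echo "cleanup done"
}
# Run cleanup on normal exit and on termination signals.
trap cleanup EXIT TERM INT

# mount --bind /root/test/a /root/test/b   # requires a privileged container
# ... main loop from container.json would run here ...
```

Whether this fires depends on the container receiving a catchable signal on removal, so it complements rather than replaces host-side cleanup.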