All read-only mounts set through the CSI plugin have failed.
Description
We uniformly configure read-only mounts for program areas of a specific type of volume through the CSI component, so ordinary services don't need to be aware that they are automatically mounted in read-only mode. However, due to the following changes, the read-only restriction has failed, and the mounted directories within the containers have become writable, posing a security risk. https://github.com/opencontainers/runc/releases/tag/v1.2.0-rc.1
Steps to reproduce the issue
Using an extended CSI component, the host directory /opt/data is mounted to the pod directory /var/lib/kubelet/pods/xxx/volumes/kubernetes.io~csi/pkg/mount. The mount parameters include the configuration of ro (read-only). In a regular pod configuration, the extended CSI is used to mount the directory to the /opt/test directory inside the container. This step is performed by runc. After entering the container, it is possible to create new files in the /opt/test directory.
Describe the results you received and expected
I expect that when entering the /opt/test directory in the container, it should be in read-only mode, preventing the creation or modification of files.
What version of runc are you using?
runc version 1.2.4
commit: v1.2.4-0-g6c52b3f
spec: 1.2.0
go: go1.24.1
libseccomp: 2.5.0
Host OS information
NAME="EulerOS"
VERSION="2.0 (SP13x86_64)"
ID="euleros"
VERSION_ID="2.0"
PRETTY_NAME="EulerOS 2.0 (SP13x86_64)"
ANSI_COLOR="0;31"
Host kernel information
Linux master1 5.10.0
Can you please provide some kind of configuration or any other useful information so that we can actually understand what role runc is playing / what configuration is being used?
We did change how mount flags work in runc 1.2, but if you are explicitly setting ro for mounts in config.json we will absolutely honour that. The only thing that has changed in scenarios like you describe is that a user specifying rw will no longer be silently ignored if the original mount was ro (and some other conditions are met) -- instead you will get an rw mount because that is what was asked for. If you are explicitly setting ro the way you describe, this should work with runc 1.2.
Below is the mount configuration for the service:
volumeMounts:
- mountPath: /opt/pkgs
  name: mypkg
volumes:
- csi:
    driver: sop-csi-driver
    volumeAttributes:
      type: pkg
  name: mypkg
The sop-csi-driver CSI plugin is responsible for mounting the CSI volume of type pkg from the /opt/pkg directory on the host to the /var/lib/kubelet/xxx/csi/pkg/mount directory in the container. This mount is configured with the ro (read-only) option. When accessed from the host, /var/lib/kubelet/xxx/csi/pkg/mount remains read-only; however, it becomes writable once inside the container.
Entering the corresponding directory in the container allows for file creation.
@cyphar
I don't know if Kubernetes auto-sets rw in the config they give us (in which case, that's a Kubernetes bug -- they should just omit rw or ro if the user did not request either) but surely there is a mount option to request it be read-only.
Another solution is to use the superblock read-only flag rather than the VFS one -- this is set with mount -o remount,ro which cannot be cleared with VFS mount flags. Though this will also make the mount read-only to all users (possibly even to new mounts of the same filesystem, since the kernel caches superblocks).
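To check which of the two read-only flags is actually set on a given mount, /proc/self/mountinfo reports them separately: the per-mount (VFS) options come before the " - " separator and the superblock options are the last field. Below is a minimal sketch in Python, assuming the field layout documented in proc(5). The function name and the sample line are made up for illustration.

```python
def readonly_flags(mountinfo_line):
    """Return (mount_point, vfs_ro, superblock_ro) for one mountinfo line."""
    # Fields before " - " describe the mount itself; fields after it describe
    # the filesystem (type, source, superblock options).
    per_mount, _, per_fs = mountinfo_line.strip().partition(" - ")
    mount_fields = per_mount.split()
    fs_fields = per_fs.split()
    mount_point = mount_fields[4]
    vfs_opts = mount_fields[5].split(",")
    super_opts = fs_fields[2].split(",") if len(fs_fields) > 2 else []
    return mount_point, "ro" in vfs_opts, "ro" in super_opts


# Hypothetical line: a bind mount whose VFS flag is ro while the
# underlying tmpfs superblock is still rw.
SAMPLE = "36 35 0:42 / /dir-ro ro,relatime - tmpfs tmpfs rw,size=31788528k,inode64"

if __name__ == "__main__":
    print(readonly_flags(SAMPLE))
```

Running this over the host-side CSI mount and over the mount seen inside the container should show whether the read-only state comes from the VFS flag or from the superblock.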
In runc version 1.1.12, mounts inherited the original directory's mount mode (read-only), but this behavior was changed in version 1.2.4. I'd like to understand the main pain points this modification aimed to address.
This is not an accurate description of runc's behaviour (neither before nor after runc 1.2). You can test it for yourself, outside of Kubernetes.
% runc version
runc version 1.3.0
commit: v1.3.0-0-g4ca628d1d4c9
spec: 1.2.1
go: go1.24.5
libseccomp: 2.6.0
% mkdir /tmp/dir
% mount --bind -o ro /tmp/dir /tmp/dir # vfs flag "ro" set
% cat bundle/config.json
{
/* ... */
"mounts": [
/* ... */
{
"source": "/tmp/dir",
"destination": "/dir-ro",
"type": "bind",
"options": [
"bind", "ro"
]
},
{
"source": "/tmp/dir",
"destination": "/dir-rw",
"type": "bind",
"options": [
"bind", "rw"
]
},
{
"source": "/tmp/dir",
"destination": "/dir-dfl",
"type": "bind",
"options": [
"bind"
]
}
]
/* ... */
}
% runc run -b bundle ctr
# mount | grep dir-
tmpfs on /dir-ro type tmpfs (ro,size=31788528k,nr_inodes=1048576,inode64)
tmpfs on /dir-rw type tmpfs (rw,size=31788528k,nr_inodes=1048576,inode64)
tmpfs on /dir-dfl type tmpfs (ro,nosuid,nodev,size=31788528k,nr_inodes=1048576,inode64)
#
/dir-dfl is what runc produces if you specify neither rw nor ro (it copies the flags of the host mount). If Kubernetes is auto-inserting rw into all mounts by default (something other tools like podman have done before), then you will see a behaviour change that looks similar to what you're describing (previously we would silently ignore rw mount options, which was the incorrect behaviour), but that would be a bug in Kubernetes, not runc.
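The three cases in the transcript can be summarised as a small decision function. This is only a simplified model of the behaviour described in this thread for runc 1.2, not runc's actual code; the function name is hypothetical, and a config that lists both ro and rw is deliberately left out of the model.

```python
def effective_readonly(host_mount_ro, options):
    """Model whether a bind mount ends up read-only inside the container.

    host_mount_ro: True if the source mount on the host has the VFS ro flag.
    options: the "options" list of the mount entry in config.json.
    """
    wants_ro = "ro" in options
    wants_rw = "rw" in options
    if wants_ro and wants_rw:
        # Not covered by the thread, so the model refuses to guess.
        raise ValueError("both ro and rw requested")
    if wants_ro:
        return True           # explicit ro is honoured
    if wants_rw:
        return False          # explicit rw is honoured, even over a ro source
    return host_mount_ro      # neither given: copy the host mount's flag


if __name__ == "__main__":
    for opts in (["bind", "ro"], ["bind", "rw"], ["bind"]):
        print(opts, effective_readonly(True, opts))
```

With a read-only source mount, only the entry that explicitly asks for rw comes out writable, which matches the /dir-rw line above.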
The problems that were fixed (and the precise details of the behaviour changes) are described in the changelog:
* Several aspects of how mount options work has been adjusted in a way that
could theoretically break users that have very strange mount option strings.
This was necessary to fix glaring issues in how mount options were being
treated. The key changes are:
- Mount options on bind-mounts that clear a mount flag are now always
applied. Previously, if a user requested a bind-mount with only clearing
options (such as `rw,exec,dev`) the options would be ignored and the
original bind-mount options would be set. Unfortunately this also means
that container configurations which specified only clearing mount options
will now actually get what they asked for, which could break existing
containers (though it seems unlikely that a user who requested a specific
mount option would consider it "broken" to get the mount options they
asked for). This also allows us to silently add locked mount flags the
user *did not explicitly request to be cleared* in rootless mode,
allowing for easier use of bind-mounts for rootless containers. (#3967)
- Container configurations using bind-mounts with superblock mount flags
(i.e. filesystem-specific mount flags, referred to as "data" in
`mount(2)`, as opposed to VFS generic mount flags like `MS_NODEV`) will
now return an error. This is because superblock mount flags will also
affect the host mount (as the superblock is shared when bind-mounting),
which is obviously not acceptable. Previously, these flags were silently
ignored so this change simply tells users that runc cannot fulfil their
request rather than just ignoring it. (#3990)
EDIT: There was also a separate bug fixed in #3967 where we silently cleared ro flags when doing a mount, so runc 1.1.x actually had a bug that matches what you are describing. That is why I said your description of the behaviour is wrong both for pre-1.2 runc (we had bugs that could silently clear flags like ro) and for post-1.2 runc (we no longer silently clear flags, nor do we auto-set rw).
Try to reproduce your issue with runc directly. If you cannot reproduce your issue outside of Kubernetes, please submit a bug to Kubernetes instead.
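One way to narrow this down is to look at the config.json that runc actually receives for the affected container (where the bundle lives depends on the container runtime) and see whether the bind mount carries an explicit rw. Here is a rough sketch in Python, assuming the file is plain JSON following the OCI runtime spec "mounts" layout; the helper name and the sample data are illustrative only.

```python
import json


def bind_mount_modes(config):
    """Return [(destination, mode)] for every bind mount in an OCI config dict.

    mode is the explicit "ro"/"rw" options found, or "inherit" if neither
    is present (runc then copies the flags of the host mount).
    """
    modes = []
    for mount in config.get("mounts", []):
        options = mount.get("options") or []
        is_bind = (mount.get("type") == "bind"
                   or "bind" in options or "rbind" in options)
        if not is_bind:
            continue
        explicit = [opt for opt in options if opt in ("ro", "rw")]
        modes.append((mount.get("destination"), ",".join(explicit) or "inherit"))
    return modes


# Sample data mirroring the three mounts from the transcript above.
SAMPLE_CONFIG = json.loads("""
{
  "mounts": [
    {"source": "proc", "destination": "/proc", "type": "proc"},
    {"source": "/tmp/dir", "destination": "/dir-ro", "type": "bind", "options": ["bind", "ro"]},
    {"source": "/tmp/dir", "destination": "/dir-rw", "type": "bind", "options": ["bind", "rw"]},
    {"source": "/tmp/dir", "destination": "/dir-dfl", "type": "bind", "options": ["bind"]}
  ]
}
""")

if __name__ == "__main__":
    for destination, mode in bind_mount_modes(SAMPLE_CONFIG):
        print(destination, mode)
```

If the CSI-backed mount shows up with rw even though nobody asked for it, the option is being added somewhere above runc.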