All read-only mounts set through the CSI plugin have failed.

Open redriverhong opened this issue 4 months ago • 7 comments

Description

We use a CSI component to uniformly apply read-only mounts to the program areas of a specific volume type, so ordinary services do not need to know that these volumes are mounted read-only. However, since the changes in the release below, the read-only restriction no longer takes effect: the mounted directories are writable inside the containers, which is a security risk. https://github.com/opencontainers/runc/releases/tag/v1.2.0-rc.1

Steps to reproduce the issue

Using an extended CSI component, the host directory /opt/data is mounted to the pod directory /var/lib/kubelet/pods/xxx/volumes/kubernetes.io~csi/pkg/mount with the ro (read-only) mount option. A regular pod then uses this CSI volume to mount the directory at /opt/test inside the container; this step is performed by runc. After entering the container, it is possible to create new files in /opt/test.

Describe the results you received and expected

I expect /opt/test inside the container to be read-only, so that files cannot be created or modified there. Instead, the directory is writable.

What version of runc are you using?

runc version 1.2.4
commit: v1.2.4-0-g6c52b3f
spec: 1.2.0
go: go1.24.1
libseccomp: 2.5.0

Host OS information

NAME="EulerOS"
VERSION="2.0 (SP13x86_64)"
ID="euleros"
VERSION_ID="2.0"
PRETTY_NAME="EulerOS 2.0 (SP13x86_64)"
ANSI_COLOR="0;31"

Host kernel information

Linux master1 5.10.0

redriverhong avatar Jul 31 '25 02:07 redriverhong

Can you please provide some kind of configuration or any other useful information so that we can actually understand what role runc is playing / what configuration is being used?

We did change how mount flags work in runc 1.2, but if you are explicitly setting ro for mounts in config.json we will absolutely honour that. The only thing that has changed in scenarios like you describe is that a user specifying rw will no longer be silently ignored if the original mount was ro (and some other conditions are met) -- instead you will get an rw mount because that is what was asked for. If you are explicitly setting ro the way you describe, this should work with runc 1.2.

cyphar avatar Jul 31 '25 02:07 cyphar

Below is the mount configuration for the service:

volumeMounts:
- mountPath: /opt/pkgs
  name: mypkg
volumes:
- csi:
    driver: sop-csi-driver
    volumeAttributes:
      type: pkg
  name: mypkg

The sop-csi-driver CSI plugin is responsible for mounting the CSI volume of type pkg from the /opt/pkg directory on the host to the /var/lib/kubelet/xxx/csi/pkg/mount directory used by the container. This mount is configured with the ro (read-only) option. Accessing the /var/lib/kubelet/xxx/csi/pkg/mount directory from the host remains read-only. However, it becomes writable once inside the container.

Entering the corresponding directory in the container allows for file creation.

redriverhong avatar Jul 31 '25 03:07 redriverhong

@cyphar

redriverhong avatar Jul 31 '25 03:07 redriverhong

I don't know if Kubernetes auto-sets rw in the config they give us (in which case, that's a Kubernetes bug -- they should just omit rw or ro if the user did not request either) but surely there is a mount option to request it be read-only.

Another solution is to use the superblock read-only flag rather than the VFS one -- this is set with mount -o remount,ro which cannot be cleared with VFS mount flags. Though this will also make the mount read-only to all users (possibly even to new mounts of the same filesystem, since the kernel caches superblocks).
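
To see which of the two read-only flags is actually set on a given mount, /proc/self/mountinfo lists the per-mount (VFS) options and the superblock options in separate fields. The following is a minimal sketch, not something runc ships: the function name mount_ro_flags is made up for illustration, and the field positions follow the mountinfo layout documented in proc(5).

```shell
# Print the per-mount (VFS) options and the superblock options of a mount point.
# mountinfo layout (proc(5)): field 5 is the mount point, field 6 holds the
# per-mount options; after the "-" separator come the filesystem type, the
# source, and the superblock options.
# mount_ro_flags is a hypothetical helper name, not part of runc or util-linux.
mount_ro_flags() {
    # $1: mount point, $2: mountinfo file (defaults to /proc/self/mountinfo)
    awk -v mp="$1" '
        $5 == mp {
            # Skip the optional fields until the "-" separator is found.
            for (i = 7; i <= NF; i++) if ($i == "-") break
            print "vfs=" $6 " super=" $(i + 3)
        }
    ' "${2:-/proc/self/mountinfo}"
}
```

Running `mount_ro_flags /var/lib/kubelet/xxx/csi/pkg/mount` on the host should show ro only in the vfs part for a read-only bind mount, and ro in the super part as well once `mount -o remount,ro` has been applied to the filesystem itself. The plain `mount` output merges both sets of options into one list, so it cannot tell the two cases apart.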

cyphar avatar Jul 31 '25 04:07 cyphar

> I don't know if Kubernetes auto-sets rw in the config they give us (in which case, that's a Kubernetes bug -- they should just omit rw or ro if the user did not request either) but surely there is a mount option to request it be read-only.
>
> Another solution is to use the superblock read-only flag rather than the VFS one -- this is set with mount -o remount,ro which cannot be cleared with VFS mount flags. Though this will also make the mount read-only to all users (possibly even to new mounts of the same filesystem, since the kernel caches superblocks).

In runc version 1.1.12, mounts inherited the original directory's mount mode (read-only), but this behavior was changed in version 1.2.4. I'd like to understand the main pain points this modification aimed to address.

redriverhong avatar Jul 31 '25 04:07 redriverhong

> In runc version 1.1.12, mounts inherited the original directory's mount mode (read-only), but this behavior was changed in version 1.2.4. I'd like to understand the main pain points this modification aimed to address.

This is not an accurate description of runc's behaviour (neither before nor after runc 1.2). You can test it for yourself, outside of Kubernetes.

% runc version
runc version 1.3.0
commit: v1.3.0-0-g4ca628d1d4c9
spec: 1.2.1
go: go1.24.5
libseccomp: 2.6.0
% mkdir /tmp/dir
% mount --bind -o ro /tmp/dir /tmp/dir # vfs flag "ro" set
% cat bundle/config.json
{
	/* ... */
	"mounts": [
		/* ... */
		{
			"source": "/tmp/dir",
			"destination": "/dir-ro",
			"type": "bind",
			"options": [
				"bind", "ro"
			]
		},
		{
			"source": "/tmp/dir",
			"destination": "/dir-rw",
			"type": "bind",
			"options": [
				"bind", "rw"
			]
		},
		{
			"source": "/tmp/dir",
			"destination": "/dir-dfl",
			"type": "bind",
			"options": [
				"bind"
			]
		}
	]
	/* ... */
}
% runc run -b bundle ctr
# mount | grep dir-
tmpfs on /dir-ro type tmpfs (ro,size=31788528k,nr_inodes=1048576,inode64)
tmpfs on /dir-rw type tmpfs (rw,size=31788528k,nr_inodes=1048576,inode64)
tmpfs on /dir-dfl type tmpfs (ro,nosuid,nodev,size=31788528k,nr_inodes=1048576,inode64)
# 

/dir-dfl is what runc produces if you specify neither rw nor ro (it copies the flags of the host mount). If Kubernetes is auto-inserting rw into all mounts by default (something other tools like podman have done before), then you will see a behaviour change that looks similar to what you're describing (previously we would silently ignore rw mount options, which was the incorrect behaviour), but that would be a bug in Kubernetes, not runc.
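
The three cases in the output above can be summarised with a toy model. This is only a sketch of the behaviour described in this thread, not runc's implementation: effective_mode is a hypothetical name, it looks at ro/rw only, and it ignores the "other conditions" and the remaining flags (nosuid, nodev, and so on) that runc also handles.

```shell
# Toy model of the ro/rw outcome for a bind mount under runc >= 1.2, as
# described in this thread. NOT runc's code; effective_mode is hypothetical.
# $1: "ro" or "rw" as set on the host (source) mount
# $2: comma-separated options from the config.json mount entry
effective_mode() {
    result="$1"                 # default: copy the flag of the host mount
    old_ifs="$IFS"
    IFS=,
    for opt in $2; do           # unquoted on purpose: split on commas
        case "$opt" in
            ro) result=ro ;;    # explicit ro is honoured
            rw) result=rw ;;    # explicit rw is honoured, no longer ignored
        esac
    done
    IFS="$old_ifs"
    printf '%s\n' "$result"
}

effective_mode ro bind,ro   # the /dir-ro case
effective_mode ro bind,rw   # the /dir-rw case
effective_mode ro bind      # the /dir-dfl case
```

In this model the only way a read-only host mount turns writable in the container is an explicit rw in the options, which is why the generated config.json is the first thing worth checking.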

The problems that were fixed (and the precise details of the behaviour changes) are described in the changelog:

 * Several aspects of how mount options work has been adjusted in a way that
   could theoretically break users that have very strange mount option strings.
   This was necessary to fix glaring issues in how mount options were being
   treated. The key changes are:

   - Mount options on bind-mounts that clear a mount flag are now always
     applied. Previously, if a user requested a bind-mount with only clearing
     options (such as `rw,exec,dev`) the options would be ignored and the
     original bind-mount options would be set. Unfortunately this also means
     that container configurations which specified only clearing mount options
     will now actually get what they asked for, which could break existing
     containers (though it seems unlikely that a user who requested a specific
     mount option would consider it "broken" to get the mount options they
     asked for). This also allows us to silently add locked mount flags the
     user *did not explicitly request to be cleared* in rootless mode,
     allowing for easier use of bind-mounts for rootless containers. (#3967)

   - Container configurations using bind-mounts with superblock mount flags
     (i.e. filesystem-specific mount flags, referred to as "data" in
     `mount(2)`, as opposed to VFS generic mount flags like `MS_NODEV`) will
     now return an error. This is because superblock mount flags will also
     affect the host mount (as the superblock is shared when bind-mounting),
     which is obviously not acceptable. Previously, these flags were silently
     ignored so this change simply tells users that runc cannot fulfil their
     request rather than just ignoring it. (#3990)

EDIT: There was also a separate bug fixed in #3967 that was related to us silently clearing ro flags when doing a mount, so runc 1.1.x actually had a bug that matches what you are describing. That is why I said your description of the behaviour is wrong for both pre-1.2 runc (where we had bugs that could clear flags like ro silently) and post-1.2 runc (where we do not silently clear flags anymore, nor do we auto-set rw).

cyphar avatar Jul 31 '25 06:07 cyphar

Try to reproduce your issue with runc directly. If you cannot reproduce your issue outside of Kubernetes, please submit a bug to Kubernetes instead.
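
One way to check what the higher layers actually hand to runc is to read the options of the relevant mount out of the generated config.json. The sketch below assumes jq is installed; mount_options_for is a hypothetical helper name, and the location of config.json depends on the container runtime, so the path is passed in as an argument instead of being guessed here.

```shell
# List the options of every mount in an OCI config.json whose destination
# matches. mount_options_for is a hypothetical helper; it requires jq.
# $1: path to the container's config.json
# $2: mount destination inside the container, e.g. /opt/pkgs
mount_options_for() {
    jq -r --arg dest "$2" \
        '.mounts[] | select(.destination == $dest) | (.options // []) | join(",")' \
        "$1"
}
```

If the printed options contain rw for a mount that was expected to be read-only, the flag was inserted before runc was invoked, and the same config.json can be used with `runc run -b` to reproduce the result outside of Kubernetes.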

cyphar avatar Jul 31 '25 06:07 cyphar