containerd
containerd copied to clipboard
[release/1.7] [RFC] Backport userns support so it works with all k8s versions
This is a RFC PR, after today's community meeting dicussion and further discussions with @mikebrow on slack. I've tested this works fine with the changes in k8s 1.30.
The idea is to show the gist of how backporting this would look like. Also, note as this is a RFC, the tests have not been backported (it should be simple, but as the file structure changed, it needs a little bit more adjusting).
Also, I have not backported this for the sbserver (do we want that?).
Tha changes excluding the vendors, go.mod and go.sum files are this:
$ git diff --stat origin/release/1.7 !(vendor|go.sum|go.mod|integration)
pkg/cri/instrument/instrumented_service.go | 4 ++++
pkg/cri/opts/spec_linux_opts.go | 24 ++++++++++++++++++++++--
pkg/cri/sbserver/runtime_config.go | 13 +++++++++++++
pkg/cri/server/container_create.go | 15 +++++++++++++--
pkg/cri/server/container_create_linux.go | 22 ++++++++++++++++++++++
pkg/cri/server/container_create_test.go | 3 ++-
pkg/cri/server/runtime_config.go | 13 +++++++++++++
pkg/cri/server/status.go | 31 +++++++++++++++++++++++++++++++
pkg/process/init.go | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
pkg/process/utils.go | 24 ++++++++++++++++++++++++
sys/reaper/reaper_unix.go | 9 +++++++++
11 files changed, 211 insertions(+), 5 deletions(-)
PS: tests are failing due to some files I created lack the header. Those kind of details are missing, I think this is enough to show the gist of what we need to backport anyways. I can fix those, of course, if we want to move forward with this.
cc @dmcgowan @mikebrow
On the Kubernetes side, we have done some changes that require changes in the runtimes to support user namespaces. One in 1.27, to use idmap mounts for all the mounts; and again in 1.30, to expose the via the Status rpc if the runtime support user namespaces or not.
The current situation in containerd 1.7 is that it only works with userns with kubernetes 1.25 or 1.26.
This PR backports the changes from containerd main to the 1.7 branch, so userns works with any kubernetes version >= 1.25 (when user namespaces was merged).
The dependencies that we have to upgrade to do it are:
- cri-api: To use the new fields in the Status RPC (see changes in status.go)
- runtime-spec: The fields to request the OCI runtime to use idmap mounts have been added in a later version that we currently use
- go-runc: To query the OCI runtimes features ("runc features") we needed to upgrade it too.
Please note that containerd main has other features that are nice to have when using user namespaces, like idmap mounts support for the rootfs. We can consider backporting that in the future if we want to, but that is a more invasive backport. But that was never available in 1.7.
Also, this makes user namespaces work with runc-compatible runtimes (kata doesn't really need this anyways). The reason is explained in pkg/cri/server/status.go and pkg/process/init.go. The tl;dr is: (a) we need to make sure the runtime supports idmap mounts (otherwise the UID/GID in volumes will be garbage), (b) OCI runtime-spec mandates to ignore unknown features, (c) we are only enforcing the check for runc-compatible runtimes.
Hi @rata. Thanks for your PR.
I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Also, I have not backported this for the sbserver (do we want that?).
yes, I think so
PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I'll rebase whenever there is agreement on the community to pursue this. Meanwhile, so people can see how would a backport look like so we decide if we want this or not, there is no point to rebase IMHO.
This PR is stale because it has been open 90 days with no activity. This PR will be closed in 7 days unless new comments are made or the stale label is removed.
@github-actions don't close this. This is probably low prio with the 2.0 release, but I'd like people to take a look whenever they can, so we decide if we will do the backport or not