Explore replacing opencontainers/runc with containerd/cgroups
Kubernetes' use of opencontainers/runc as a library is placing an undue burden on the runc team, for example:
- https://github.com/opencontainers/runc/issues/3028
- https://github.com/opencontainers/runc/issues/3221#issuecomment-925972992
We now have a cgroups-specific library in the containerd org that we can explore, to start slowly replacing the functionality we previously needed from runc, I think: https://github.com/containerd/cgroups
As of right now k/k master shows the following imports of opencontainers/runc:
❯ rg '"github.com/opencontainers/runc' | grep -v vendor | cut -f 2 -d '"' | sort | uniq -c | sort
1 github.com/opencontainers/runc/libcontainer/cgroups/systemd
1 github.com/opencontainers/runc/libcontainer/utils
2 github.com/opencontainers/runc/libcontainer/apparmor
2 github.com/opencontainers/runc/libcontainer/cgroups/manager
2 github.com/opencontainers/runc/libcontainer/configs
2 github.com/opencontainers/runc/libcontainer/userns
3 github.com/opencontainers/runc/libcontainer/cgroups/fscommon
17 github.com/opencontainers/runc/libcontainer/cgroups
/sig node
/area code-organization
in both cases aren't we signing up a different project to support kubernetes and our needs/timelines? Obviously there's some overlap between containerd community and k8s but I don't see how moving to containerd package would solve this problem of k8s relying on a package outside of the k8s repo (and on contributors who may not have k8s as their priority/focus)
I personally think it's better to pull the public libcontainer pieces out of runc and have those versioned separately from runc, so we don't need to wait for runc version bumps to get a tagged version of libcontainer.
> in both cases aren't we signing up a different project to support kubernetes and our needs/timelines? Obviously there's some overlap between containerd community and k8s but I don't see how moving to containerd package would solve this problem of k8s relying on a package outside of the k8s repo (and on contributors who may not have k8s as their priority/focus)
Agree. Not sure why this would resolve the dependency issue.
(cc @samuelkarp for the suggestion on relying on containerd library)
Adding @kolyshkin as we have discussed this before.
> I personally think it's better to pull the public libcontainer pieces out of runc and have those versioned separately from runc, so we don't need to wait for runc version bumps to get a tagged version of libcontainer.
Not necessarily disagreeing, but ... who will maintain these? This may be duplicating effort given https://github.com/containerd/cgroups is already being maintained for public import as a standalone library.
My understanding is the runc team is actively attempting to stop supporting reuse of this code, so we'd need someone else to maintain the fork that re-exports them. If containerd/cgroups already accomplishes this then ... (e.g. https://github.com/containerd/cgroups/commit/6f4af8b641e84180ca1408fdf676373a4d90844b)
> in both cases aren't we signing up a different project to support kubernetes and our needs/timelines? Obviously there's some overlap between containerd community and k8s but I don't see how moving to containerd package would solve this problem of k8s relying on a package outside of the k8s repo (and on contributors who may not have k8s as their priority/focus)
Our needs: yes, to an extent, but the difference is that containerd/cgroups is a separate public library rather than some packages from runc's implementation. It already has a distinct version (currently v3.0.3) and release timeline from containerd/containerd.
cc @thaJeztah as a recently prolific committer to containerd/cgroups.
@BenTheElder yes, discussions have started in the containerd project as well about whether they want to scope up or down to include what we may need.
Here's another option!
- Extract only the files we use from runc: https://github.com/dims/libcontainer/
- Update cadvisor with the above: https://github.com/dims/cadvisor/tree/try-dims-libcontainer-repo
- Then update k/k with the new cadvisor and that library: https://github.com/kubernetes/kubernetes/pull/128200/files
We drop 20K lines of code and a bunch of tangled dependencies ... but then we own the library!
I like this approach :) @kolyshkin @AkihiroSuda @cyphar @lifubang @rata WDYT
@haircommander thanks for tagging us! I've just asked the other maintainers for opinions in the runc issue (https://github.com/opencontainers/runc/issues/3028#issuecomment-2426882847). Ideally, we should agree on something and come back with an opinion from the project (I'd like that, at least :)).
Just to understand, what is the context of this issue? Why do you want to remove the dependency on runc?
The use of runc as a binary vs. runc as a library has been a journey; in addition to the issues linked above, please see those listed below as well. If you look at issues/PRs in opencontainers/runc you will see more of the history. So far the burden has been put on the runc folks by k8s without discussion; here we are trying to see if there is a better way going forward.
- https://github.com/opencontainers/runc/issues/2471#issuecomment-1012724651
- https://github.com/opencontainers/runc/issues/3849#issuecomment-1544519250
- https://github.com/opencontainers/runc/pull/4436#issuecomment-2405829977
Personally, a motivating factor is decoupling new features in the underlying libraries from the release cadence of the runc binary. SIG Node has been waiting about a year for runc 1.2.0 to get PSI metrics. If we could factor libcontainer out of runc, we could get access to those new features without requiring a runc release.
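For context on the PSI metrics mentioned above: cgroup v2 exposes them via pressure files (e.g. cpu.pressure, memory.pressure) with a simple "some"/"full" line format. Below is a minimal stdlib-only parser sketch of that format, just to illustrate what the libraries consume; it is not the runc or cadvisor implementation, and the type names are made up for this example:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// psiLine holds one "some"/"full" record from a cgroup v2 pressure file.
type psiLine struct {
	Kind                 string  // "some" or "full"
	Avg10, Avg60, Avg300 float64 // percent of time stalled, recent averages
	Total                uint64  // cumulative stalled time, microseconds
}

// parsePSI parses the line format used by cgroup v2 pressure files:
//   some avg10=0.32 avg60=0.10 avg300=0.00 total=123456
func parsePSI(data string) ([]psiLine, error) {
	var out []psiLine
	for _, line := range strings.Split(strings.TrimSpace(data), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 5 {
			return nil, fmt.Errorf("unexpected line: %q", line)
		}
		p := psiLine{Kind: fields[0]}
		for _, kv := range fields[1:] {
			k, v, ok := strings.Cut(kv, "=")
			if !ok {
				return nil, fmt.Errorf("unexpected field: %q", kv)
			}
			switch k {
			case "avg10":
				p.Avg10, _ = strconv.ParseFloat(v, 64)
			case "avg60":
				p.Avg60, _ = strconv.ParseFloat(v, 64)
			case "avg300":
				p.Avg300, _ = strconv.ParseFloat(v, 64)
			case "total":
				p.Total, _ = strconv.ParseUint(v, 10, 64)
			}
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	const sample = "some avg10=0.32 avg60=0.10 avg300=0.00 total=123456\n" +
		"full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
	lines, err := parsePSI(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(lines[0].Kind, lines[0].Avg10, lines[0].Total) // some 0.32 123456
}
```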
I am neutral as to whether kubernetes should switch to containerd/cgroups.
As for runc/libcontainer, let's take a look. Kubernetes only uses a few packages from github.com/opencontainers/runc/libcontainer, being:
- cgroups (a handful of packages for fs-based and systemd-based cgroup v1 and v2 drivers);
- configs (needed for cgroups above);
- userns (should switch to github.com/moby/sys/userns; opened https://github.com/kubernetes/kubernetes/pull/128237);
- apparmor (the only function used is IsEnabled, which is rather trivial);
- utils (the only function used is CleanPath, which is rather trivial).
So, we're (mostly) talking about libcontainer/cgroups and libcontainer/configs here.
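To illustrate how trivial the utils dependency is: a stdlib-only sketch of the same lexical-cleaning idea as runc's CleanPath (not a verbatim copy of runc's code) fits in a dozen lines. The trick is that relative paths get a leading "/" prepended before cleaning, so "../" components cannot lexically escape a prefix they are later joined to:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// cleanPath lexically cleans a path so it is safe to use with filepath.Join:
// for relative paths, a leading "/" is prepended before cleaning (neutralizing
// any "../" escape) and then stripped again. This is purely lexical; it does
// not resolve symlinks. Assumes "/"-separated paths (Linux).
func cleanPath(path string) string {
	if path == "" {
		return ""
	}
	path = filepath.Clean(path)
	if !filepath.IsAbs(path) {
		path = filepath.Clean("/" + path)
		path, _ = filepath.Rel("/", path)
	}
	return filepath.Clean(path)
}

func main() {
	fmt.Println(cleanPath("../../etc/passwd")) // "etc/passwd": the ".." escape is neutralized
	fmt.Println(cleanPath("/a/b/../c"))        // "/a/c"
}
```

Something like this is small enough to vendor or inline, which is why the discussion focuses on the cgroups and configs packages rather than utils or apparmor.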
I am very much against forking of libcontainer, here's why. Making a fork is easy, maintaining a fork is hard (except when the original code is no longer maintained, which is not the case here). We'll need a team to port fixes from the original to the fork, and a team to port fixes from the fork to the original. We'll end up with two packages with two sets of bugs as a result. I doubt we (as a community) have resources for that.
What I think we should do is move libcontainer/cgroups out of runc and into a separate repo (under opencontainers, I guess) which will have maintainers from both runc and kubernetes, and which will be consumed by both projects. This is problematic because of the libcontainer/configs dependency, so the first step should be moving cgroup-related data structures from libcontainer/configs to libcontainer/cgroups. At first glance, this is doable.
(The other problem was decoupling device management (needed by runc, not needed by other users) from cgroups managers, but this is already solved).
So, steps:
- Make runc/libcontainer/cgroups independent of runc/libcontainer/configs (and other runc stuff).
- Open a proposal to create cgroups repo under opencontainers org.
- Move github.com/opencontainers/runc/libcontainer/cgroups to github.com/opencontainers/cgroups.
- Use the above from runc and kubernetes.
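Once such a module exists, the consumer-side change in step 4 is mostly a mechanical import rewrite. A hypothetical sketch (the new module path github.com/opencontainers/cgroups comes from the proposal above; the final package layout may differ, and GNU sed is assumed):

```shell
# Rewrite the old runc import path to the proposed standalone module path in
# all Go files under the current tree, then let the go tool fix up go.mod.
find . -name '*.go' -exec sed -i \
  's#github.com/opencontainers/runc/libcontainer/cgroups#github.com/opencontainers/cgroups#g' {} +
go mod tidy
```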
I'd like to hear what people think and, if there's an agreement, file a bug to runc and start working on items in the list.
I like this plan! how would this work for you @dims
@haircommander @kolyshkin 😍 🫶 Love the plan. ➕1️⃣ from me for opencontainers/cgroups
@kolyshkin :+1: to that!
> I am very much against forking of libcontainer, here's why. Making a fork is easy, maintaining a fork is hard (except when the original code is no longer maintained, which is not the case here). We'll need a team to port fixes from the original to the fork, and a team to port fixes from the fork to the original. We'll end up with two packages with two sets of bugs as a result. I doubt we (as a community) have resources for that.
Very much "+1" on that. Some of these packages capture a lot of expertise and knowledge gathered over the years, and creating a fork will be risky (things will diverge, and subtle changes in behavior will creep in). The devil may be in the detail for some of these packages, as I know there's been many esoteric "fun" issues that have been fixed that are not trivial to discover (things "seem to work" until they don't).
To some extent, the containerd/cgroups package is already an example of this; it was created to provide a "fresh" take on handling cgroups and a cleaner API. While runc's code was definitely getting a bit long in the tooth, it's also battle-tested: the runc code has most definitely had more attention to fixes for (obscure) corner cases, where the containerd one sometimes fell behind.
My ideal would still be if somehow those projects could converge, so that effort could be shared (and there would be no need to choose between a "fresher" API or rougher API, but more battle-tested).
As to https://github.com/opencontainers/runc/issues/3028, I think a main motivator there is that runc (being a Go project from "the old days") has been too permissive about exporting things; too many things that were ultimately for "internal use" were exported, and thus became a public API that was not always well-architected for that purpose, which now means that any change needed for runc itself could potentially break other projects. I've been "chipping away" at smaller packages that were reasonably self-contained and stable, moving them to separate modules (github.com/moby/sys/user, github.com/moby/sys/userns). I guess the cgroups package could be considered as well, assuming we can get the public API "mostly stable", and it doesn't complicate testing and integration back into runc too much.
+1 @kolyshkin, thank you! 🙏
> I am very much against forking of libcontainer, here's why. Making a fork is easy, maintaining a fork is hard (except when the original code is no longer maintained, which is not the case here). We'll need a team to port fixes from the original to the fork, and a team to port fixes from the fork to the original. We'll end up with two packages with two sets of bugs as a result. I doubt we (as a community) have resources for that.
💯
> As to https://github.com/opencontainers/runc/issues/3028, I think a main motivator there is that runc (being a Go project from "the old days") has been too permissive on exporting things; too many things that were ultimately for "internal use" were exported, and thus a public API; not always well-architected for this purpose, which now means that any change needed for runc itself now could potentially break other projects. I've been "chipping away" smaller packages that were reasonably self-contained and stable to separate modules (github.com/moby/sys/user, github.com/moby/sys/userns). I guess the cgroups package could be considered as well; assuming we can get the public API "mostly stable", and if it doesn't complicate testing and integration back into runc too much.
Kubernetes also has this problem; we've been telling people that importing this repo (as opposed to the libraries "staged" out of it or developed externally outright) is ~unsupported, but we haven't moved towards internal/ yet. Personally I think communicating with internal/ is a good idea, and if someone really wants that code they can fork it ... but a huge +1 to Kubernetes not doing that [edit: forking] and instead working with runc and the ecosystem.
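For readers less familiar with the mechanism: the Go toolchain enforces internal/ visibility at compile time, so a (hypothetical) layout like the sketch below would make the boundary between supported and unsupported code explicit rather than a matter of documentation:

```
runc/
  internal/cgroups/   # importable only by packages rooted under runc/
  libcontainer/...    # whatever stays here remains a de facto public API
```

An external import of a path containing an internal/ element fails to build with a "use of internal package ... not allowed" error, which is exactly the kind of hard boundary being discussed here.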
I like the proposal and don't have any concerns. We discussed this at today's SIG Node weekly too, and no concerns were raised in the meeting. Peter, representing the SIG Node community, will send an email to the kube-dev mailing list. cc/ @haircommander
Done in https://groups.google.com/g/kubernetes-sig-node/c/ce2yLXg5SDw :slightly_smiling_face:
Just as a note here, @kolyshkin we'll probably need a new Project proposal for a factored-out cgroups library like we have previously done for the go-digest and selinux libraries. I'm happy to facilitate the TOB vote on the proposal once it's ready.
FYI: doing some searches to figure out whether we can totally drop containerd/cgroups from k/k, I found that we will need some updates on the runc side:
https://github.com/opencontainers/runc/pull/4497
Now we wait for https://github.com/opencontainers/runc/pull/4472 to land first, and then the code can be broken out into a separate go.mod or a standalone library.
> Just as a note here, @kolyshkin we'll probably need a new Project proposal for a factored-out cgroups library like we have previously done for the go-digest and selinux libraries. I'm happy to facilitate the TOB vote on the proposal once it's ready.
Opened https://github.com/opencontainers/tob/pull/144
xref: https://github.com/opencontainers/runc/pull/4472
next up is: https://github.com/opencontainers/runc/pull/4577
PR is open to populate opencontainers/cgroups: https://github.com/opencontainers/cgroups/pull/1
/close
fixed in https://github.com/kubernetes/kubernetes/pull/130569
@dims: Closing this issue.