pkg/cri: switch to CRI v1 API, add bridging for pre-v1 clients.
Switch to talking CRI v1 to the CRI runtime/server. Bridge any pre-v1 (CRI v1alpha2 only) client requests, unless bridging is explicitly disabled on the command line.
Codecov Report
Merging #781 (a3c57f5) into master (6b458ca) will not change coverage. The diff coverage is 22.22%.
```
@@           Coverage Diff            @@
##           master     #781    +/-  ##
=======================================
  Coverage   34.87%   34.87%
=======================================
  Files          60       60
  Lines        8874     8874
=======================================
  Hits         3095     3095
  Misses       5477     5477
  Partials      302      302
=======================================
```
| Impacted Files | Coverage Δ |
|---|---|
| pkg/cri/resource-manager/cache/container.go | 17.33% <4.76%> (ø) |
| pkg/cri/resource-manager/cache/pod.go | 18.15% <33.33%> (ø) |
| pkg/cri/resource-manager/cache/cache.go | 22.01% <45.45%> (ø) |
| pkg/cri/resource-manager/cache/utils.go | 34.69% <100.00%> (ø) |
In general, I'm OK with all of this. One separate worry that I have is about the unsafe.Pointer() cast used for practically 100% of the data structures: do the struct layouts match in all cases, or do we have something that might be problematic?
The CRI-O guys landed a few commits in their v1alpha2 CRI implementation to make it structurally identical to v1, with the explicit goal of enabling this type of near-zero-cost protocol bridging. The CRI-O implementation does essentially the same thing we do here, and that is the only reason why I dared to go with this instead of a convert-by-copying approach.
LGTM. Right now the option is enabled by default, is the plan to disable it by default or remove it, in the future?
Yes, that's exactly what I had in mind: first we have it as opt-out, once we're sure we can expect mostly everybody to talk v1 we make it opt-in, and eventually we remove it altogether.
Tested on debian-sid with
- kubelet v1.19.0 (CRI v1alpha2) + containerd v1.6.0-41-gc523102b5
- kubelet v1.23.4 (CRI v1) + containerd v1.6.0-41-gc523102b5

...and it works fine.
Also tried out with containerd v1.4.0, that talks CRI v1alpha2. As promised in the commit message, this is not supposed to work. cri-resmgr output:
F0303 07:32:12.850610 4447 main.go:89] failed to start resource manager: resource-manager: cache synchronization pod query failed: rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService

My only question would be related to this. Merging the PR will prevent using the CRI-RM master branch with older container runtime versions, which will make it harder to integrate CRI-RM into certain environments. On the other hand, without this PR, CRI-RM works with both old and new kubelets and runtimes, as our primary backend runtimes implement compatibility layers.
What do you think, should we add support for v1alpha2 towards CRI runtimes, too? (In this PR or in a separate PR?)
The initial idea was to support bridging in both directions. Then I quite quickly changed my mind about bridging from new clients to old runtimes. I think that is neither worth the effort nor necessarily very safe: I'm not sure that clients talking the new protocol version would expect to be talking to that old a runtime. Also, looking at the official EOL chart, I think dropping support for 1.19 and older should not be a problem...
If we consider keeping v1alpha2 support around for a longer time, then I think a dedicated branch (or per-release branches) with its own binary (or binaries) would probably be the way to go. That branch (or those branches) would simply carry one additional commit swapping v1 for v1alpha2.
WDYT?
@askervin @jukkar @kad @marquiz PTAL, I had to rebase this already approved one to resolve a conflict.
@kad, @marquiz, @askervin, @jukkar I'm considering merging this for the 0.7.0 release... WDYT?
Okay, after testing that idea against our current e2e test set, it's better to keep this one out for now. Many of our tests pull in old runtimes without support for the v1 protocol.
@kad @marquiz @askervin @jukkar @fmuyassarov Once we are 100% sure that we don't want/need to support any containerd or CRI-O version which does not speak CRI v1, this should be good to go in.
From the e2e tests' point of view... I didn't run the full test set here, but I checked that cri-resmgr can talk to each distro's stock containerd:
distro=debian-sid: ok
distro=ubuntu-22.04: ok
distro=ubuntu-20.04: fail (containerd-1.5.9)
root@vm> cri-resmgr -relay-socket /var/run/cri-resmgr/cri-resmgr.sock -runtime-socket /var/run/containerd/containerd.sock -image-socket /var/run/containerd/containerd.sock -force-config cri-resmgr.cfg >cri-resmgr.output.txt 2>&1 &
cri-resmgr last output line:
F: [ cri-resmgr ] failed to start resource manager: resource-manager: failed to query runtime version: rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService
distro=centos-8: ok
distro=fedora: ok
ubuntu-20.04 end-of-life is Aug/2025. What do you think?
> distro=ubuntu-20.04: fail (containerd-1.5.9) ... ubuntu-20.04 end-of-life is Aug/2025. What do you think?
I'm trying to become a better human being, so I can't/won't tell you that...
But I can see 3 choices here:
- We forget about this PR for now.
- We add dual-version bridging to the egress side (relay/server) as well.
- If we think there is a viable more recent alternative to 20.04, we stop supporting it despite it not being EOL'd.
> If we think there is a viable more recent alternative to 20.04, we stop supporting it despite it not being EOL'd.

Hmm, please do not remove the support for 20.04 yet, I am using it in the VM with GitHub Actions.
> > If we think there is a viable more recent alternative to 20.04, we stop supporting it despite it not being EOL'd.
>
> Hmm, please do not remove the support for 20.04 yet, I am using it in the VM with GitHub Actions.

Okay... I wonder if we have this enabled for our repo:
/hold
Hmm. The robots I'm looking for are not these...
We'll need to pick this up again and get it eventually merged: v1alpha2 is going the way of the dodo bird.