containerd NRI: add support for NRI with extended scope.

trafficstars

This PR is one in a set of 3 PRs, that propose and implement a prototype to extend the scope of NRI to enable common, pluggable runtime extensions. The other related PRs in this set are:

extend the scope of current NRI
cri-o integration of the above

Please refer to the NRI PR for a general description of what this is all about.

Sep 16 '21 14:09 klihub

Hi @klihub. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sep 16 '21 14:09 k8s-ci-robot

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.

Sep 16 '21 14:09 theopenlab-ci[bot]

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.

Sep 16 '21 15:09 theopenlab-ci[bot]

Looks like this was opened from a branch that was outdated; can you try rebasing your branch to get rid of the unrelated commits?

Sep 16 '21 16:09 thaJeztah

Looks like this was opened from a branch that was outdated; can you try rebasing your branch to get rid of the unrelated commits?

Rebased on top of main.

Sep 16 '21 18:09 klihub

Build succeeded.

containerd-build-arm64 : SUCCESS in 4m 18s (non-voting)
containerd-test-arm64 : FAILURE in 6m 08s (non-voting)
containerd-integration-test-arm64 : SUCCESS in 21m 03s (non-voting)

Sep 16 '21 18:09 theopenlab-ci[bot]

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.

Dec 09 '21 15:12 theopenlab-ci[bot]

Build succeeded.

containerd-build-arm64 : SUCCESS in 5m 01s (non-voting)
containerd-test-arm64 : SUCCESS in 5m 42s (non-voting)
containerd-integration-test-arm64 : SUCCESS in 24m 27s (non-voting)

Dec 09 '21 16:12 theopenlab-ci[bot]

Build succeeded.

containerd-build-arm64 : SUCCESS in 4m 54s (non-voting)
containerd-test-arm64 : SUCCESS in 6m 13s (non-voting)
containerd-integration-test-arm64 : SUCCESS in 25m 19s (non-voting)

Dec 17 '21 14:12 theopenlab-ci[bot]

Build succeeded.

containerd-build-arm64 : SUCCESS in 4m 25s (non-voting)
containerd-test-arm64 : SUCCESS in 6m 24s (non-voting)
containerd-integration-test-arm64 : SUCCESS in 26m 09s (non-voting)

Dec 17 '21 14:12 theopenlab-ci[bot]

Sorry for so late reply on this pull request.

I like that idea - TTRPC-izes NRI, even if NRI is about binary call. Currently, the containerd supports NRI hook in RunPodSandbox and StartContainer. The number of NRI calls for one pod is more than CNI. In some cases, like resource overcommitted work node, the fork-process pattern might have performance issue. TTRPC-ized NRI can improve that.

But for stub of CRI, I have questions:

Is the stub used to maintain the state of container?

containerd has integrated NRI with Delete state. Basically, the stateful daemon process will receive Creation and Delete state from NRI calls to keep consistent with containerd state.

And currently, with common OCI-Runtime-Spec implementations, the container process will be setup in two phases: fork(runc-create) and exec(runc start). containerd setups the stub between two phases, which can make sure that the resource policy can be applied to container(cgroup) in time. But like ApplyPendingUpdates and PostCreateContainer stubs, I would like to know what kind of case it is. If it is just a wrapper of CRI, maybe we can consider to use gRPC middleware to export proxy plugin functionality.

Thanks,

Dec 18 '21 07:12 fuweid

Sorry for so late reply on this pull request.

No problem. And thanks for the extra effort of taking a look at this. I really appreciate it !

I like that idea - TTRPC-izes NRI, even if NRI is about binary call. Currently, the containerd supports NRI hook in RunPodSandbox and StartContainer. The number of NRI calls for one pod is more than CNI. In some cases, like resource overcommitted work node, the fork-process pattern might have performance issue. TTRPC-ized NRI can improve that.

But for stub of CRI, I have questions:

One thing related to this. Based on our discussion with @mikebrow, we would like to eventually extend our integration/implementation to also handle containers created using other 'frontends' protocols than just CRI (ctr and docker in particular). We rolled our initial implementation this way, partly because the current NRI integration is NRI-only and partly because we'd like to better understand all the implications and problems related to extending it to the rest.

Is the stub used to maintain the state of container?

I assume here that by 'stub of CRI' you mean the code added in this patch set to the CRI plugin for triggering NRI processing and altering containers accordingly. Please let me know if this assumption is wrong.

That stub, the new NRI-related additions/adaptation, does not maintain state of the containers on its own. It basically does the following things:

state synchronization with NRI
hook NRI into (CRI) request processing/pod and container lifecycle events
handle NRI-triggered updates to containers

State synchronization lets NRI know about all existing pods and containers on startup. Additionally, it allows any NRI plugin that establishes a connection to NRI dynamically to synchronize itself. For request processing, the stub gathers all the necessary data for plugins, makes the call to NRI, then takes the response and applies the necessary changes to containers, also making sure that the changes are applied in the correct order. There are slight differences in how containerd and cri-o mangle and store some of the data carried by CRI into an OCI container, for instance the details of how labels and annotations get converted (if at all) to OCI annotations. We want to hide these differences as much as possible from NRI plugins, so we need to have some extra processing/logic for this in the runtime NRI adaptation.

containerd has integrated NRI with Delete state. Basically, the stateful daemon process will receive Creation and Delete state from NRI calls to keep consistent with containerd state.

Yes, I've tried to follow the same pattern. When containers are stopped, hooking in NRI processing is done in a similar fashion as deletion is handled for the original NRI plugins.

And currently, with common OCI-Runtime-Spec implementations, the container process will be setup in two phases: fork(runc-create) and exec(runc start). containerd setups the stub between two phases, which can make sure that the resource policy can be applied to container(cgroup) in time. But like ApplyPendingUpdates and PostCreateContainer stubs, I would like to know what kind of case it is.

ApplyPendingUpdates applies any updates which might have occured after NRI itself has processed the create request (at which point the container being created became visible to NRI plugins) but before the creation of the container has been fully processed by containerd/the runtime itself.

The primary purpose of Post* stub calls is to allow a plugin to perform any extra action outside of the realm of control of OCI if necessary. Additionally, these provide a mechanism for a plugin to acquire the final state a container ended up in, once it has been processed by all NRI plugins.

If it is just a wrapper of CRI, maybe we can consider to use gRPC middleware to export proxy plugin functionality.

They are not just pure wrappers. Many of them, at least the ones that can result in modifications to containers, need to happen at a particular point during the processing of the corresponding CRI request. So I think these can't really be implemented as a pure add-on that would happen in addition to opaque processing of the related CRI request. I guess those requests that can't result in container modifications could in principle be implemented like that (pod-related ones and the Post* ones). But even for those I'm not sure if digging out all the necessary information for hooking in NRI would be really possible from 'outside' of the CRI plugin.

Thanks,

If I misunderstood some aspect of your questions, please let me know.

Dec 21 '21 09:12 klihub

@fuweid PTAL, if you have some time.

May 15 '22 06:05 klihub

fyi.. we are currently adding event notifications to kubelet for container/pod state changes.. so we can reduce (make much longer) the pod sync status cycle, and make the state changes known to kubelet much much faster than is done today (once every 2min or with an expensive probe). see: https://github.com/containerd/containerd/pull/7073

Jul 07 '22 14:07 mikebrow

fyi.. we are currently adding event notifications to kubelet for container/pod state changes.. so we can reduce (make much longer) the pod sync status cycle, and make the state changes known to kubelet much much faster than is done today (once every 2min or with an expensive probe). see: #7073

Ok, thanks. I need to take a closer look. Functionality looks orthogonal to NRI, although it does seem/need to touch around the same spots in the code/CRI plugin as we touch for NRI integration (and I'm guessing this same observation triggered your comment).

Jul 07 '22 15:07 klihub

/ok-to-test

Oct 05 '22 14:10 mikebrow

@klihub note:

<<<<<<< pr/proto/nri
golang.org/x/tools v0.1.10/go.mod h1:Uh6Zz+xoGYZom868N8YTex3t7RhtHDBrE8Gzo9bV56E=
=======
golang.org/x/tools v0.1.12 h1:VveCTK38A2rkS8ZqFY25HIDFscX5X9OoEhJd3quQmXU=
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
>>>>>>> main

Oct 05 '22 14:10 mikebrow

let's get it in main check it out evaluate risk then consider for LTS after...

Oct 05 '22 14:10 mikebrow

@klihub note:

<<<<<<< pr/proto/nri
golang.org/x/tools v0.1.10/go.mod h1:Uh6Zz+xoGYZom868N8YTex3t7RhtHDBrE8Gzo9bV56E=
=======
golang.org/x/tools v0.1.12 h1:VveCTK38A2rkS8ZqFY25HIDFscX5X9OoEhJd3quQmXU=
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
>>>>>>> main

@mikebrow Sorry about that... I resolved it. Could you kick the tests again ? AFAICT the failed test is a flake and has not much to do with the PR changes.

Oct 05 '22 15:10 klihub

/retest-required

Oct 19 '22 07:10 klihub

/test pull-containerd-node-e2e

Oct 19 '22 07:10 klihub

/test pull-containerd-node-e2e

Oct 19 '22 11:10 klihub

/retest-required

Oct 19 '22 14:10 klihub

/retest

Oct 20 '22 02:10 samuelkarp

Marking this draft since I found a few problems I need to sort out before any merge. Will mark it ready again once I'm done with those.

Oct 31 '22 19:10 klihub

on bringing down the logging plugin I get an error:

ERRO[2022-11-22T11:57:26.541929641-06:00] error receiving message                       error="failed to read header from trunk: EOF"
INFO[2022-11-22T11:57:26.541953042-06:00] connection to plugin "00-logger" closed      
INFO[2022-11-22T11:57:26.542130540-06:00] ttrpc server for plugin "00-logger" closed (ttrpc: server closed)

TODO: is there a way to eat that error message and just close?

Yes. I updated it and now if you bring down the plugin it's logged like this:

INFO[2022-11-22T21:08:39.727035268Z] plugin "00-logger" connected
...
INFO[2022-11-22T21:08:41.055532101Z] connection to plugin "00-logger" closed

Nov 22 '22 21:11 klihub

TODO before release: having two configs to enable nri is confusing /etc/nri/nri.conf vs containerd/config.toml

I can change the initialization code in NRI itself so that the lack of a configuration file is treated as an empty configuration (file) instead of an initialization error (which then results in NRI getting disabled). With that change, NRI could be enabled by simply enabling it in the containerd config.

Nov 22 '22 22:11 klihub

Removing backport label, added in https://github.com/containerd/containerd/pull/7954

Apr 07 '23 22:04 mxpv

containerd containerd copied to clipboard

NRI: add support for NRI with extended scope.

containerd
containerd copied to clipboard