containerd
containerd copied to clipboard
NRI: add support for NRI with extended scope.
This PR is one in a set of 3 PRs, that propose and implement a prototype to extend the scope of NRI to enable common, pluggable runtime extensions. The other related PRs in this set are:
- extend the scope of current NRI
- cri-o integration of the above
Please refer to the NRI PR for a general description of what this is all about.
Hi @klihub. Thanks for your PR.
I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test
on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test
label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Merge Failed.
This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Merge Failed.
This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Looks like this was opened from a branch that was outdated; can you try rebasing your branch to get rid of the unrelated commits?
Looks like this was opened from a branch that was outdated; can you try rebasing your branch to get rid of the unrelated commits?
Rebased on top of main.
Build succeeded.
- containerd-build-arm64 : SUCCESS in 4m 18s (non-voting)
- containerd-test-arm64 : FAILURE in 6m 08s (non-voting)
- containerd-integration-test-arm64 : SUCCESS in 21m 03s (non-voting)
Merge Failed.
This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Build succeeded.
- containerd-build-arm64 : SUCCESS in 5m 01s (non-voting)
- containerd-test-arm64 : SUCCESS in 5m 42s (non-voting)
- containerd-integration-test-arm64 : SUCCESS in 24m 27s (non-voting)
Build succeeded.
- containerd-build-arm64 : SUCCESS in 4m 54s (non-voting)
- containerd-test-arm64 : SUCCESS in 6m 13s (non-voting)
- containerd-integration-test-arm64 : SUCCESS in 25m 19s (non-voting)
Build succeeded.
- containerd-build-arm64 : SUCCESS in 4m 25s (non-voting)
- containerd-test-arm64 : SUCCESS in 6m 24s (non-voting)
- containerd-integration-test-arm64 : SUCCESS in 26m 09s (non-voting)
Sorry for so late reply on this pull request.
I like that idea - TTRPC-izes NRI, even if NRI is about binary call. Currently, the containerd supports NRI hook in RunPodSandbox and StartContainer. The number of NRI calls for one pod is more than CNI. In some cases, like resource overcommitted work node, the fork-process pattern might have performance issue. TTRPC-ized NRI can improve that.
But for stub of CRI, I have questions:
Is the stub used to maintain the state of container?
containerd has integrated NRI with Delete
state. Basically, the stateful
daemon process will receive Creation
and Delete
state from NRI calls to
keep consistent with containerd state.
And currently, with common OCI-Runtime-Spec implementations, the container
process will be setup in two phases: fork(runc-create) and exec(runc start).
containerd setups the stub between two phases, which can make sure that the
resource policy can be applied to container(cgroup) in time. But like ApplyPendingUpdates
and PostCreateContainer
stubs, I would like to know what kind of case it is.
If it is just a wrapper of CRI, maybe we can consider to use gRPC middleware
to export proxy plugin functionality.
Thanks,
Sorry for so late reply on this pull request.
No problem. And thanks for the extra effort of taking a look at this. I really appreciate it !
I like that idea - TTRPC-izes NRI, even if NRI is about binary call. Currently, the containerd supports NRI hook in RunPodSandbox and StartContainer. The number of NRI calls for one pod is more than CNI. In some cases, like resource overcommitted work node, the fork-process pattern might have performance issue. TTRPC-ized NRI can improve that.
But for stub of CRI, I have questions:
One thing related to this. Based on our discussion with @mikebrow, we would like to eventually extend our integration/implementation to also handle containers created using other 'frontends' protocols than just CRI (ctr and docker in particular). We rolled our initial implementation this way, partly because the current NRI integration is NRI-only and partly because we'd like to better understand all the implications and problems related to extending it to the rest.
Is the stub used to maintain the state of container?
I assume here that by 'stub of CRI' you mean the code added in this patch set to the CRI plugin for triggering NRI processing and altering containers accordingly. Please let me know if this assumption is wrong.
That stub, the new NRI-related additions/adaptation, does not maintain state of the containers on its own. It basically does the following things:
- state synchronization with NRI
- hook NRI into (CRI) request processing/pod and container lifecycle events
- handle NRI-triggered updates to containers
State synchronization lets NRI know about all existing pods and containers on startup. Additionally, it allows any NRI plugin that establishes a connection to NRI dynamically to synchronize itself. For request processing, the stub gathers all the necessary data for plugins, makes the call to NRI, then takes the response and applies the necessary changes to containers, also making sure that the changes are applied in the correct order. There are slight differences in how containerd and cri-o mangle and store some of the data carried by CRI into an OCI container, for instance the details of how labels and annotations get converted (if at all) to OCI annotations. We want to hide these differences as much as possible from NRI plugins, so we need to have some extra processing/logic for this in the runtime NRI adaptation.
containerd has integrated NRI with
Delete
state. Basically, the stateful daemon process will receiveCreation
andDelete
state from NRI calls to keep consistent with containerd state.
Yes, I've tried to follow the same pattern. When containers are stopped, hooking in NRI processing is done in a similar fashion as deletion is handled for the original NRI plugins.
And currently, with common OCI-Runtime-Spec implementations, the container process will be setup in two phases: fork(runc-create) and exec(runc start). containerd setups the stub between two phases, which can make sure that the resource policy can be applied to container(cgroup) in time. But like
ApplyPendingUpdates
andPostCreateContainer
stubs, I would like to know what kind of case it is.
ApplyPendingUpdates
applies any updates which might have occured after NRI itself has processed the create request (at which point the container being created became visible to NRI plugins) but before the creation of the container has been fully processed by containerd/the runtime itself.
The primary purpose of Post* stub calls is to allow a plugin to perform any extra action outside of the realm of control of OCI if necessary. Additionally, these provide a mechanism for a plugin to acquire the final state a container ended up in, once it has been processed by all NRI plugins.
If it is just a wrapper of CRI, maybe we can consider to use gRPC middleware to export proxy plugin functionality.
They are not just pure wrappers. Many of them, at least the ones that can result in modifications to containers, need to happen at a particular point during the processing of the corresponding CRI request. So I think these can't really be implemented as a pure add-on that would happen in addition to opaque processing of the related CRI request. I guess those requests that can't result in container modifications could in principle be implemented like that (pod-related ones and the Post* ones). But even for those I'm not sure if digging out all the necessary information for hooking in NRI would be really possible from 'outside' of the CRI plugin.
Thanks,
If I misunderstood some aspect of your questions, please let me know.
@fuweid PTAL, if you have some time.
fyi.. we are currently adding event notifications to kubelet for container/pod state changes.. so we can reduce (make much longer) the pod sync status cycle, and make the state changes known to kubelet much much faster than is done today (once every 2min or with an expensive probe). see: https://github.com/containerd/containerd/pull/7073
fyi.. we are currently adding event notifications to kubelet for container/pod state changes.. so we can reduce (make much longer) the pod sync status cycle, and make the state changes known to kubelet much much faster than is done today (once every 2min or with an expensive probe). see: #7073
Ok, thanks. I need to take a closer look. Functionality looks orthogonal to NRI, although it does seem/need to touch around the same spots in the code/CRI plugin as we touch for NRI integration (and I'm guessing this same observation triggered your comment).
/ok-to-test
@klihub note:
<<<<<<< pr/proto/nri
golang.org/x/tools v0.1.10/go.mod h1:Uh6Zz+xoGYZom868N8YTex3t7RhtHDBrE8Gzo9bV56E=
=======
golang.org/x/tools v0.1.12 h1:VveCTK38A2rkS8ZqFY25HIDFscX5X9OoEhJd3quQmXU=
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
>>>>>>> main
let's get it in main check it out evaluate risk then consider for LTS after...
@klihub note:
<<<<<<< pr/proto/nri golang.org/x/tools v0.1.10/go.mod h1:Uh6Zz+xoGYZom868N8YTex3t7RhtHDBrE8Gzo9bV56E= ======= golang.org/x/tools v0.1.12 h1:VveCTK38A2rkS8ZqFY25HIDFscX5X9OoEhJd3quQmXU= golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc= >>>>>>> main
@mikebrow Sorry about that... I resolved it. Could you kick the tests again ? AFAICT the failed test is a flake and has not much to do with the PR changes.
/retest-required
/test pull-containerd-node-e2e
/test pull-containerd-node-e2e
/retest-required
/retest
Marking this draft since I found a few problems I need to sort out before any merge. Will mark it ready again once I'm done with those.
on bringing down the logging plugin I get an error:
ERRO[2022-11-22T11:57:26.541929641-06:00] error receiving message error="failed to read header from trunk: EOF" INFO[2022-11-22T11:57:26.541953042-06:00] connection to plugin "00-logger" closed INFO[2022-11-22T11:57:26.542130540-06:00] ttrpc server for plugin "00-logger" closed (ttrpc: server closed)
TODO: is there a way to eat that error message and just close?
Yes. I updated it and now if you bring down the plugin it's logged like this:
INFO[2022-11-22T21:08:39.727035268Z] plugin "00-logger" connected
...
INFO[2022-11-22T21:08:41.055532101Z] connection to plugin "00-logger" closed
- TODO before release: having two configs to enable nri is confusing /etc/nri/nri.conf vs containerd/config.toml
I can change the initialization code in NRI itself so that the lack of a configuration file is treated as an empty configuration (file) instead of an initialization error (which then results in NRI getting disabled). With that change, NRI could be enabled by simply enabling it in the containerd config.
Removing backport label, added in https://github.com/containerd/containerd/pull/7954