Add containerd config in `OperatingSystemConfig` (I)
How to categorize this PR?
/area robustness /kind enhancement
What this PR does / why we need it:
This PR adds containerd config to the OperatingSystemConfig API, mainly to configure registries and the sandbox image.
Gardener, as well as extensions (e.g. gardener-extension-registry-cache), may contribute to this section of the API and gardener-node-agent takes necessary steps on the node.
With this enhancement, it is now more structured and easy to add or remove OCI registries while earlier every extension needed to bring their own logic for this (e.g. via additional systemd units).
While working on this topic, we noticed that the container-initializer systemd service can entirely be replaced by logic in the gardener-node-agent, which is performed in this PR along the way.
Which issue(s) this PR fixes: Part of #8929 Also see https://github.com/gardener-community/hackathon/tree/main/2024-05_Schelklingen#-type-safe-configurability-in-operatingsystemconfig-for-containerd-dns-ntp-etc.
Special notes for your reviewer: The groundwork happened at the last Gardener Hackathon (ref). As opposed to the procedure proposed in #8929, we decided against generic containerd patches mainly because:
- It's easier for Gardener and extensions to contribute to a well structured API.
- We needed to copy internal patching procedures from
kindto do the same. - We don't see a need for generic patches because the changes in this PR and the following (see description below) cover most of the use-cases.
It is planned to add more configuration options related to container runtime (see types_operatingsystemconfig.go diff here).
/cc @MrBatschner @Gerrit91 @ialidzhikov
Release note:
Additional containerd registries can directly be added to the `OperatingSystemConfig`. Once specified, `gardener-node-agent` takes care about necessary steps to configure the container runtime. The generic control plane ensurer was enhanced along the way to simplify mutating the relevant part of the `OperatingSystemConfig` ((ref)[https://github.com/timuthy/gardener/blob/74390628e336b0954b3983ea8f4e0519b1ee5c0d/extensions/pkg/webhook/controlplane/genericmutator/mutator.go#L88]).
The `containerd-intializer` systemd unit has been deprecated and will remain functionless on existing nodes for compatibility reasons.
@timuthy: GitHub didn't allow me to request PR reviews from the following users: Gerrit91.
Note that only gardener members and repo collaborators can review this PR, and authors cannot review their own PRs.
In response to this:
How to categorize this PR?
/area robustness /kind enhancement
What this PR does / why we need it: This PR adds
containerdconfig to theOperatingSystemConfigAPI, mainly to configure registries and the sandbox image. Gardener, as well as extensions (e.g. gardener-extension-registry-cache), may contribute to this section of the API andgardener-node-agenttakes necessary steps on the node. With this enhancement, it is now more structured and easy to add or remove OCI registries while earlier every extension needed to bring their own logic for this (e.g. via additional systemd units).While working on this topic, we noticed that the
container-initializersystemd service can entirely be replaced by logic in thegardener-node-agent, which is performed in this PR along the way.Which issue(s) this PR fixes: Part of #8929 Also see https://github.com/gardener-community/hackathon/tree/main/2024-05_Schelklingen#-type-safe-configurability-in-operatingsystemconfig-for-containerd-dns-ntp-etc.
Special notes for your reviewer: The groundwork happened at the last Gardener Hackathon (ref). As opposed to the procedure proposed in #8929, we decided against generic containerd patches mainly because:
- It's easier for Gardener and extensions to contribute to a well structured API.
- We needed to copy internal patching procedures from
kindto do the same.- We don't see a need for generic patches because the changes in this PR and the following (see description below) cover most of the use-cases.
It is planned to add more configuration options related to container runtime (see
types_operatingsystemconfig.godiff here)./cc @MrBatschner @Gerrit91 @ialidzhikov
Release note:
Additional containerd registries can directly be added to the `OperatingSystemConfig`. Once specified, `gardener-node-agent` takes care about necessary steps to configure the container runtime. The generic control plane ensurer was enhanced along the way to simplify mutating the relevant part of the `OperatingSystemConfig` ((ref)[https://github.com/timuthy/gardener/blob/74390628e336b0954b3983ea8f4e0519b1ee5c0d/extensions/pkg/webhook/controlplane/genericmutator/mutator.go#L88]).The `containerd-intializer` systemd unit has been deprecated and will remain functionless on existing nodes for compatibility reasons.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/test pull-gardener-e2e-kind
/assign
/test pull-gardener-e2e-kind
/test pull-gardener-e2e-kind-ha-single-zone-upgrade
/test pull-gardener-e2e-kind-ipv6
/test pull-gardener-e2e-kind
/lgtm
LGTM label has been added.
/retest-required
@dimitar-kostadinov, as you already have the required adaptations for the registry-cache extension, can you double check that the latest revision of the PR satisfies the requirements from the registry-cache side? Maybe running our e2e tests locally would be enough.
The e2e test should create Shoot with registry-cache extension enabled with caches for Shoot system components, delete Shoot fails with following node-agent error:
Jul 10 11:02:04 machine-shoot--local--e2e-cache-ssc-local-68ccb-bdf6w gardener-node-agent[333]: {"level":"error","ts":"2024-07-10T11:02:04.594Z","msg":"Reconciler error","controller":"operatingsystemconfig","namespace":"kube-system","name":"gardener-node-agent-local-ee46034b8269353b","reconcileID":"a9021fe9-e4a8-47d6-8a4f-c996eb660c22","error":"failed reconciling containerd registries: retry failed with context deadline exceeded, last error: failed to reach registry http://10.4.219.36:5000 for upstream registry.k8s.io: Get \"http://10.4.219.36:5000\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}
The pods in kube-system namespace are Pending status and the machine pod is not ready:
% k -n kube-system get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-56f86cc77b-92dtc 0/1 Pending 0 29m
kube-system calico-node-vertical-autoscaler-6cf9c8bf8d-w6qjr 0/1 Pending 0 29m
kube-system calico-typha-deploy-5f6c5b7db-hdljm 0/1 Pending 0 29m
kube-system calico-typha-horizontal-autoscaler-78ff9f78d8-hftwn 0/1 Pending 0 29m
kube-system calico-typha-vertical-autoscaler-7c87c474d6-2swcq 0/1 Pending 0 29m
...
% k -n shoot--local--e2e-cache-ssc get po machine-shoot--local--e2e-cache-ssc-local-68ccb-bdf6w
NAME READY STATUS RESTARTS AGE
machine-shoot--local--e2e-cache-ssc-local-68ccb-bdf6w 0/1 Running 0 10m
All other e2e tests pass OK.
/test pull-gardener-integration
Thanks again for the thorough review and feedback @ialidzhikov and @dimitar-kostadinov. PTAL.
LGTM label has been added.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dimitar-kostadinov, timuthy
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [timuthy]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/test pull-gardener-e2e-kind