
Update the Kubernetes components installed by RKM

Open hach-que opened this issue 8 months ago • 1 comments

  • [ ] Adapt the Helm charts for CoreDNS and Calico on Linux into a custom Helm chart, and install/upgrade it automatically when the controller starts up. We have to modify the upstream charts because we need to override some host paths (such as log paths for Calico) so that they are per RKM install.
  • [ ] Create a new Helm chart that can encapsulate CoreDNS and Calico on Windows, and install/upgrade that chart automatically when the controller starts up.
  • [ ] Upgrade Kubernetes components from 1.26.1 to 1.33.
  • [ ] Upgrade containerd from 1.6.18 to 2.0.0.
    • [ ] Make sure Windows Server containers can run on Windows 11, or carry over our previous patches from 1.6.18 that allowed this to work. See https://github.com/containerd/containerd/pull/8137. I suspect that if we upgrade our build machines from 21H2 to 24H2, build compatibility may "just work", but it depends on whether containerd still has strict kernel version checks.
    • [ ] Make sure that UEFS continues to work in host process containers after the upgrade. Refer to https://github.com/microsoft/hcsshim/issues/1699 for details on the regression. Some commenters report that everything is fine on later Windows versions, so again this just needs testing.
  • [x] Upgrade runc for Linux to the latest version.
  • [x] Upgrade etcd for Linux to the latest version. Care must be taken here to make sure there are no migration/upgrade steps needed for the database coming from 3.5.7.
  • [ ] Upgrade Calico to the latest version. This may or may not be done as part of the Helm charts task above.
  • [ ] Upgrade Windows Container Networking to the latest version.
  • [x] Upgrade CoreDNS to the latest version. This may or may not be done as part of the Helm charts task above.
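
For the "install/upgrade automatically when the controller starts up" items, a rough sketch of what the controller could run is below. It only builds a `helm upgrade --install` command line, which is safe to repeat on every startup because it installs on first run and upgrades afterwards. The release name, chart path, install root and the `calico.logDir` values key are all hypothetical placeholders, not names from RKM or the upstream charts.

```python
import shlex


def helm_upgrade_command(release, chart, namespace, overrides):
    """Build a `helm upgrade --install` command as an argument list.

    `overrides` maps chart value keys to values and is passed via --set.
    """
    cmd = [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        "--create-namespace",
        "--wait",
    ]
    # Sort keys so the generated command is stable between runs.
    for key, value in sorted(overrides.items()):
        cmd += ["--set", f"{key}={value}"]
    return cmd


# Hypothetical per-install root; a real install would supply its own path.
install_root = "/opt/rkm/example-install"

cmd = helm_upgrade_command(
    release="rkm-calico",            # hypothetical release name
    chart="./charts/rkm-calico",     # hypothetical custom chart path
    namespace="kube-system",
    # Hypothetical values key for the Calico log path override.
    overrides={"calico.logDir": f"{install_root}/logs/calico"},
)
print(shlex.join(cmd))
```

Returning an argument list rather than a shell string keeps quoting out of the picture if the controller later hands it to a process launcher.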

hach-que · May 19 '25 10:05

The Windows Container Networking CNI plugins don't seem to have good release practices, so I don't think we should blindly update from 0.2.0 to 0.3.1. There isn't even a linear association between release date and version: v0.2.2 was technically released after 0.3.1.
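
To illustrate why "take the most recent release" and "take the highest version" disagree here, a small sketch follows. It only uses the relative ordering described above (0.3.1 published first, v0.2.2 published afterwards); the `version_key` helper is my own and assumes plain `major.minor.patch` tags with an optional `v` prefix.

```python
def version_key(tag):
    """Turn a tag like 'v0.2.2' or '0.3.1' into a comparable tuple of ints."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


# Tags listed in the order they were published, per the comment above.
tags_in_release_order = ["0.3.1", "v0.2.2"]

most_recent = tags_in_release_order[-1]
highest = max(tags_in_release_order, key=version_key)

# The two selection strategies pick different tags.
print("most recently published:", most_recent)
print("highest version number:", highest)
```

Any automation that picks "latest" by publish date would select the older 0.2.x line, which is another reason to pin a specific version or commit.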

We should probably build the CNI plugins ourselves from the source code repository, and bundle them together with Calico into a host process container image that we can deploy to Windows nodes as a daemon set (so RKM does not have to manually manage the Calico process on Windows).
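
As a starting point for that daemon set, here is a minimal sketch that generates the manifest as JSON (which `kubectl apply -f` accepts). The Kubernetes fields used (`windowsOptions.hostProcess`, `runAsUserName`, `hostNetwork`, the `kubernetes.io/os` node selector) are the standard ones for Windows host process containers; the daemon set name and image reference are hypothetical, and a real manifest would likely also need tolerations, volumes and a command.

```python
import json


def windows_cni_daemonset(name, image, namespace="kube-system"):
    """Return a DaemonSet manifest that runs `image` as a Windows
    host process container on every Windows node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Schedule only onto Windows nodes.
                    "nodeSelector": {"kubernetes.io/os": "windows"},
                    # Host process pods must use the host network.
                    "hostNetwork": True,
                    "securityContext": {
                        "windowsOptions": {
                            "hostProcess": True,
                            "runAsUserName": "NT AUTHORITY\\SYSTEM",
                        }
                    },
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


manifest = windows_cni_daemonset(
    name="rkm-calico-windows",  # hypothetical name
    # Hypothetical image bundling Calico and the self-built CNI plugins.
    image="registry.example.invalid/rkm/calico-windows:dev",
)
print(json.dumps(manifest, indent=2))
```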

hach-que · May 19 '25 16:05