feature: rebase to Kubernetes 1.32.0
### Feature Description

Kubernetes 1.32.0 is slated to come out tomorrow. We should rebase kcp to that new version. A final changelog isn't out yet, so it likely makes sense to wait for that to understand what changes to expect from the rebase.

### Proposed Solution

Follow the rebase guide.

### Alternative Solutions

No response

### Want to contribute?

- [ ] I would like to work on this issue.

### Additional Context

No response
/assign gman0
@embik: GitHub didn't allow me to assign the following users: gman0.
Note that only kcp-dev members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
Taking this (I would self-`/assign`, but that probably won't work for me at the moment because of permissions).
/assign gman0
WIP branches for kcp-dev/kubernetes:
- https://github.com/gman0/kubernetes/tree/kcp-1.32-baseline
- https://github.com/gman0/kubernetes/tree/kcp-1.32-pre.1-deps
- https://github.com/gman0/kubernetes/tree/kcp-1.32-pre.2-fixes
WIP branch for kcp-dev/kcp:
- https://github.com/gman0/kcp/tree/1.32-bump
- https://github.com/gman0/kcp/tree/1.32-bump-pre.1
Also working on updating the rebasing docs:
- https://github.com/gman0/kcp/tree/update-rebase-docs
https://github.com/gman0/kubernetes/commit/3ecc3a49cd2572a1bc9404d5d1aed30e8bf25fc6
TL;DR: we may need to wire a context into the `asPartialObjectMetadataList` function. I'll try it out and we can discuss it once the PR is out.
https://github.com/gman0/kubernetes/commit/0f81a9ee821009644411367a519df3571011e580
The new changes in upstream's GC controller may cause some issues for us. I'll explain the issue using `garbagecollector.Sync`, since we don't have the new version of `ResyncMonitors` just yet, and the `hack/kcp/garbage_collector_patch.go` generator just copy-pastes it anyway.
- New: https://github.com/kubernetes/kubernetes/blob/v1.32.0/pkg/controller/garbagecollector/garbagecollector.go#L175-L246
- Old: https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/controller/garbagecollector/garbagecollector.go#L175-L283
Main changes:
- The new version no longer has locks for the workers (`(GarbageCollector struct).workerLock` doesn't exist anymore). It seems `garbagecollector.Sync` now relies on retrying and hoping things will be in sync eventually.
- Observe that the new version no longer retries `gc.resyncMonitors && cache.WaitForNamedCacheSync` inside `wait.PollImmediateUntilWithContext`. Again, it relies on global retries inside the parent `wait.UntilWithContext` loop.
Consequences:
- If we (kcp) fail in `gc.resyncMonitors` or `cache.WaitForNamedCacheSync`, it seems we won't be able to retry until the next LogicalCluster event in the GC controller invokes `ResyncMonitors`.
- The lack of worker locks means we may miss garbage-collecting some resources, and would only find out on the next LogicalCluster event.
Now what?
- Why are we not reconciling like upstream does, with a periodic resync? Is it too expensive? If not, shouldn't we also implement some time-based resync (as a replacement for, or in addition to, the event-based one)?
- If the answer to the bullet point above is No, one alternative would be to try to revert upstream's https://github.com/kubernetes/kubernetes/commit/e8b1d7dc24713db99808028e0d02bacf6d48e01f just to keep the old behavior until we figure out a better way to handle polling (or some other way to trigger GC).
- However, the commit above fixes important bugs, and we are likely interested in including it too:
- https://github.com/kubernetes/kubernetes/issues/101078
- https://github.com/kubernetes/kubernetes/issues/127105
What we've decided:
- We're going with the lock-full version, https://github.com/gman0/kubernetes/commit/28ac217b93a7b49caa5e7382c5dd01919834df7b.
- This is to protect GC from missing events.
- TODO: let's work toward a lock-less version.
https://github.com/gman0/kcp-code-generator/commit/e65054c73f6315664236041ef63e850fd217a808
The build constraint `!ignore_autogenerated` in the templates caused the generated files not to be picked up by the tools in https://github.com/kubernetes/code-generator/tree/master/cmd, specifically when re-generating deepcopy functions as part of `hack/update-codegen.sh`.
Removing these constraints fixed the issue.
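For context, this is the standard header that deepcopy-gen emits on its own generated files (exact lines may differ by generator version); the tag exists so that tooling can build with `-tags ignore_autogenerated` to exclude previously generated output when regenerating:

```go
//go:build !ignore_autogenerated
// +build !ignore_autogenerated

// Code generated by deepcopy-gen. DO NOT EDIT.

package v1alpha1
```

The problem above was presumably that the same constraint on our template output excluded the files in a situation where the tools needed to see them.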
TODO:
- It's not yet clear why this worked, with the constraints in place, when publishing our k8s rebase to v1.31.0. Generating the same v1.31.0 code now fails, however (could this be caused by an external dependency that was updated?).
- The build constraint seems to have been introduced as part of the very first PR: https://github.com/kcp-dev/code-generator/pull/1/files#diff-8f8bddc9a9b92d14d0cbdd05deaace70831ec9de0879b4b5b1bd99bc5f41a81eR21-R22 . I still need to check more thoroughly why it was there in the first place.
https://github.com/gman0/kcp/commit/ddbe24723a5ad38771f700da7d73334dfdf166d5
This commit removes v1alpha1 ValidatingAdmissionPolicy from informers and APIExport builtin types.
It seems it was added in 1f2244a0594660c98bbf5f4c0dfeaafe2315e0eb , c136c4ab01e6fd3570806b55cfd0c88e7707173e and 666c6a3f7f90c6d26dee0412889591c445ce56d3 . This was back in 2023, when we were basing kcp on k8s 1.26; at that time VAP was indeed just being introduced, and so it was served from admissionregistration.k8s.io/v1alpha1. Since v1alpha1 APIs are disabled by default, c136c4ab01e6fd3570806b55cfd0c88e7707173e added a runtime config override:
https://github.com/kcp-dev/kcp/blob/4bd4122483ff418a58315389b4f872dcaec0fe7c/pkg/server/options/options.go#L133-L136
With the k8s v1.32 rebase, however, v1alpha1 VAP is marked as removed in its APILifecycle policy:
https://github.com/kubernetes/kubernetes/blob/v1.32.0/staging/src/k8s.io/api/admissionregistration/v1alpha1/zz_generated.prerelease-lifecycle.go#L108-L112
As a result, the kind is no longer served. Because of that, it was necessary to remove the v1alpha1 VAP code for our 1.32 rebase; otherwise it caused issues with our informers:
I0214 00:47:29.166169 1915080 deleted_kinds.go:169] Removing resource validatingadmissionpolicybindings.v1alpha1.admissionregistration.k8s.io because it is time to stop serving it per APILifecycle.
Hi folks, reviving this issue with the latest updates:
- [DIFF-ONLY] Rebase to 1.32.3 kcp-dev/apimachinery#84
- [DIFF-ONLY] Rebase to 1.32.3 kcp-dev/code-generator#105
- [DIFF-ONLY] Rebase to 1.32.3 kcp-dev/client-go#41
- [DIFF-ONLY] Rebase to 1.32.3 kcp-dev/kubernetes#166
- kcp: https://github.com/kcp-dev/kcp/pull/3383
There were only minor changes compared to https://github.com/kcp-dev/kubernetes/pull/161