test: add feature gate consistency test for kubernetes feature - compatibililty versions
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This PR adds an integration test for testing feature gate compatibility for the Kubernetes Compatibility Versions KEP-4330 feature. This test parses the feature gate information from a test kube-apiserver w/ --emulation-version=<prior-version> and hits the /metrics endpoint. It compares the values it gets from this to the known correct values for that version. This test is needed as Compatibility Versions guarantees that APIs and feature gate values are correctly emulated properly when using --emulated-version=n-1..3. This test tests that core functionality of the feature.
NOTE: Currently this test fails "correctly" stating that AllowServiceLBStatusOnNonLB and ComponentSLIs are expected but missing. This test effectively caught those two feature gates being removed when they should not have been removed as they were usable in v1.31 and Compatibility Versions should allow them to be usable in 1.32, 1.33, and 1.34. This is documented more thoroughly in the issue here - https://github.com/kubernetes/kubernetes/issues/128036. As such, this PR is blocked on reverting the removal PRs for those feature gates- https://github.com/kubernetes/kubernetes/pull/126786 and https://github.com/kubernetes/kubernetes/pull/126698. Once those are reverted, this test will pass properly
Which issue(s) this PR fixes:
Fixes #127947
Special notes for your reviewer:
Does this PR introduce a user-facing change?
NONE
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-architecture/4330-compatibility-versions/README.md
/cc @Jefftree
/sig api-machinery
/triage accepted
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: aaron-prindle Once this PR has been reviewed and has the lgtm label, please assign mikedanese for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@aaron-prindle as you're looking at this code, I wonder if you'd be interested in helping produce a release artefact that lists the feature gates in a release and whether they are enabled / disabled by default.
If we had this, it would help k/website code verify that the machine readable feature gate metadata are correct - see https://github.com/kubernetes/website/tree/main/content/en/docs/reference/command-line-tools-reference/feature-gates.
@aaron-prindle as you're looking at this code, I wonder if you'd be interested in helping produce a release artefact that lists the feature gates in a release and whether they are enabled / disabled by default.
If we had this, it would help k/website code verify that the machine readable feature gate metadata are correct - see https://github.com/kubernetes/website/tree/main/content/en/docs/reference/command-line-tools-reference/feature-gates.
@sftim - yes, I would be interested in doing this. I've created this issue - https://github.com/kubernetes/kubernetes/issues/128241 to track this workstream. Feel free to add any context/ideas in the issue there, thanks!
Rebasing now that https://github.com/kubernetes/kubernetes/pull/128139 and https://github.com/kubernetes/kubernetes/pull/128317 have merged (related comment: https://github.com/kubernetes/kubernetes/issues/128036#issuecomment-2436522308)
@jefftree - test is now failing with :
feature DRAControlPlaneController not found in /metrics output
Is this feature gate being removed ok w.r.t. compatibility versions? I believe this PR is related:
https://github.com/kubernetes/kubernetes/pull/128003
I can modify the test to have some allowlist if necessary
@Jefftree - test is now failing with :
feature DRAControlPlaneController not found in /metrics outputIs this feature gate being removed ok w.r.t. compatibility versions? I believe this PR is related:
#128003
I can modify the test to have some allowlist if necessary
Yes, alpha features are permitted to be removed anytime. https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecation
LGTM for integration test
/cc @BenTheElder - would you be able to review the test case generation tooling in the PR here?
- hack/generate-integration-compatibility-versions-test.sh
- hack/tools/parse-feature-gates/parse-feature-gates.go
Sorry I missed the previous pings!
I think this is a reasonable starting point, agree with previous review comments. cc also @aojea for visiblity.
/retest
/retest
@aaron-prindle: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-kubernetes-e2e-gce-cos-alpha-features | f22a6ab774c6891698a02bbbb6db1a86cdf862a1 | link | false | /test pull-kubernetes-e2e-gce-cos-alpha-features |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Takled offline w/ @aojea, there is consensus that this test is likely better as a prow periodic job than an integration test as this way the test can be fully automated vs require manual steps each release -> increased project friction/toil. Going to close this for now and submit a similar test to test-infra that is fully automated
Takled offline w/ @aojea, there is consensus that this test is likely better as a prow periodic job than an integration test as this way the test can be fully automated vs require manual steps each release -> increased project friction/toil. Going to close this for now and submit a similar test to test-infra that is fully automated
sounds good. Not having manual steps each release is definitely a big plus.
code in tree can not depend on manual steps or external factors to progress
code in tree can not depend on manual steps or external factors to progress
Yeah that's fair, but I do think we can write a version of this that just requires being on a cluster with the flag set, and that seems to be more or less true in the latest version of the shell script.
It might not be appropriate as one of the existing integration tests though, maybe a new test binary or something