Retroactive default StorageClass assignment
Enhancement Description

- One-line enhancement description (can be used as a release note): Retroactive default StorageClass assignment.
- Kubernetes Enhancement Proposal:
- Discussion Link: https://docs.google.com/document/d/1k1r_h-CW-Oqsh3PX-Qb5jpjyt1taKLGbUhHvSpvVOYo/edit#
- Primary contact (assignee): @RomanBednar
- Responsible SIGs: sig-storage
- Enhancement target (which target equals to which milestone):
  - Alpha release target (x.y): 1.25
  - Beta release target (x.y): 1.26
  - Stable release target (x.y): 1.28?
[ ] Alpha

- [x] KEP (k/enhancements) update PR(s): https://github.com/kubernetes/enhancements/pull/3337
- [x] Code (k/k) update PR(s): https://github.com/kubernetes/kubernetes/pull/111467
- [ ] Docs (k/website) update PR(s): https://github.com/kubernetes/website/pull/35658
[ ] Beta

- [x] KEP (k/enhancements) update PR(s): https://github.com/kubernetes/enhancements/pull/3544

/sig storage
/milestone v1.25
/kind feature
/stage alpha
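For context, the behavior being tracked here can be sketched as follows (a minimal illustration with assumed names, not taken from the KEP itself): a PVC created with no storageClassName while the cluster has no default StorageClass stays Pending; once an administrator marks a class as default, the PV controller retroactively assigns that class to the claim.

$ cat <<EOF | kubectl apply -f -
# Claim created while no default StorageClass exists;
# storageClassName is intentionally left unset (nil).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF

# Later, mark a class as default; with the feature enabled, the PV
# controller sets example-pvc's storageClassName retroactively.
$ kubectl patch sc/standard -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'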
Hello @jsafrane, @RomanBednar 👋, 1.25 Enhancements team here.
Just checking in as we approach enhancements freeze on 18:00 PST on Thursday June 16, 2022.
For note, this enhancement is targeting stage alpha for 1.25 (correct me if otherwise).
Here's where this enhancement currently stands:
- [x] KEP file using the latest template has been merged into the k/enhancements repo.
- [x] KEP status is marked as `implementable`
- [x] KEP has an updated detailed test plan section filled out
- [x] KEP has up-to-date graduation criteria
- [x] KEP has a production readiness review that has been completed and merged into k/enhancements.
Looks like for this one, we would need to update the following:
- Update and get the open PR https://github.com/kubernetes/enhancements/pull/3337 merged
- Update the `status` field from `provisional` to `implementable` in the `kep.yaml` file, and also reflect the latest target stage and milestone information

For note, the status of this enhancement is marked as `at risk`. Please keep the issue description up-to-date with appropriate stages as well. Thank you!
Hello @jsafrane, @RomanBednar 👋, just a quick check-in again, as we approach the 1.25 enhancements freeze.
Please plan to get the lone pending item above done before enhancements freeze on Thursday, June 16, 2022 at 18:00 PT.
For note, the current status of the enhancement is `at-risk`. Thank you!
With KEP PR https://github.com/kubernetes/enhancements/pull/3337 merged, the enhancement is ready for the upcoming enhancements freeze.
For note, the status is now marked as `tracked` in the enhancements tracking sheet. Thank you!
Hello @jsafrane 👋, 1.25 Release Docs Shadow here.
This enhancement is marked as ‘Needs Docs’ for the 1.25 release. Please follow the steps detailed in the documentation to open a PR against the dev-1.25 branch in the k/website repo. This PR can be just a placeholder at this time and must be created by August 4.
Also, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release. Thank you!
Hi @jsafrane, Enhancements team here again 👋
Checking in as we approach Code Freeze at 01:00 UTC on Wednesday, 3rd August 2022.
Please ensure that the following items are completed before the code-freeze:
- [x] All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
- [x] All PRs are fully merged by the code freeze deadline.
Currently, the status of the enhancement is marked as `at-risk`.
Thanks :)
Hey @jsafrane, reaching out again as we approach Code Freeze at 01:00 UTC this Wednesday, i.e. 3rd August 2022.
Please try to get all the action items mentioned in the comment above done before the code freeze :)
The status of the enhancement is still marked as `at-risk`.
I linked this PR: https://github.com/kubernetes/kubernetes/pull/111467
Thanks @jsafrane. With k/k PR https://github.com/kubernetes/kubernetes/pull/111467 merged now, the enhancement is marked as `tracked`.
/milestone v1.26
Performance was tested with kind on a single node with 10 pods and 10 volumes per pod (= 100 PVCs). Dynamic provisioning was enabled during the test. The cluster was modified to use the Immediate volume binding mode.
Results below show that enabling the feature has no negative impact on performance (the WaitForPVCsToBeBound step took ~220 s in both runs).
Test cluster modifications:
1) Custom PVC template with the selected-node label (required for immediate volume binding):
$ cat ./testing/experimental/storage/pod-startup/volume-types/persistentvolume/pvc-custom.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{.Name}}
  labels:
    app: {{.Name}}
    group: {{.Group}}
{{ if .Provisioner }}
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: {{.Provisioner}}
    volume.kubernetes.io/selected-node: kind-control-plane
{{ end }}
spec:
  accessModes:
    - ReadWriteOnce
{{ if .StorageClass }}
  storageClassName: {{.StorageClass}}
{{ end }}
  resources:
    requests:
      storage: {{.VolSize}}
2) Storage class with the Immediate volume binding mode:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: standard
provisioner: rancher.io/local-path
reclaimPolicy: Delete
volumeBindingMode: Immediate
These are the test overrides I used (saved as provision-override.yaml and passed to clusterloader via --testoverrides in the commands below):
PODS_PER_NODE: 10
VOLUMES_PER_POD: 10
NODES_PER_NAMESPACE: 1
START_PODS: false
VOL_SIZE: 128Mi
STORAGE_CLASS: standard
PROVISIONER: rancher.io/local-path
GATHER_METRICS: false
WAIT_FOR_PVS_DELETED: false
STEP_TIME_SECONDS: 500
# The custom PVC template mentioned above:
VOLUME_TEMPLATE_PATH: "volume-types/persistentvolume/pvc-custom.yaml"
Results:
1) Cluster with the RetroactiveDefaultStorageClass feature disabled
$ kubectl -n kube-system get pod/kube-controller-manager-kind-no-feature-control-plane -o json | jq '.spec.containers[0].command[12]'
"--feature-gates=KubeletInUserNamespace=true"
$ kubectl -n kube-system get pod/kube-apiserver-kind-no-feature-control-plane -o json | jq '.spec.containers[0].command[11]'
"--feature-gates=KubeletInUserNamespace=true"
$ go run cmd/clusterloader.go -v=3 --report-dir=/tmp/clusterloader2-prov-no-feature --kubeconfig=/tmp/kubeconfig-no-feature --provider=kind --nodes=1 --testconfig=testing/experimental/storage/pod-startup/config.yaml --mastername=kind-no-feature-control-plane --master-internal-ip=10.89.0.3 --testoverrides=testing/experimental/storage/pod-startup/volume_binding/override.yaml --testoverrides=testing/experimental/storage/pod-startup/volume-types/persistentvolume/override.yaml --testoverrides=provision-override.yaml
Test result (junit):
<?xml version="1.0" encoding="UTF-8"?>
<testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="290.548">
<testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="290.542663876"></testcase>
<testcase name="storage: [step: 01] Provisioning volumes" classname="ClusterLoaderV2" time="10.064247582"></testcase>
<testcase name="storage: [step: 02] Waiting for PVs to be bound [00] - WaitForPVCsToBeBound" classname="ClusterLoaderV2" time="220.084696883"></testcase>
<testcase name="storage: [step: 03] Deleting volumes" classname="ClusterLoaderV2" time="10.113884574"></testcase>
</testsuite>
2) Cluster with the RetroactiveDefaultStorageClass feature enabled
$ kubectl -n kube-system get pod/kube-controller-manager-kind-control-plane -o json | jq '.spec.containers[0].command[12]'
"--feature-gates=KubeletInUserNamespace=true,RetroactiveDefaultStorageClass=true"
$ kubectl -n kube-system get pod/kube-apiserver-kind-control-plane -o json | jq '.spec.containers[0].command[11]'
"--feature-gates=KubeletInUserNamespace=true,RetroactiveDefaultStorageClass=true"
$ go run cmd/clusterloader.go -v=3 --report-dir=/tmp/clusterloader2-prov --kubeconfig=/tmp/kubeconfig --provider=kind --nodes=1 --testconfig=testing/experimental/storage/pod-startup/config.yaml --mastername=kind-control-plane --master-internal-ip=10.89.0.2 --testoverrides=testing/experimental/storage/pod-startup/volume_binding/override.yaml --testoverrides=testing/experimental/storage/pod-startup/volume-types/persistentvolume/override.yaml --testoverrides=provision-override.yaml
Test result (junit):
<?xml version="1.0" encoding="UTF-8"?>
<testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="285.547">
<testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="285.542948724"></testcase>
<testcase name="storage: [step: 01] Provisioning volumes" classname="ClusterLoaderV2" time="10.082875582"></testcase>
<testcase name="storage: [step: 02] Waiting for PVs to be bound [00] - WaitForPVCsToBeBound" classname="ClusterLoaderV2" time="220.08929458"></testcase>
<testcase name="storage: [step: 03] Deleting volumes" classname="ClusterLoaderV2" time="10.097375571"></testcase>
</testsuite>
Hey @jsafrane 👋, 1.26 Enhancements team here!
Just checking in as we approach Enhancements Freeze on 18:00 PDT on Thursday 6th October 2022.
This enhancement is targeting stage beta for 1.26 (correct me if otherwise).
Here's where this enhancement currently stands:
- [X] KEP file using the latest template has been merged into the k/enhancements repo.
- [X] KEP status is marked as `implementable`
- [X] KEP has an updated detailed test plan section filled out
- [X] KEP has up to date graduation criteria
- [ ] KEP has a production readiness review that has been completed and merged into k/enhancements.
For this KEP, we would need to:
- Update the kep.yaml to reflect the current milestone information
- Update the production readiness review with latest stage information
- Include the new updated PR of this KEP in the Issue Description and get it merged before Enhancements Freeze to make this enhancement eligible for 1.26 release.
The status of this enhancement is marked as `at risk`. Please keep the issue description up-to-date with appropriate stages as well.
Thank you :)
/stage beta
Hello @jsafrane 👋, just a quick check-in again, as we approach the 1.26 Enhancements freeze.
Please plan to get the action items mentioned in my comment above done before the Enhancements freeze at 18:00 PDT on Thursday 6th October 2022, i.e. tomorrow.
For note, the current status of the enhancement is marked `at-risk` :)
With #3544 merged, we have this down as `tracked` for v1.26. Thanks!
Rollout-rollback-rollout testing was performed; the feature behaves as expected and no issues were observed. Posting my test results:
Perform pre-upgrade tests
Set default storage class:
$ kc patch sc/csi-hostpath-sc -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/csi-hostpath-sc patched
PVC does not get updated and remains pending:
$ kc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
csi-pvc Pending
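For reference, the pending claim above was created without a storageClassName; a minimal equivalent manifest (assumed here, mirroring the csi-pvc-2 claim used later in the rollback test) would be:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi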
Upgrade cluster
Check available 1.25 versions:
$ yum search kubeadm --showduplicates --quiet | grep 1.25
kubeadm-1.25.0-0.x86_64 : Command-line utility for administering a Kubernetes cluster.
kubeadm-1.25.1-0.x86_64 : Command-line utility for administering a Kubernetes cluster.
kubeadm-1.25.2-0.x86_64 : Command-line utility for administering a Kubernetes cluster.
Install/update kubeadm:
$ sudo yum install -y kubeadm-1.25.2-0
Prepare a kubeadm config file that enables the feature gate:
$ cat /mnt/clusterconf-examples/featuregate.yaml
## Example kubeadm configuration for enabling a feature gate.
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: RetroactiveDefaultStorageClass=true
controllerManager:
  extraArgs:
    cluster-cidr: 10.244.0.0/16
    feature-gates: RetroactiveDefaultStorageClass=true
Perform kubeadm upgrade:
$ sudo kubeadm upgrade plan --config /mnt/clusterconf-examples/featuregate.yaml
$ sudo kubeadm upgrade apply --config /mnt/clusterconf-examples/featuregate.yaml v1.25.2
Perform kubelet upgrade:
$ sudo yum install -y kubelet-1.25.2-0
$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet
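Before the post-upgrade checks, it may be worth confirming that the feature gate actually landed in the upgraded control plane. A quick sanity check (node and pod names are assumed; this follows the jq queries used in the performance section above):

$ kubectl -n kube-system get pod/kube-apiserver-<node-name> -o json | jq '.spec.containers[0].command' | grep feature-gates
$ kubectl -n kube-system get pod/kube-controller-manager-<node-name> -o json | jq '.spec.containers[0].command' | grep feature-gates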
Perform post-upgrade tests
Verify that the PVC got the SC assigned right after the upgrade and a PV was provisioned and bound:
$ kc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
csi-pvc Bound pvc-06a964ca-f997-4780-8627-b5c3bf5a87d8 1Gi RWO csi-hostpath-sc 87m
Downgrade cluster
$ yum history | grep -E "kubeadm|kubelet"
10 | install -y kubelet-1.25.2-0 | 2022-10-12 11:06 | Upgrade | 1
8 | install -y kubeadm-1.25.2-0 | 2022-10-12 09:45 | Upgrade | 1
7 | install -y kubelet-1.24.5-0 kubeadm-1.24.5-0 kubectl
$ sudo yum -y history undo 8 && sudo yum -y history undo 10
Perform post-rollback tests
Remove default SC:
$ kc patch sc/csi-hostpath-sc -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
storageclass.storage.k8s.io/csi-hostpath-sc patched
Create a new PVC without an SC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc-2
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
$ kc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
csi-pvc Bound pvc-06a964ca-f997-4780-8627-b5c3bf5a87d8 1Gi RWO csi-hostpath-sc 96m
csi-pvc-2 Pending
Add default SC again:
$ kc patch sc/csi-hostpath-sc -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/csi-hostpath-sc patched
Verify that the new PVC did not get updated with SC this time:
$ kc get pvc/csi-pvc-2
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
csi-pvc-2 Pending
Upgrade cluster again
Install/update kubeadm:
$ sudo yum install -y kubeadm-1.25.2-0
Perform kubeadm upgrade:
$ sudo kubeadm upgrade plan --config /mnt/clusterconf-examples/featuregate.yaml
$ sudo kubeadm upgrade apply --config /mnt/clusterconf-examples/featuregate.yaml v1.25.2
Perform kubelet upgrade:
$ sudo yum install -y kubelet-1.25.2-0
$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet
Perform post-upgrade tests again
Verify that the PVC got the SC assigned right after the upgrade and a PV was provisioned and bound:
$ kc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
csi-pvc Bound pvc-06a964ca-f997-4780-8627-b5c3bf5a87d8 1Gi RWO csi-hostpath-sc 117m
csi-pvc-2 Bound pvc-2e765394-f32c-42fb-b3db-ffe203612bac 1Gi RWO csi-hostpath-sc 24m
Tested version skew and sharing results below. The first case has the feature enabled in the API server and disabled in KCM; the second case is the other way around.
Case 1 - API on / KCM off
| API server | KCM | Behavior |
|---|---|---|
| on | off | Existing Kubernetes behavior; only users can change pvc.spec.storageClassName=nil to a SC name. |
The controller will not update the PVC:
$ kc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
myclaim Pending 4s
$ kc patch sc/standard -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/standard patched
$ kc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
myclaim Pending
But the API server will allow the change from a nil value:
$ kc get pvc/myclaim -o json | jq '.spec.storageClassName'
null
$ kc patch pvc/myclaim -p '{"spec":{"storageClassName":"standard"}}'
persistentvolumeclaim/myclaim patched (no change)
$ kc get pvc/myclaim -o json | jq '.spec.storageClassName'
"standard"
$ kc patch pvc/myclaim -p '{"spec":{"storageClassName":"test"}}'
The PersistentVolumeClaim "myclaim" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claim
Case 2 - API off / KCM on
| API server | KCM | Behavior |
|---|---|---|
| off | on | PV controller may try to change pvc.spec.storageClassName=nil to a new default SC name, which will fail on the API server. |
Attempt retroactive SC update:
$ kc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
myclaim Pending 4s
$ kc patch sc/standard -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/standard patched
$ kc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
myclaim Pending 13s
Check the KCM logs to verify that the PV controller attempted the update but it was rejected by the API server, as expected:
I1019 08:42:12.056353 579577 storageclass.go:50] GetDefaultClass added: standard
I1019 08:42:12.056372 579577 pv_controller.go:953] assigning StorageClass[standard] to PersistentVolumeClaim[default/myclaim]
E1019 08:42:12.079545 579577 pv_controller_base.go:275] could not sync volume "default/myclaim": can't update PersistentVolumeClaim["default/myclaim"]: PersistentVolumeClaim "myclaim" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claims
core.PersistentVolumeClaimSpec{
... // 2 identical fields
Resources: {Requests: {s"storage": {i: {...}, Format: "BinarySI"}}},
VolumeName: "",
- StorageClassName: nil,
+ StorageClassName: &"standard",
VolumeMode: &"Filesystem",
DataSource: nil,
DataSourceRef: nil,
}
Hi @RomanBednar 👋,
Checking in as we approach 1.26 code freeze at 17:00 PDT on Tuesday 8th November 2022.
Please ensure the following items are completed:
- [ ] All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
- [ ] All PRs are fully merged by the code freeze deadline.
For this enhancement, it looks like the following PRs are open and need to be merged before code freeze:
- https://github.com/kubernetes/kubernetes/pull/113329

Let me know if there are any other PRs that we should be tracking.
As always, we are here to help should questions come up. Thanks!
With https://github.com/kubernetes/kubernetes/pull/113329 merged, I have this marked as `tracked` for code freeze.
Hello @jsafrane 👋, 1.26 Release Docs Lead here. This enhancement is marked as ‘Needs Docs’ for 1.26 release.
Please follow the steps detailed in the documentation to open a PR against the dev-1.26 branch in the k/website repo. This PR can be just a placeholder at this time, and must be created by November 9. Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.
If you have any doubts, reach out to us! Thank you!
Hello @RomanBednar and @jsafrane 👋, please take a look at Documenting for a release - PR Ready for Review to get your PR ready for review before the deadline, Tuesday 15th November 2022. Thank you!
Hi @RomanBednar and @jsafrane, about documentation for 1.26: this KEP needs to be updated to state that retroactive default StorageClass assignment is beta in 1.26. cc: @reylejano @Rishit-dagli
Hi @RomanBednar, @jsafrane, @xing-yang, thank you for your doc PR here.
cc: @Rishit-dagli @reylejano
/remove-label lead-opted-in
/remove-label tracked/yes
/label tracked/no
/milestone clear
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/milestone v1.28
/assign @RomanBednar
/label lead-opted-in
/unassign