Retroactive default StorageClass assignment

Open jsafrane opened this issue 2 years ago • 17 comments

Enhancement Description

  • One-line enhancement description (can be used as a release note): Retroactive default StorageClass assignment.

  • Kubernetes Enhancement Proposal:

  • Discussion Link: https://docs.google.com/document/d/1k1r_h-CW-Oqsh3PX-Qb5jpjyt1taKLGbUhHvSpvVOYo/edit#

  • Primary contact (assignee): @RomanBednar

  • Responsible SIGs: sig-storage

  • Enhancement target (which target equals to which milestone):

    • Alpha release target (x.y): 1.25
    • Beta release target (x.y): 1.26
    • Stable release target (x.y): 1.28?
  • [ ] Alpha

    • [x] KEP (k/enhancements) update PR(s): https://github.com/kubernetes/enhancements/pull/3337
    • [x] Code (k/k) update PR(s): https://github.com/kubernetes/kubernetes/pull/111467
    • [ ] Docs (k/website) update PR(s): https://github.com/kubernetes/website/pull/35658

jsafrane avatar Jun 02 '22 15:06 jsafrane

/sig storage

jsafrane avatar Jun 02 '22 15:06 jsafrane

/milestone v1.25

jsafrane avatar Jun 02 '22 15:06 jsafrane

/kind feature
/stage alpha

jsafrane avatar Jun 02 '22 16:06 jsafrane

Hello @jsafrane, @RomanBednar 👋, 1.25 Enhancements team here.

Just checking in as we approach enhancements freeze at 18:00 PST on Thursday, June 16, 2022.

For note, this enhancement is targeting stage alpha for 1.25 (correct me if otherwise).

Here's where this enhancement currently stands:

  • [x] KEP file using the latest template has been merged into the k/enhancements repo.
  • [x] KEP status is marked as implementable
  • [x] KEP has an updated detailed test plan section filled out
  • [x] KEP has up to date graduation criteria
  • [x] KEP has a production readiness review that has been completed and merged into k/enhancements.

Looks like for this one, we would need to update the following:

  • Update and get open PR https://github.com/kubernetes/enhancements/pull/3337 merged
    • Update the status field from provisional to implementable in the kep.yaml file and also reflect the latest target stage and milestone information

For note, the status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

jasonbraganza avatar Jun 06 '22 13:06 jasonbraganza

Hello @jsafrane, @RomanBednar 👋, just a quick check-in again, as we approach the 1.25 enhancements freeze.

Please plan to get the one pending item above done before enhancements freeze on Thursday, June 16, 2022 at 18:00 PT.

For note, the current status of the enhancement is at-risk. Thank you!

jasonbraganza avatar Jun 13 '22 12:06 jasonbraganza

With KEP PR https://github.com/kubernetes/enhancements/pull/3337 merged, the enhancement is ready for the upcoming enhancements freeze.

For note, the status is now marked as tracked in the enhancements tracking sheet. Thank you!

Priyankasaggu11929 avatar Jun 17 '22 17:06 Priyankasaggu11929

Hello @jsafrane 👋, 1.25 Release Docs Shadow here.

This enhancement is marked as ‘Needs Docs’ for the 1.25 release. Please follow the steps detailed in the documentation to open a PR against the dev-1.25 branch in the k/website repo. This PR can be just a placeholder at this time and must be created by August 4.


Also, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release. Thank you!

didicodes avatar Jul 13 '22 13:07 didicodes

Hi @jsafrane, Enhancements team here again 👋

Checking in as we approach Code Freeze at 01:00 UTC on Wednesday, 3rd August 2022.

Please ensure that the following items are completed before the code-freeze:

  • [x] All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • [x] All PRs are fully merged by the code freeze deadline.

Currently, the status of the enhancement is marked as at-risk

Thanks :)

Atharva-Shinde avatar Jul 25 '22 16:07 Atharva-Shinde

Hey @jsafrane, reaching out again as we approach Code Freeze at 01:00 UTC this Wednesday, i.e. 3rd August 2022. Please try to complete all the action items mentioned in the comment above before the code freeze :) The status of the enhancement is still marked as at-risk.

Atharva-Shinde avatar Aug 01 '22 15:08 Atharva-Shinde

I linked this PR: https://github.com/kubernetes/kubernetes/pull/111467

jsafrane avatar Aug 01 '22 18:08 jsafrane

Thanks @jsafrane. With k/k PR https://github.com/kubernetes/kubernetes/pull/111467 merged now, the enhancement is marked as tracked.

Priyankasaggu11929 avatar Aug 03 '22 00:08 Priyankasaggu11929

/milestone v1.26

xing-yang avatar Sep 07 '22 19:09 xing-yang

Performance was tested with kind on a single node with 10 pods and 10 volumes per pod (=100 PVCs). Dynamic provisioning was enabled during the test. The cluster was modified to use the Immediate volume binding mode.

Results show that enabling the feature does not have a negative impact on performance.

Test cluster modifications:

1) Custom PVC template with selected-node annotation (required for immediate volume binding):

$ cat ./testing/experimental/storage/pod-startup/volume-types/persistentvolume/pvc-custom.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{.Name}}
  labels:
    app: {{.Name}}
    group: {{.Group}}
{{ if .Provisioner }}
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: {{.Provisioner}}
    volume.kubernetes.io/selected-node: kind-control-plane
{{ end }}
spec:
  accessModes:
    - ReadWriteOnce
  {{ if .StorageClass }}
  storageClassName: {{.StorageClass}}
  {{ end }}
  resources:
    requests:
      storage: {{.VolSize}}

2) Storage class with Immediate volume binding mode:

apiVersion: storage.k8s.io/v1 
kind: StorageClass 
metadata: 
 annotations: 
   storageclass.kubernetes.io/is-default-class: "true" 
 name: standard 
provisioner: rancher.io/local-path 
reclaimPolicy: Delete 
volumeBindingMode: Immediate

These are the test overrides I used:

PODS_PER_NODE: 10
VOLUMES_PER_POD: 10
NODES_PER_NAMESPACE: 1
START_PODS: false
VOL_SIZE: 128Mi
STORAGE_CLASS: standard
PROVISIONER: rancher.io/local-path
GATHER_METRICS: false
WAIT_FOR_PVS_DELETED: false
STEP_TIME_SECONDS: 500
# The custom PVC template mentioned above:
VOLUME_TEMPLATE_PATH: "volume-types/persistentvolume/pvc-custom.yaml"

Results:

1) Cluster with RetroactiveDefaultStorageClass feature disabled

$ kubectl -n kube-system get pod/kube-controller-manager-kind-no-feature-control-plane -o json | jq '.spec.containers[0].command[12]'
"--feature-gates=KubeletInUserNamespace=true"

$ kubectl -n kube-system get pod/kube-apiserver-kind-no-feature-control-plane -o json | jq '.spec.containers[0].command[11]'
"--feature-gates=KubeletInUserNamespace=true"

$ go run cmd/clusterloader.go -v=3 --report-dir=/tmp/clusterloader2-prov-no-feature --kubeconfig=/tmp/kubeconfig-no-feature --provider=kind --nodes=1 --testconfig=testing/experimental/storage/pod-startup/config.yaml --mastername=kind-no-feature-control-plane --master-internal-ip=10.89.0.3 --testoverrides=testing/experimental/storage/pod-startup/volume_binding/override.yaml --testoverrides=testing/experimental/storage/pod-startup/volume-types/persistentvolume/override.yaml --testoverrides=provision-override.yaml

Test result (junit):

<?xml version="1.0" encoding="UTF-8"?>
  <testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="290.548">
      <testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="290.542663876"></testcase>
      <testcase name="storage: [step: 01] Provisioning volumes" classname="ClusterLoaderV2" time="10.064247582"></testcase>
      <testcase name="storage: [step: 02] Waiting for PVs to be bound [00] - WaitForPVCsToBeBound" classname="ClusterLoaderV2" time="220.084696883"></testcase>
      <testcase name="storage: [step: 03] Deleting volumes" classname="ClusterLoaderV2" time="10.113884574"></testcase>
  </testsuite>

2) Cluster with RetroactiveDefaultStorageClass feature enabled

$ kubectl -n kube-system get pod/kube-controller-manager-kind-control-plane -o json | jq '.spec.containers[0].command[12]'
"--feature-gates=KubeletInUserNamespace=true,RetroactiveDefaultStorageClass=true"

$ kubectl -n kube-system get pod/kube-apiserver-kind-control-plane -o json | jq '.spec.containers[0].command[11]'
"--feature-gates=KubeletInUserNamespace=true,RetroactiveDefaultStorageClass=true"

$ go run cmd/clusterloader.go -v=3 --report-dir=/tmp/clusterloader2-prov --kubeconfig=/tmp/kubeconfig --provider=kind --nodes=1 --testconfig=testing/experimental/storage/pod-startup/config.yaml --mastername=kind-control-plane --master-internal-ip=10.89.0.2 --testoverrides=testing/experimental/storage/pod-startup/volume_binding/override.yaml --testoverrides=testing/experimental/storage/pod-startup/volume-types/persistentvolume/override.yaml --testoverrides=provision-override.yaml

Test result (junit):

<?xml version="1.0" encoding="UTF-8"?>
  <testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="285.547">
      <testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="285.542948724"></testcase>
      <testcase name="storage: [step: 01] Provisioning volumes" classname="ClusterLoaderV2" time="10.082875582"></testcase>
      <testcase name="storage: [step: 02] Waiting for PVs to be bound [00] - WaitForPVCsToBeBound" classname="ClusterLoaderV2" time="220.08929458"></testcase>
      <testcase name="storage: [step: 03] Deleting volumes" classname="ClusterLoaderV2" time="10.097375571"></testcase>
  </testsuite>
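
The two junit reports above can be compared step by step. The following is only a minimal sketch: the helper names `step_times` and `compare` are made up for illustration, and the XML strings are the reports from this comment with the XML declaration and indentation dropped.

```python
import xml.etree.ElementTree as ET

# Timings copied from the two junit reports in this comment.
DISABLED = """<testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="290.548">
<testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="290.542663876"></testcase>
<testcase name="storage: [step: 01] Provisioning volumes" classname="ClusterLoaderV2" time="10.064247582"></testcase>
<testcase name="storage: [step: 02] Waiting for PVs to be bound [00] - WaitForPVCsToBeBound" classname="ClusterLoaderV2" time="220.084696883"></testcase>
<testcase name="storage: [step: 03] Deleting volumes" classname="ClusterLoaderV2" time="10.113884574"></testcase>
</testsuite>"""

ENABLED = """<testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="285.547">
<testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="285.542948724"></testcase>
<testcase name="storage: [step: 01] Provisioning volumes" classname="ClusterLoaderV2" time="10.082875582"></testcase>
<testcase name="storage: [step: 02] Waiting for PVs to be bound [00] - WaitForPVCsToBeBound" classname="ClusterLoaderV2" time="220.08929458"></testcase>
<testcase name="storage: [step: 03] Deleting volumes" classname="ClusterLoaderV2" time="10.097375571"></testcase>
</testsuite>"""


def step_times(junit_xml: str) -> dict[str, float]:
    """Map each testcase name to its duration in seconds."""
    root = ET.fromstring(junit_xml)
    return {tc.get("name"): float(tc.get("time")) for tc in root.iter("testcase")}


def compare(disabled_xml: str, enabled_xml: str) -> dict[str, float]:
    """Return enabled-minus-disabled duration for every step present in both runs."""
    disabled, enabled = step_times(disabled_xml), step_times(enabled_xml)
    return {name: enabled[name] - disabled[name] for name in disabled if name in enabled}


for name, delta in compare(DISABLED, ENABLED).items():
    # A negative delta means the run with the feature enabled was faster.
    print(f"{delta:+9.3f}s  {name}")
```

On these numbers the overall run is about 5 s shorter with the gate enabled and each individual step differs by well under 0.1 s. With a single run per configuration, this is consistent with the conclusion above but not a statistical result.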

RomanBednar avatar Sep 19 '22 12:09 RomanBednar

Hey @jsafrane 👋, 1.26 Enhancements team here!

Just checking in as we approach Enhancements Freeze at 18:00 PDT on Thursday 6th October 2022.

This enhancement is targeting stage beta for 1.26 (correct me if otherwise).

Here's where this enhancement currently stands:

  • [X] KEP file using the latest template has been merged into the k/enhancements repo.
  • [X] KEP status is marked as implementable
  • [X] KEP has an updated detailed test plan section filled out
  • [X] KEP has up to date graduation criteria
  • [ ] KEP has a production readiness review that has been completed and merged into k/enhancements.

For this KEP, we would need to:

  • Update the kep.yaml to reflect the current milestone information
  • Update the production readiness review with latest stage information
  • Include the new updated PR of this KEP in the Issue Description and get it merged before Enhancements Freeze to make this enhancement eligible for 1.26 release.

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you :)

Atharva-Shinde avatar Sep 21 '22 16:09 Atharva-Shinde

/stage beta

rhockenbury avatar Sep 21 '22 19:09 rhockenbury

Hello @jsafrane 👋, just a quick check-in again, as we approach the 1.26 Enhancements freeze.

Please plan to get the action items mentioned in my comment above done before Enhancements freeze at 18:00 PDT on Thursday 6th October 2022, i.e. tomorrow.

For note, the current status of the enhancement is marked at-risk :)

Atharva-Shinde avatar Oct 05 '22 16:10 Atharva-Shinde

With #3544 merged, we have this down as tracked for v1.26. Thanks!

rhockenbury avatar Oct 07 '22 01:10 rhockenbury

Rollout-rollback-rollout testing was performed; the feature behaves as expected and no issues were observed. Posting my test results:

Perform pre-upgrade tests

Set default storage class:

$ kc patch sc/csi-hostpath-sc -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/csi-hostpath-sc patched

PVC does not get updated and remains pending:

$ kc get pvc
NAME      STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
csi-pvc   Pending                              

Upgrade cluster

Check available 1.25 versions:

$ yum search kubeadm --showduplicates --quiet | grep 1.25
kubeadm-1.25.0-0.x86_64 : Command-line utility for administering a Kubernetes cluster.
kubeadm-1.25.1-0.x86_64 : Command-line utility for administering a Kubernetes cluster.
kubeadm-1.25.2-0.x86_64 : Command-line utility for administering a Kubernetes cluster.

Install/update kubeadm:

$ sudo yum install -y kubeadm-1.25.2-0

Prepare kubeadm config file that enables FeatureGate:

$ cat /mnt/clusterconf-examples/featuregate.yaml 
## Example kubeadm configuration for enabling a feature gate.
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: RetroactiveDefaultStorageClass=true
controllerManager:
  extraArgs:
    cluster-cidr: 10.244.0.0/16
    feature-gates: RetroactiveDefaultStorageClass=true
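
Whether a gate set this way actually reached a component can be checked from the `--feature-gates` argument in its command line, as the jq queries in the performance-test comment above do. The following is a minimal sketch of parsing that argument; `parse_feature_gates` and `gate_enabled` are hypothetical helper names, and the two flag strings are the values reported in that comment.

```python
def parse_feature_gates(flag: str) -> dict[str, bool]:
    """Parse a '--feature-gates=A=true,B=false' argument into {gate name: enabled}."""
    prefix = "--feature-gates="
    if not flag.startswith(prefix):
        raise ValueError(f"not a feature-gates flag: {flag!r}")
    gates: dict[str, bool] = {}
    for pair in flag[len(prefix):].split(","):
        if not pair:
            continue  # tolerate a trailing comma
        name, _, value = pair.partition("=")
        gates[name.strip()] = value.strip().lower() == "true"
    return gates


def gate_enabled(flag: str, gate: str) -> bool:
    # Assumption: a gate that is absent from the flag is reported as False.
    # The real default depends on the Kubernetes version and the gate's stage.
    return parse_feature_gates(flag).get(gate, False)


no_feature = "--feature-gates=KubeletInUserNamespace=true"
with_feature = "--feature-gates=KubeletInUserNamespace=true,RetroactiveDefaultStorageClass=true"

print(gate_enabled(no_feature, "RetroactiveDefaultStorageClass"))
print(gate_enabled(with_feature, "RetroactiveDefaultStorageClass"))
```

Treating a missing gate as disabled matches the alpha stage, where the gate has to be switched on explicitly; it would need revisiting once the gate is on by default.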

Perform kubeadm upgrade:

$ sudo kubeadm upgrade plan --config /mnt/clusterconf-examples/featuregate.yaml
$ sudo kubeadm upgrade apply --config /mnt/clusterconf-examples/featuregate.yaml v1.25.2

Perform kubelet upgrade:

$ sudo yum install -y kubelet-1.25.2-0
$ sudo systemctl daemon-reload 
$ sudo systemctl restart kubelet

Perform post-upgrade tests

Verify that PVC got SC assigned right after upgrade and PV was provisioned and bound:

$ kc get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
csi-pvc   Bound    pvc-06a964ca-f997-4780-8627-b5c3bf5a87d8   1Gi        RWO            csi-hostpath-sc   87m

Downgrade cluster

$ yum history | grep -E "kubeadm|kubelet"
    10 | install -y kubelet-1.25.2-0                                                                                                                                                                                                                                       | 2022-10-12 11:06 | Upgrade        |    1   
     8 | install -y kubeadm-1.25.2-0                                                                                                                                                                                                                                       | 2022-10-12 09:45 | Upgrade        |    1   
     7 | install -y kubelet-1.24.5-0 kubeadm-1.24.5-0 kubectl

$ sudo yum -y history undo 8 && sudo yum -y history undo 10

Perform post-rollback tests

Remove default SC:

$ kc patch sc/csi-hostpath-sc -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
storageclass.storage.k8s.io/csi-hostpath-sc patched

Create new PVC without SC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc-2
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

$ kc get pvc
NAME        STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
csi-pvc     Bound     pvc-06a964ca-f997-4780-8627-b5c3bf5a87d8   1Gi        RWO            csi-hostpath-sc   96m
csi-pvc-2   Pending     

Add default SC again:

$ kc patch sc/csi-hostpath-sc -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/csi-hostpath-sc patched

Verify that the new PVC did not get updated with SC this time:

$ kc get pvc/csi-pvc-2
NAME        STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
csi-pvc-2   Pending    

Upgrade cluster again

Install/update kubeadm:

$ sudo yum install -y kubeadm-1.25.2-0

Perform kubeadm upgrade:

$ sudo kubeadm upgrade plan --config /mnt/clusterconf-examples/featuregate.yaml
$ sudo kubeadm upgrade apply --config /mnt/clusterconf-examples/featuregate.yaml v1.25.2

Perform kubelet upgrade:

$ sudo yum install -y kubelet-1.25.2-0
$ sudo systemctl daemon-reload 
$ sudo systemctl restart kubelet

Perform post-upgrade tests again

Verify that PVC got SC assigned right after upgrade and PV was provisioned and bound:

$ kc get pvc
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
csi-pvc     Bound    pvc-06a964ca-f997-4780-8627-b5c3bf5a87d8   1Gi        RWO            csi-hostpath-sc   117m
csi-pvc-2   Bound    pvc-2e765394-f32c-42fb-b3db-ffe203612bac   1Gi        RWO            csi-hostpath-sc   24m
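
The manual "did the PVC get a StorageClass" checks in this walkthrough could also be scripted against `kubectl get pvc -o json`. The following is a minimal sketch, assuming the JSON has already been captured; `pvcs_without_class` is a hypothetical helper, and the sample document is a hand-written, heavily trimmed stand-in that mirrors the post-rollback state shown earlier (one bound PVC, one pending PVC without a class).

```python
import json


def pvcs_without_class(pvc_list: dict) -> list[str]:
    """Return names of PVCs whose spec.storageClassName is unset (nil)."""
    names = []
    for item in pvc_list.get("items", []):
        # Only a missing/null value counts. An empty string is an explicit
        # request for "no class" and is deliberately not reported here.
        if item.get("spec", {}).get("storageClassName") is None:
            names.append(item["metadata"]["name"])
    return names


# Hypothetical, trimmed stand-in for `kubectl get pvc -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "csi-pvc"},
   "spec": {"storageClassName": "csi-hostpath-sc"},
   "status": {"phase": "Bound"}},
  {"metadata": {"name": "csi-pvc-2"},
   "spec": {},
   "status": {"phase": "Pending"}}
]}
""")

print(pvcs_without_class(sample))
```

After the second upgrade the same check would be expected to return an empty list, since both PVCs show csi-hostpath-sc in the output above.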

RomanBednar avatar Oct 18 '22 13:10 RomanBednar

I tested version skew and am sharing the results below. The first case has the feature enabled in the API server and disabled in KCM; the second case is the other way around.

Case 1 - API on / KCM off

API server | KCM | Behavior
on         | off | Existing Kubernetes behavior, only users can change pvc.spec.storageClassName=nil to a SC name.

Controller will not update PVC:

$ kc get pvc
NAME      STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Pending                                                     4s

$ kc patch sc/standard -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/standard patched

$ kc get pvc
NAME      STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Pending  

But API server will allow change from nil value:

$ kc get pvc/myclaim -o json | jq '.spec.storageClassName'
null

$ kc patch pvc/myclaim -p '{"spec":{"storageClassName":"standard"}}'
persistentvolumeclaim/myclaim patched (no change)

$ kc get pvc/myclaim -o json | jq '.spec.storageClassName'
"standard"

$ kc patch pvc/myclaim -p '{"spec":{"storageClassName":"test"}}'
The PersistentVolumeClaim "myclaim" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claim

Case 2 - API off / KCM on

API server | KCM | Behavior
off        | on  | PV controller may try to change pvc.spec.storageClassName=nil to a new default SC name, which will fail on the API server.

Attempt retroactive SC update:

$ kc get pvc
NAME      STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Pending                                                     4s

$ kc patch sc/standard -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/standard patched

$ kc get pvc
NAME      STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Pending                                                     13s

Check the KCM logs to verify that the PV controller attempted the update but it was rejected by the API server as expected:

I1019 08:42:12.056353  579577 storageclass.go:50] GetDefaultClass added: standard
I1019 08:42:12.056372  579577 pv_controller.go:953] assigning StorageClass[standard] to PersistentVolumeClaim[default/myclaim]
E1019 08:42:12.079545  579577 pv_controller_base.go:275] could not sync volume "default/myclaim": can't update PersistentVolumeClaim["default/myclaim"]: PersistentVolumeClaim "myclaim" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claims
  core.PersistentVolumeClaimSpec{
  	... // 2 identical fields
  	Resources:        {Requests: {s"storage": {i: {...}, Format: "BinarySI"}}},
  	VolumeName:       "",
- 	StorageClassName: nil,
+ 	StorageClassName: &"standard",
  	VolumeMode:       &"Filesystem",
  	DataSource:       nil,
  	DataSourceRef:    nil,
  }
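
The two skew cases can be summarized with a toy model. This is not Kubernetes code: `PVC`, `Forbidden`, `api_update_class` and `kcm_sync` are invented for illustration and only encode the behavior observed in this comment (the API server gate controls whether a nil-to-name change is accepted, the KCM gate controls whether the PV controller attempts it).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PVC:
    name: str
    storage_class: Optional[str] = None  # None models pvc.spec.storageClassName=nil


class Forbidden(Exception):
    """Stands in for the API server's 'spec is immutable' validation error."""


def api_update_class(pvc: PVC, new_class: str, api_gate: bool) -> None:
    # Toy validation: nil -> name is accepted only when the API server gate is on;
    # changing an already-set class is always rejected.
    if pvc.storage_class is None and api_gate:
        pvc.storage_class = new_class
        return
    raise Forbidden(f"spec is immutable for {pvc.name}")


def kcm_sync(pvc: PVC, default_class: Optional[str], api_gate: bool, kcm_gate: bool) -> str:
    # The controller only acts when its own gate is on, a default class exists,
    # and the PVC has no class yet.
    if not kcm_gate or default_class is None or pvc.storage_class is not None:
        return "unchanged"
    try:
        api_update_class(pvc, default_class, api_gate)
    except Forbidden:
        return "rejected"
    return "assigned"


for api_gate, kcm_gate in [(True, False), (False, True), (True, True)]:
    claim = PVC("myclaim")
    outcome = kcm_sync(claim, "standard", api_gate=api_gate, kcm_gate=kcm_gate)
    print(f"API {'on' if api_gate else 'off'} / KCM {'on' if kcm_gate else 'off'}: {outcome}")
```

In this model a user patch in Case 1 corresponds to calling `api_update_class` directly: the first change from nil succeeds and a second change to another class raises `Forbidden`, matching the patch transcript above.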

RomanBednar avatar Oct 20 '22 15:10 RomanBednar

Hi @RomanBednar 👋,

Checking in as we approach 1.26 code freeze at 17:00 PDT on Tuesday 8th November 2022.

Please ensure the following items are completed:

  • [ ] All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • [ ] All PRs are fully merged by the code freeze deadline.

For this enhancement, it looks like the following PRs are open and need to be merged before code freeze:

  • (https://github.com/kubernetes/kubernetes/pull/113329)

Let me know if there are any other PRs that we should be tracking.

As always, we are here to help should questions come up. Thanks!

rhockenbury avatar Oct 29 '22 18:10 rhockenbury

With https://github.com/kubernetes/kubernetes/pull/113329 merged, I have this marked as tracked for code freeze.

rhockenbury avatar Nov 09 '22 00:11 rhockenbury

Hello @jsafrane 👋, 1.26 Release Docs Lead here. This enhancement is marked as ‘Needs Docs’ for 1.26 release.

Please follow the steps detailed in the documentation to open a PR against dev-1.26 branch in the k/website repo. This PR can be just a placeholder at this time, and must be created by November 9. Also, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release.

If you have any questions, reach out to us! Thank you!

krol3 avatar Nov 09 '22 12:11 krol3

Hello @RomanBednar and @jsafrane 👋 please take a look at Documenting for a release - PR Ready for Review to get your PR ready for review before the deadline on Tuesday 15th November 2022. Thank you!

krol3 avatar Nov 14 '22 15:11 krol3

Hi @RomanBednar and @jsafrane, about Documentation for 1.26, this KEP needs to be updated to state that Retroactive default StorageClass assignment is beta in 1.26 cc: @reylejano @Rishit-dagli

krol3 avatar Nov 16 '22 17:11 krol3

Hi @RomanBednar, @jsafrane, @xing-yang, thank you for your doc PR here

cc: @Rishit-dagli @reylejano

krol3 avatar Nov 17 '22 18:11 krol3

/remove-label lead-opted-in
/remove-label tracked/yes
/label tracked/no
/milestone clear

marosset avatar Dec 14 '22 19:12 marosset

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Mar 14 '23 19:03 k8s-triage-robot

/remove-lifecycle stale

RomanBednar avatar Mar 15 '23 12:03 RomanBednar

/milestone v1.28
/assign @RomanBednar
/label lead-opted-in

jsafrane avatar May 16 '23 08:05 jsafrane

/unassign

jsafrane avatar May 16 '23 08:05 jsafrane