eks-anywhere

Enable Volume Expansion for vSphere CSI

Open sleterrier opened this issue 3 years ago • 5 comments

What would you like to be added: a csi-resizer sidecar container in the vsphere-csi-controller pod, to allow volume expansion.

According to the vsphere-csi-driver documentation:

An external resizer sidecar container implements the logic of watching the Kubernetes API for PVC edits, issuing the ControllerExpandVolume RPC call against a CSI endpoint, and updating the PersistentVolume object to reflect the new size. This container is already deployed as part of the vsphere-csi-controller pod.
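The "PVC edits" the resizer watches for are increases to spec.resources.requests.storage on a bound claim. As a minimal sketch (not taken from the clusters in this issue), the patch body below requests the 210Gi size that appears in the resizer logs further down for monitoring/storage-loki-stack-0; the file name expand-pvc.yaml is hypothetical. Kubernetes only accepts the increase if the claim's StorageClass sets allowVolumeExpansion: true.

```yaml
# expand-pvc.yaml (hypothetical file name): strategic-merge patch body for an existing PVC.
# Possible usage, assuming a kubectl version that supports --patch-file:
#   kubectl patch pvc storage-loki-stack-0 -n monitoring --patch-file expand-pvc.yaml
# Only the storage request changes; every other field of the claim is left as-is.
spec:
  resources:
    requests:
      storage: 210Gi  # the new, larger size; the csi-resizer sidecar reacts to this change
```

Editing the claim interactively with kubectl edit achieves the same thing; a patch file just makes the change reviewable and repeatable.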

The csi-resizer container is, however, not present in any of our EKS-A clusters:

kubectl get pods -l app=vsphere-csi-controller -n kube-system -o jsonpath='{.items[*].spec.containers[*].name}'
csi-attacher vsphere-csi-controller liveness-probe vsphere-syncer csi-provisioner
  1. Could csi-resizer be added to the deployment managed by EKS-A by default?
  2. Are there plans to allow customization/management of coredns, vsphere-csi-driver, and kube-proxy down the road?

Environment:

  • EKS Anywhere Release: 0.10.0
  • EKS Distro Release: v1.21.13-eks-3f70b89e

sleterrier, Aug 10 '22 01:08

In case it helps anyone else, here is what I had to do to perform an offline volume expansion:

  • Add the csi-resizer container to the vsphere-csi-controller pod:
kubectl edit deploy vsphere-csi-controller -n kube-system
spec:
  template:
    spec:
      containers:
        - args:
          - --v=4
          - --timeout=300s
          - --csi-address=$(ADDRESS)
          - --leader-election
          env:
          - name: ADDRESS
            value: /csi/csi.sock
          image: quay.io/k8scsi/csi-resizer:v1.1.0
          imagePullPolicy: IfNotPresent
          name: csi-resizer
          volumeMounts:
          - mountPath: /csi
            name: socket-dir
  • Confirmed the csi-resizer sidecar was successfully added:
kubectl get pods -l app=vsphere-csi-controller -n kube-system -o jsonpath='{.items[*].spec.containers[*].name}'
csi-resizer csi-attacher vsphere-csi-controller liveness-probe vsphere-syncer csi-provisioner
  • But the controller complained about a lack of permissions:
controller.go:272] Error syncing PVC: marking pvc "monitoring/storage-loki-stack-0" as resizing failed: can't patch status of  PVC monitoring/storage-loki-stack-0 with persistentvolumeclaims "storage-loki-stack-0" is forbidden: User "system:serviceaccount:kube-system:vsphere-csi-controller" cannot patch resource "persistentvolumeclaims/status" in API group "" in the namespace "monitoring"
  • Updated clusterrole/vsphere-csi-controller-role to grant the missing permission:
kubectl edit clusterrole vsphere-csi-controller-role
rules:
- apiGroups:
  - ""
  resources:
  - persistentvolumeclaims/status
  verbs:
  - patch
  • This resulted in:
I0810 17:12:03.012081       1 controller.go:281] Started PVC processing "monitoring/storage-loki-stack-0"
I0810 17:12:03.020503       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"monitoring", Name:"storage-loki-stack-0", UID:"38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad", APIVersion:"v1", ResourceVersion:"1582368", FieldPath:""}): type: 'Normal' reason: 'Resizing' External resizer is resizing volume pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad
E0810 17:12:03.134808       1 controller.go:272] Error syncing PVC: resize volume "pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad" by resizer "csi.vsphere.vmware.com" failed: rpc error: code = FailedPrecondition desc = failed to expand volume: "d2c7201f-f662-48bd-9b8e-550d5fd942ec". Volume is attached to node. Online volume expansion is not supported in this version
I0810 17:12:03.134894       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"monitoring", Name:"storage-loki-stack-0", UID:"38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad", APIVersion:"v1", ResourceVersion:"1582368", FieldPath:""}): type: 'Warning' reason: 'VolumeResizeFailed' resize volume "pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad" by resizer "csi.vsphere.vmware.com" failed: rpc error: code = FailedPrecondition desc = failed to expand volume: "d2c7201f-f662-48bd-9b8e-550d5fd942ec". Volume is attached to node. Online volume expansion is not supported in this version
  • So we scaled the StatefulSet down to 0 replicas, which detached the volume and allowed csi-resizer to perform an offline expansion:
kubectl scale statefulset loki-stack --replicas=0 -n monitoring
I0810 19:47:21.353529       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"monitoring", Name:"storage-loki-stack-0", UID:"38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad", APIVersion:"v1", ResourceVersion:"1627997", FieldPath:""}): type: 'Normal' reason: 'Resizing' External resizer is resizing volume pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad
I0810 19:47:22.219471       1 controller.go:447] Resize volume succeeded for volume "pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad", start to update PV's capacity
I0810 19:47:22.219496       1 controller.go:537] Resize volume succeeded for volume "pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad", start to update PV's capacity
I0810 19:47:22.231396       1 controller.go:453] Update capacity of PV "pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad" to 210Gi succeeded
I0810 19:47:22.236804       1 controller.go:475] Mark PVC "monitoring/storage-loki-stack-0" as file system resize required
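Because RBAC permissions are additive, the persistentvolumeclaims/status patch permission added above could alternatively live in a separate ClusterRole bound to the controller's service account, leaving the EKS-A-managed vsphere-csi-controller-role untouched. This is a sketch, not something tested in this thread: the object name vsphere-csi-resizer-extra is hypothetical, and whether EKS-A would revert an in-place edit of its own ClusterRole is not established here. The service account name comes from the "forbidden" error message above.

```yaml
# Sketch: additive RBAC for the csi-resizer sidecar (object names are hypothetical).
# Apply with: kubectl apply -f <this file>
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vsphere-csi-resizer-extra
rules:
  - apiGroups: [""]
    resources: ["persistentvolumeclaims/status"]
    verbs: ["patch"]  # the permission reported missing in the controller error
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vsphere-csi-resizer-extra
subjects:
  - kind: ServiceAccount
    name: vsphere-csi-controller  # from "system:serviceaccount:kube-system:vsphere-csi-controller"
    namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vsphere-csi-resizer-extra
```

Keeping the extra grant in its own objects also makes it easy to remove once the sidecar and its RBAC ship in the default deployment.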

However, the vsphere-csi-driver documentation states that online volume expansion was added in 2.2.0, and explicitly lists csi-resizer:v1.1.0 as compatible with vsphere-csi-driver:v2.2.0, the version deployed by EKS-A.

  • What am I missing?
  • What is the reasoning behind staying on vsphere-csi-driver:v2.2.0? The documentation suggests it does not officially support Kubernetes > 1.20.
    • Reiterating my question about a lever to manage the vsphere-csi-driver version and configuration on our own, as we would ideally run vsphere-csi-driver:v2.5.x on our EKS-D 1.21 clusters.
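One possible explanation for the online-expansion failure, not confirmed anywhere in this thread: as far as I can tell from the upstream vsphere-csi-driver 2.2.x code, the "Online volume expansion is not supported in this version" error is returned for attached volumes when the driver's online-volume-extend feature state is disabled. The upstream v2.2.0 manifests expose that switch in a feature-state ConfigMap, excerpted below; whether EKS-A ships the same name, namespace, and default is an assumption to verify against the deployed manifest. Upstream documentation also lists vSphere 7.0 Update 2 as the minimum for online expansion.

```yaml
# Relevant excerpt of the driver's feature-state ConfigMap, as laid out in the upstream
# vsphere-csi-driver v2.2.0 manifests (assumption: EKS-A uses the same name/namespace).
# Edit it in place (kubectl edit configmap ... -n kube-system) rather than re-applying
# this excerpt, so that the other feature-state keys keep their current values.
apiVersion: v1
kind: ConfigMap
metadata:
  name: internal-feature-states.csi.vsphere.vmware.com
  namespace: kube-system
data:
  # ...other feature-state keys unchanged...
  "online-volume-extend": "true"  # gates expansion of volumes that are attached to a node
```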

sleterrier, Aug 10 '22 17:08

@sleterrier Thanks for the report!

As for your first question about customizing coredns/kube-proxy/CSI, we do not currently have anything on the roadmap, and we will probably be fairly cautious here to avoid unexpected consequences from changes to these critical components.

As for the CSI-driver-specific questions:

According to the vSphere CSI driver docs, we could bump to the 2.4.x branch, which is still supported, and keep 1.20 support. We do not currently do this, but in other cases we already end up shipping multiple versions of a component to match the various Kubernetes versions. We could use the 2.4.x branch for 1.20 clusters, 2.5.x for 1.21 clusters, and 2.6.x for 1.22+.

For the resizer sidecar, I don't see why we couldn't add it to the default deployment.

Another option may be to support disabling EKS-A's management of the CSI driver altogether, which would let you set up your deployment however you want.

jaxesn, Aug 19 '22 15:08

@jaxesn, thank you for your answer!

  • Short of support for customizing coredns and kube-proxy via EKS-A's manifests, are there plans to go down the EKS add-ons route at some point?
  • Supporting the option to disable EKS-A's management of the CSI driver sounds like a very reasonable approach. We would immediately take advantage of it.

sleterrier, Aug 19 '22 20:08

  • We have introduced Curated Packages to EKS-A, which include things like Harbor for a container registry, MetalLB for load balancing, and Emissary for ingress. Since coredns and kube-proxy are core to the operation of the cluster, we have decided not to expose them as Packages for now, which is not to say we couldn't or wouldn't in the future. What kinds of configuration would you like to see there?
  • I will work with our team to see whether we could introduce this option, at least as a stopgap. There is no ETA yet, but I will update this issue as I learn more.

jaxesn, Aug 22 '22 21:08

  • I admit my question about customizing coredns and kube-proxy was mainly driven by curiosity. But since you asked:
    • Running a specific version of either, which might include a needed fix or feature, very much in the vein of what EKS add-ons provide.
    • Updating the kube-proxy listen address or proxy mode.
    • Bonus, while we are talking about core components: enabling Cilium Hubble and Hubble UI.
  • Thank you for getting eyes on a vsphere-csi-driver toggle, much appreciated.

sleterrier, Aug 24 '22 21:08

I find volume expansion for vSphere CSI to be a critical feature that should be enabled by default, especially for PostgreSQL databases that grow too big and need their PVCs resized. Please try to fast-track getting volume expansion working. Thank you.

echel0n, Oct 07 '22 01:10

Any news on this? IMHO this is critical functionality for production environments. Thank you!

romancin, Jan 17 '23 11:01

Another vote here. This is essential functionality for production clusters. Thank you.

jplewes, Jan 21 '23 21:01

Installation of the vSphere CSI driver was deprecated in v0.16. Starting in v0.17, EKS-A will no longer install or manage it in existing clusters.

chrisdoherty4, Jul 10 '23 16:07