Enable Volume Expansion for vSphere CSI
What would you like to be added: csi-resizer sidecar container for vsphere-csi-controller pod, to allow for volume expansion.
According to the vsphere-csi-driver documentation:
An external resizer sidecar container implements the logic of watching the Kubernetes API for PVC edits, issuing the ControllerExpandVolume RPC call against a CSI endpoint, and updating the PersistentVolume object to reflect the new size. This container is already deployed as part of the vsphere-csi-controller pod.
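For reference, the edit the resizer watches for is simply a larger storage request on the PVC, and expansion also requires the PVC's StorageClass to allow it. A minimal illustrative sketch (the StorageClass, PVC, and namespace names here are hypothetical):
# The StorageClass must opt in to expansion (a general Kubernetes requirement)
kubectl patch storageclass standard -p '{"allowVolumeExpansion": true}'
# Bumping the PVC's requested size is the edit csi-resizer reacts to
kubectl patch pvc my-claim -n my-namespace -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'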
This container is however not present on any of our EKS-A clusters:
kubectl get pods -l app=vsphere-csi-controller -n kube-system -o jsonpath='{.items[*].spec.containers[*].name}'
csi-attacher vsphere-csi-controller liveness-probe vsphere-syncer csi-provisioner
- Could `csi-resizer` be added to the deployment managed by EKS-A by default?
- Are there plans on allowing customization/management of `coredns`, `vsphere-csi-driver`, `kube-proxy` down the road?
Environment:
- EKS Anywhere Release: 0.10.0
- EKS Distro Release: v1.21.13-eks-3f70b89e
In case it can help anyone else, here is what I had to do to perform an offline volume expansion:
- Add the `csi-resizer` container to the `vsphere-csi-controller` pod:
kubectl edit deploy vsphere-csi-controller -n kube-system
spec:
  template:
    spec:
      containers:
      - args:
        - --v=4
        - --timeout=300s
        - --csi-address=$(ADDRESS)
        - --leader-election
        env:
        - name: ADDRESS
          value: /csi/csi.sock
        image: quay.io/k8scsi/csi-resizer:v1.1.0
        imagePullPolicy: IfNotPresent
        name: csi-resizer
        volumeMounts:
        - mountPath: /csi
          name: socket-dir
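For anyone scripting this rather than using kubectl edit, the same container can be appended non-interactively with a JSON patch; this is just a sketch equivalent to the edit above, not something EKS-A does for you:
kubectl -n kube-system patch deployment vsphere-csi-controller --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/-", "value": {
    "name": "csi-resizer",
    "image": "quay.io/k8scsi/csi-resizer:v1.1.0",
    "args": ["--v=4", "--timeout=300s", "--csi-address=$(ADDRESS)", "--leader-election"],
    "env": [{"name": "ADDRESS", "value": "/csi/csi.sock"}],
    "volumeMounts": [{"mountPath": "/csi", "name": "socket-dir"}]
  }}
]'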
- Confirmed the `csi-resizer` sidecar was successfully added:
kubectl get pods -l app=vsphere-csi-controller -n kube-system -o jsonpath='{.items[*].spec.containers[*].name}'
csi-resizer csi-attacher vsphere-csi-controller liveness-probe vsphere-syncer csi-provisioner
- But the controller complained about a lack of permissions:
controller.go:272] Error syncing PVC: marking pvc "monitoring/storage-loki-stack-0" as resizing failed: can't patch status of PVC monitoring/storage-loki-stack-0 with persistentvolumeclaims "storage-loki-stack-0" is forbidden: User "system:serviceaccount:kube-system:vsphere-csi-controller" cannot patch resource "persistentvolumeclaims/status" in API group "" in the namespace "monitoring"
`clusterrole/vsphere-csi-controller-role` was updated:
kubectl edit clusterrole vsphere-csi-controller-role
rules:
- apiGroups:
  - ""
  resources:
  - persistentvolumeclaims/status
  verbs:
  - patch
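A quick way to confirm the new rule took effect is to impersonate the controller's service account (an illustrative check against the namespace of the failing PVC):
kubectl auth can-i patch persistentvolumeclaims/status -n monitoring \
  --as=system:serviceaccount:kube-system:vsphere-csi-controller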
- Which resulted in:
I0810 17:12:03.012081 1 controller.go:281] Started PVC processing "monitoring/storage-loki-stack-0"
I0810 17:12:03.020503 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"monitoring", Name:"storage-loki-stack-0", UID:"38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad", APIVersion:"v1", ResourceVersion:"1582368", FieldPath:""}): type: 'Normal' reason: 'Resizing' External resizer is resizing volume pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad
E0810 17:12:03.134808 1 controller.go:272] Error syncing PVC: resize volume "pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad" by resizer "csi.vsphere.vmware.com" failed: rpc error: code = FailedPrecondition desc = failed to expand volume: "d2c7201f-f662-48bd-9b8e-550d5fd942ec". Volume is attached to node. Online volume expansion is not supported in this version
I0810 17:12:03.134894 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"monitoring", Name:"storage-loki-stack-0", UID:"38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad", APIVersion:"v1", ResourceVersion:"1582368", FieldPath:""}): type: 'Warning' reason: 'VolumeResizeFailed' resize volume "pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad" by resizer "csi.vsphere.vmware.com" failed: rpc error: code = FailedPrecondition desc = failed to expand volume: "d2c7201f-f662-48bd-9b8e-550d5fd942ec". Volume is attached to node. Online volume expansion is not supported in this version
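Since the error points at the driver version, the images actually deployed can be double-checked with something like:
kubectl -n kube-system get deployment vsphere-csi-controller -o jsonpath='{.spec.template.spec.containers[*].image}'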
- So we scaled the statefulset down to 0, freeing up the volume and allowing `csi-resizer` to perform an offline expansion:
kubectl scale statefulset loki-stack --replicas=0 -n monitoring
I0810 19:47:21.353529 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"monitoring", Name:"storage-loki-stack-0", UID:"38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad", APIVersion:"v1", ResourceVersion:"1627997", FieldPath:""}): type: 'Normal' reason: 'Resizing' External resizer is resizing volume pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad
I0810 19:47:22.219471 1 controller.go:447] Resize volume succeeded for volume "pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad", start to update PV's capacity
I0810 19:47:22.219496 1 controller.go:537] Resize volume succeeded for volume "pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad", start to update PV's capacity
I0810 19:47:22.231396 1 controller.go:453] Update capacity of PV "pvc-38bf19f1-eb9f-437c-a0a9-ca7a7bb80fad" to 210Gi succeeded
I0810 19:47:22.236804 1 controller.go:475] Mark PVC "monitoring/storage-loki-stack-0" as file system resize required
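To finish the offline expansion, the workload then has to be scaled back up; the filesystem resize flagged in the last log line is performed when the volume is mounted again (a sketch, assuming the statefulset originally ran a single replica):
kubectl scale statefulset loki-stack --replicas=1 -n monitoring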
But the vsphere-csi-driver documentation states that online volume expansion was added in 2.2.0, and it explicitly lists csi-resizer:v1.1.0 as compatible with vsphere-csi-driver:v2.2.0, the version deployed by EKS-A.
- What am I missing?
- What is the reasoning behind staying on `vsphere-csi-driver:v2.2.0`? Documentation suggests it does not officially support Kubernetes > 1.20.
- Reiterating my question about a lever to manage `vsphere-csi-driver` version and configuration on our own, as we would ideally run `vsphere-csi-driver:v2.5.x` on our EKS-D 1.21 clusters.
@sleterrier Thanks for the report!
In terms of your first question about customizing coredns/proxy/csi, we do not currently have anything on the roadmap for this and will probably be fairly cautious here to avoid unexpected consequences due to changes to these critical components.
As for the csi-driver-specific questions:
According to the vsphere csi driver docs, we could bump to the 2.4.x branch, which is still supported, and maintain 1.20 support. We do not currently support this, but we have other cases where we end up supporting multiple component versions to cover the various Kubernetes versions. We could use the 2.4.x branch for 1.20 clusters, 2.5.x for 1.21 clusters, and 2.6.x for 1.22+.
For the resizer sidecar, I don't see why we couldn't add this to the default deployment.
Another option may be to support disabling EKS-A's management of the csi driver altogether, which would then allow you to set up your deployment however you want.
@jaxesn, thank you for your answer!
- Short of support for customizing `coredns` and `kube-proxy` via EKS-A's manifests, are there plans to go down the EKS add-ons route at some point?
- Support for disabling EKS-A's management of the csi driver sounds like a very reasonable approach. We would immediately take advantage of it.
- We have introduced Curated Packages to eks-a, which include things like Harbor for a container registry, MetalLB for load balancing, and Emissary for ingress. Since coredns and kube-proxy are core to the operation of the cluster, we have decided not to expose them as Packages for now; that's not to say we couldn't or wouldn't in the future. What kinds of configuration are you thinking you'd like to see there?
- I will work with our team here to see if maybe we could introduce this option as a stopgap at least. No ETA on a timeline, though I will update this issue as I know more.
- I admit my question around customization for coredns and kube-proxy was mainly driven by curiosity. But because you asked:
- Running specific versions of either, which might include a needed fix or a feature. Very much in the vein of what EKS add-ons provide.
- Updating the kube-proxy listen address or proxy mode (a sketch follows below).
- Bonus, while we are talking about core components: enable Cilium Hubble and Hubble UI.
- Thank you for getting eyes on a vsphere-csi-driver toggle, much appreciated.
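To make the kube-proxy item above concrete, the kind of knobs meant live in the KubeProxyConfiguration carried by the kube-proxy ConfigMap; a hypothetical tweak (not something EKS-A exposes today) could look like:
# kubectl -n kube-system edit configmap kube-proxy   (the config.conf key)
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
bindAddress: 0.0.0.0   # listen address
mode: "ipvs"           # proxy mode: iptables or ipvs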
I find volume expansion for vSphere CSI to be a rather critical feature that should be enabled by default, especially when dealing with PostgreSQL databases that grow too big and need their PVCs resized. Please try to fast-track getting volume expansion working, thank you.
Any news on this? IMHO this is critical functionality needed for production environments. Thank you!
Another vote here. This is essential functionality needed for production clusters. Thank you.
Installation of the vSphere CSI driver was deprecated in v0.16. It will no longer be installed or managed in existing clusters by EKS-A starting in v0.17.