cass-operator
K8SSAND-1180 ⁃ How do we gracefully increase storage capacity via cass-operator while Cass Datacenter, Statefulset etc are in service with incoming workloads
Following the below thread, wanted to get an update: https://community.datastax.com/questions/12269/index.html
Environment:
- AWS EKS and AWS EBS
- Cass-Operator : 1.9
- Server Image : DSE 6.8.18 and/or OSS 3.11.11
┆Issue is synchronized with this Jira Task by Unito ┆friendlyId: K8SSAND-1180 ┆priority: Medium
Hi, does your PV provider support PVC volume expansion?
@burmanm - Yes, the storage class that we are using has the following property.
allowVolumeExpansion: true
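For reference, a minimal expandable gp3 StorageClass for the AWS EBS CSI driver might look like this (the class name and binding mode are illustrative assumptions, not taken from the thread):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-expandable          # illustrative name
provisioner: ebs.csi.aws.com    # AWS EBS CSI driver
parameters:
  type: gp3
allowVolumeExpansion: true      # required for PVC resizing
volumeBindingMode: WaitForFirstConsumer
```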
@burmanm - Following up to see if there's an update on this?
Hey, sorry. The process of expanding a PVC with StatefulSets is a bit tricky and involves manual operations (restriction of Kubernetes). Sadly my local instance did not support the feature, but I'll try to create an example shortly with documented steps.
thnx @burmanm .
Is this something on the roadmap of cass-operator project?
It's a feature we would like to see, but unfortunately has not been scheduled yet. We have identified the steps to resolve the issue, but it will require a bit of time to implement.
@burmanm Could you provide more details about this? I have a 4-node cluster and their disk usage is almost full. A workaround is to add nodes to the cluster: the data will be rebalanced and cleanup runs automatically. But that is a waste of CPU and memory resources.
@counter2015 you can easily upgrade your storage manually:
- set new storage capacity in your PVC
- restart the cassandra pods one by one
Then, your PVCs should automatically get resized by your storage CSI driver.
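The manual approach above can be sketched as follows (the label selector and the 150Gi target size are illustrative assumptions; adjust for your cluster):

```shell
# Patch each PVC of the datacenter to the new requested size
for pvc in $(kubectl get pvc -l cassandra.datastax.com/datacenter=dc1 \
    -o jsonpath='{.items[*].metadata.name}'); do
  kubectl patch pvc "${pvc}" --type merge \
    -p '{"spec":{"resources":{"requests":{"storage":"150Gi"}}}}'
done

# Restart the Cassandra pods one by one so the filesystem resize takes effect
for pod in $(kubectl get pods -l cassandra.datastax.com/datacenter=dc1 \
    -o jsonpath='{.items[*].metadata.name}'); do
  kubectl delete pod "${pod}"
  kubectl wait --for=condition=Ready pod "${pod}" --timeout=10m
done
```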
@discostur I am not sure the PVC capacity will be changed by the operator after I edit the datacenter yaml file. In the end, I increased storage capacity by creating a new datacenter and migrating data from the old dc1 to the new dc2.
@counter2015 no, it does not! I edited my datacenter yaml file and nothing was changed in the PVC / PV. So I edited the PVC manually and the storage was resized ...
The process is actually a bit more involved to do it safely.
First, we need to delete the StatefulSet without deleting the pods. This can be done, for example, with `kubectl delete --cascade=false`.
Next, make sure that `persistentVolumeReclaimPolicy` on the PV is set to `Retain`. Remove the claim reference, then delete the PVC.
Now go ahead and expand the volume, and update the capacity in the PV spec.
Create a new PVC that will bind to the PV. The name of the PVC needs to be the same as the name of the old one.
Lastly, recreate the StatefulSet. The StatefulSet controller finds the existing PVCs and pods, and the StatefulSet will immediately move into the ready state (assuming the pods are ready).
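A rough shell sketch of that procedure (all resource names below are illustrative assumptions; `--cascade=orphan` replaces `--cascade=false` on current kubectl; test on a non-production cluster first):

```shell
# 1. Delete the StatefulSet but leave the pods running
kubectl delete statefulset demo-dc1-r1-sts --cascade=orphan

# 2. Ensure the PV is retained when its claim is deleted
kubectl patch pv pvc-0123-example \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# 3. Remove the claim reference so the PV can be re-bound, then delete the PVC
kubectl patch pv pvc-0123-example --type json \
  -p '[{"op":"remove","path":"/spec/claimRef"}]'
kubectl delete pvc server-data-demo-dc1-r1-sts-0

# 4. Expand the underlying volume (e.g. in the EBS console), then
#    update the PV capacity to match
kubectl patch pv pvc-0123-example \
  -p '{"spec":{"capacity":{"storage":"150Gi"}}}'

# 5. Re-create a PVC with the SAME name as the old one so it binds the PV,
#    then re-create the StatefulSet from its original manifest
kubectl apply -f new-pvc.yaml
kubectl apply -f statefulset.yaml
```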
@jsanda Is there any risk to edit the pvc size directly ?
That may work and might be easier than what I prescribed. I would need to do some testing/investigation to be certain.
prometheus-operator (which also uses a StatefulSet for the Prometheus pod) offers this approach. But it does not work for k8ssandra because of the admission webhook:
admission webhook "vcassandradatacenter.kb.io" denied the request: CassandraDatacenter write rejected, attempted to change storageConfig
My example:
k8ssandracluster:

```yaml
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  cassandra:
    serverVersion: "4.0.3"
    serverImage: k8ssandra/cass-management-api:4.0.3
    telemetry:
      prometheus:
        enabled: true
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: gp3-multizone
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
    config:
      jvmOptions:
        heapSize: 512M
    datacenters:
      - metadata:
          name: dc1
        size: 9
        racks:
          - name: r1
            nodeAffinityLabels:
              onairent.live/node-type: cassandra-node
              topology.kubernetes.io/zone: eu-north-1a
          - name: r2
            nodeAffinityLabels:
              onairent.live/node-type: cassandra-node
              topology.kubernetes.io/zone: eu-north-1b
          - name: r3
            nodeAffinityLabels:
              onairent.live/node-type: cassandra-node
              topology.kubernetes.io/zone: eu-north-1c
```
- Change `storage` to 150Gi and apply the changed manifest
- Patch the PVCs:

```shell
for p in $(kubectl get pvc -l cassandra.datastax.com/datacenter=dc1 -o jsonpath='{range .items[*]}{.metadata.name} {end}'); do \
  kubectl patch pvc/${p} --patch '{"spec": {"resources": {"requests": {"storage":"150Gi"}}}}'; \
done
```

- Delete the StatefulSets:

```shell
kubectl delete statefulset -l cassandra.datastax.com/datacenter=dc1 --cascade=orphan
```
After that, no changes are applied to the Cassandra cluster due to the error mentioned above. Even if I try to resize my cluster, I get the error and nothing happens.
I'm having the same issue described in the previous comment. What is the procedure for increasing storage capacity in this case?
I have directly edited the PVCs and restarted the pods in my test environment. Nothing is broken; I can see the new size reflected in the PVCs, and I can access the test data.
Check prometheus-operator resizing manual. Works fine for k8ssandra as well.
@okgolove I was trying the steps from the prometheus-operator resizing manual and it worked for me, but when I deleted the cluster and tried on a new cluster, it throws the error you mentioned above. Is it still working for you?
Error from server (CassandraDatacenter write rejected, attempted to change storageConfig.CassandraDataVolumeClaimSpec): admission webhook "vcassandradatacenter.kb.io" denied the request: CassandraDatacenter write rejected, attempted to change storageConfig.CassandraDataVolumeClaimSpec
@chandapukiran have you changed storage size in cluster manifest before recreating?
@okgolove No; basically I created a cluster with the default size and later tried to change the size by modifying the cass object.
@chandapukiran ahh, yes. The admission webhook won't let you make this change. I disabled it temporarily, then modified.
@okgolove oh ok, could you share the commands to disable/enable the admission webhook?
@chandapukiran how did you install the operator? If via Helm chart, then just set:

```yaml
cass-operator:
  admissionWebhooks:
    enabled: false
```

Or just delete the admission webhook via kubectl.
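The kubectl route can be sketched as follows (the webhook configuration name below is an assumption, not confirmed by the thread; list first and use the real name):

```shell
# Find the validating webhook configuration backing vcassandradatacenter.kb.io
kubectl get validatingwebhookconfigurations

# Delete it (the name here is an assumption; use the one from the listing above)
kubectl delete validatingwebhookconfiguration cassandradatacenter-webhook-registration
```

Remember to re-create or re-enable the webhook afterwards, since it guards against other invalid CassandraDatacenter changes.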
Thanks @okgolove , i see it is already disabled in my helm chart but I now understand why it worked for me before but not now. I was playing with k8ssandra-operator in another namespace and that was causing the issue. Now I am good.
Adding the exact steps to be followed for quick reference:

- disable `admissionWebhooks` in the operator and re-deploy it:

```yaml
cass-operator:
  admissionWebhooks:
    enabled: false
```

- stop the required datacenters and set the new value for volume size in `K8ssandraCluster`, then apply the changes. Set the `stopped: true` flag in each of the required datacenters in the `datacenters` list and apply the yaml file using `kubectl apply -f <file>`.
- manually edit the PVC to the required size for each node in the cluster. One can use `kubectl edit pvc <pvc-name> -n <namespace>` and edit the size in the `spec` section.
- delete the underlying StatefulSet using the `orphan` deletion strategy: `kubectl delete statefulset <sts-name> -n <namespace> --cascade=orphan`
- remove the `stopped` flag in the K8ssandraCluster yaml file and apply the changes to re-start the stopped datacenters in the cluster
- re-enable `admissionWebhooks` in the operator and re-deploy it
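As a sketch, the stop step edits the datacenter entry in the K8ssandraCluster manifest shown earlier in the thread (fragment only; this assumes the `stopped` field behaves as in the CassandraDatacenter spec, scaling pods to zero while keeping the PVCs):

```yaml
spec:
  cassandra:
    datacenters:
      - metadata:
          name: dc1
        size: 9
        stopped: true   # scale pods down to zero while retaining PVCs
```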
Implementation ticket: #602