cstor-operators
cStor using the removed APIs in k8s 1.25 requires changes
Problem Description
When creating an application with a cStor-provisioned volume (3 replicas), the application pod gets stuck in the ContainerCreating state.
Environment details: kubeadm-based 4-node (1 master & 3 workers) cluster running K8s 1.25:
[root@k8s-master-640 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master-640 Ready control-plane 20h v1.25.0
k8s-node1-641 Ready <none> 19h v1.25.0
k8s-node2-642 Ready <none> 19h v1.25.0
k8s-node3-643 Ready <none> 19h v1.25.0
Each node has 3 disks attached to it.
Steps followed to create a cStor volume:
- Created a CSPC using the 3 disks on each of the 3 worker nodes.
- CSPC created successfully with provisioned == desired instances (CSPI), and the pool pods are also in Running state.
- Created a cStor volume with 3 replicas specified in the StorageClass.
- PVC gets bound to its respective PV.
- CVRs are created and all are in Healthy state.
- Deployed an application with the above-created PVC.
Describe output of the application pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 7m54s default-scheduler 0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Normal Scheduled 7m52s default-scheduler Successfully assigned default/wordpress-5fb7bff8dd-csqrb to k8s-node1-641
Warning FailedMount 2m3s (x10 over 7m43s) kubelet MountVolume.MountDevice failed for volume "pvc-14297415-5f2a-406f-bf8b-87a1a5006742" : rpc error: code = Internal desc = Waiting for pvc-14297415-5f2a-406f-bf8b-87a1a5006742's CVC to be bound
Warning FailedMount 77s (x3 over 5m50s) kubelet Unable to attach or mount volumes: unmounted volumes=[wordpress-persistent-storage], unattached volumes=[wordpress-persistent-storage kube-api-access-zwkx9]: timed out waiting for the condition
Describe output of the CVC:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Provisioning 8m22s (x4 over 8m40s) cstorvolumeclaim-controller failed to create PDB for volume: pvc-14297415-5f2a-406f-bf8b-87a1a5006742: failed to list PDB belongs to pools with selector openebs.io/cstor-disk-pool-ffvp=true,openebs.io/cstor-disk-pool-l2fb=true,openebs.io/cstor-disk-pool-54zn=true: the server could not find the requested resource
Warning Provisioning 4m47s (x4 over 8m36s) cstorvolumeclaim-controller failed to create PDB for volume: pvc-14297415-5f2a-406f-bf8b-87a1a5006742: failed to list PDB belongs to pools with selector openebs.io/cstor-disk-pool-l2fb=true,openebs.io/cstor-disk-pool-54zn=true,openebs.io/cstor-disk-pool-ffvp=true: the server could not find the requested resource
Warning Provisioning 3m17s (x18 over 8m42s) cstorvolumeclaim-controller failed to create PDB for volume: pvc-14297415-5f2a-406f-bf8b-87a1a5006742: failed to list PDB belongs to pools with selector openebs.io/cstor-disk-pool-54zn=true,openebs.io/cstor-disk-pool-ffvp=true,openebs.io/cstor-disk-pool-l2fb=true: the server could not find the requested resource
Logs from one of the pool pods:
I0907 06:52:21.373440 8 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn", UID:"7f1d146f-4c2c-4a91-a3b0-9b0500867ce1", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"138978", FieldPath:""}): type: 'Normal' reason: 'Synced' Received Resource create event
I0907 06:52:21.389429 8 handler.go:226] will process add event for cvr {pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn} as volume {cstor-fb027a66-716a-4abf-b643-3a336cc3da6a/pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736}
I0907 06:52:21.393542 8 handler.go:572] cVR 'pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn': uid '7f1d146f-4c2c-4a91-a3b0-9b0500867ce1': phase 'Init': is_empty_status: false
I0907 06:52:21.393557 8 handler.go:584] cVR pending: 7f1d146f-4c2c-4a91-a3b0-9b0500867ce1
2022-09-07T06:52:21.527Z INFO volumereplica/volumereplica.go:308 {"eventcode": "cstor.volume.replica.create.success", "msg": "Successfully created CStor volume replica", "rname": "cstor-fb027a66-716a-4abf-b643-3a336cc3da6a/pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736"}
I0907 06:52:21.527245 8 handler.go:469] cVR creation successful: pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn, 7f1d146f-4c2c-4a91-a3b0-9b0500867ce1
I0907 06:52:21.527559 8 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn", UID:"7f1d146f-4c2c-4a91-a3b0-9b0500867ce1", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"138980", FieldPath:""}): type: 'Normal' reason: 'Created' Resource created successfully
I0907 06:52:21.538547 8 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn", UID:"7f1d146f-4c2c-4a91-a3b0-9b0500867ce1", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"138980", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-fb027a66-716a-4abf-b643-3a336cc3da6a/pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736 with err 11
Error: exit status 11
I0907 06:52:21.563031 8 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn", UID:"7f1d146f-4c2c-4a91-a3b0-9b0500867ce1", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"139013", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-fb027a66-716a-4abf-b643-3a336cc3da6a/pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736 with err 11
Error: exit status 11
How to solve
Upon debugging, we found that the cStor operators use the v1beta1 version of the PodDisruptionBudget object in the codebase, which was deprecated in K8s 1.21 and completely removed in K8s 1.25. We need to upgrade the PodDisruptionBudget usage to v1 (policy/v1) in the codebase to enable cStor to work on K8s 1.25 or later versions.
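The fix (PR #436, linked below) essentially means switching these PodDisruptionBudget calls from the policy/v1beta1 client to the policy/v1 client, which has been served since K8s 1.21. Here is a minimal client-go sketch of the corrected list call, assuming the openebs namespace and a pool label selector shaped like the one in the CVC events above; it is illustrative only, not the actual cstor-operators code:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// listPoolPDBs lists the PodDisruptionBudgets belonging to the cStor pools that
// match labelSelector, using the policy/v1 API group. The pre-fix code went
// through the policy/v1beta1 client, which K8s 1.25 no longer serves, hence
// "the server could not find the requested resource" in the CVC events.
func listPoolPDBs(ctx context.Context, kubeClient kubernetes.Interface, namespace, labelSelector string) error {
	pdbs, err := kubeClient.PolicyV1().PodDisruptionBudgets(namespace).
		List(ctx, metav1.ListOptions{LabelSelector: labelSelector})
	if err != nil {
		return fmt.Errorf("failed to list PDBs with selector %q: %v", labelSelector, err)
	}
	for _, pdb := range pdbs.Items {
		fmt.Println(pdb.Name)
	}
	return nil
}

func main() {
	// In-cluster config, as an operator pod would use.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	kubeClient := kubernetes.NewForConfigOrDie(cfg)
	// Selector shape taken from the event message above; the pool-name suffix
	// ("ffvp") is cluster-specific.
	if err := listPoolPDBs(context.TODO(), kubeClient, "openebs",
		"openebs.io/cstor-disk-pool-ffvp=true"); err != nil {
		fmt.Println(err)
	}
}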
In case someone finds this issue: the fix is already merged and will be released with 3.4.0 (?): https://github.com/openebs/cstor-operators/pull/436
I tested it. It works in v3.4.x.
I installed the Helm chart from https://openebs.github.io/cstor-operators/
helm install openebs-cstor openebs-cstor/cstor -n openebs --create-namespace
It comes with version 3.3.0.
When I create as below:

apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cstor-storage
  namespace: openebs
spec:
  pools:
    - nodeSelector:
        kubernetes.io/hostname: "nlgkube1"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-555b0dea91b5518752dcb2a682243507"
      poolConfig:
        dataRaidGroupType: "stripe"
    - nodeSelector:
        kubernetes.io/hostname: "nlgkube2"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-3f880e51eded0a6aa5a30196c90662cf"
      poolConfig:
        dataRaidGroupType: "stripe"
    - nodeSelector:
        kubernetes.io/hostname: "nlgkube3"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-3591f0cc0dee841d261f57b47135ff35"
      poolConfig:
        dataRaidGroupType: "stripe"
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cstor-csi-disk
provisioner: cstor.csi.openebs.io
allowVolumeExpansion: true
parameters:
  cas-type: cstor
  # cstorPoolCluster should have the name of the CSPC
  cstorPoolCluster: cstor-disk-pool
  # replicaCount should be <= no. of CSPI created in the selected CSPC
  replicaCount: "3"

kubectl get cvc -A
NAMESPACE   NAME                                       CAPACITY   STATUS    AGE
openebs     pvc-a45c7e9b-bfd5-490a-8dd4-ba6462321a59              Pending   22m

Describe shows:
Type     Reason         Age                    From                          Message
Warning  Provisioning   22m                    cstorvolumeclaim-controller   services "pvc-a45c7e9b-bfd5-490a-8dd4-ba6462321a59" already exists
Warning  Provisioning   2m34s (x55 over 22m)   cstorvolumeclaim-controller   not enough pools are available of provided CSPC: "cstor-disk-pool", usable pool count: 0 pending replica count: 3
Warning  Provisioning   2m19s (x57 over 22m)   cstorvolumeclaim-controller   not enough pools are available of provided CSPC: "cstor-disk-pool", usable pool count: 0 pending replica count: 3
The PVC shows OK:

kubectl get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
demo-cstor-vol   Bound    pvc-a45c7e9b-bfd5-490a-8dd4-ba6462321a59   5Gi        RWO            cstor-csi-disk   23m

and the CSPC:

kubectl get cspc -A
NAMESPACE   NAME            HEALTHYINSTANCES   PROVISIONEDINSTANCES   DESIREDINSTANCES   AGE
openebs     cstor-storage   3                  3                      3                  24m

kubectl get cspi -A
NAMESPACE   NAME                 HOSTNAME   FREE     CAPACITY    READONLY   PROVISIONEDREPLICAS   HEALTHYREPLICAS   STATUS   AGE
openebs     cstor-storage-b5n7   nlgkube3   96400M   96400086k   false      0                     0                 ONLINE   24m
openebs     cstor-storage-bk6t   nlgkube2   96400M   96400086k   false      0                     0                 ONLINE   24m
openebs     cstor-storage-mx7g   nlgkube1   96400M   96400086k   false      0                     0                 ONLINE   24m
Everything looks OK, so what could be the issue? Thanks for any help.
I can confirm that the issue was resolved by using 3.4.0. 3.3.0 failed, and the minute I upgraded the deployments and every reference to the 3.3.0 image to 3.4.0, everything worked.
kubectl get cspi -o wide
NAME HOSTNAME ALLOCATED FREE CAPACITY READONLY PROVISIONEDREPLICAS HEALTHYREPLICAS TYPE STATUS AGE
cstor-storage-drvm nlgkube2 230k 96400M 96400230k false 0 0 stripe ONLINE 2m22s
cstor-storage-kztz nlgkube1 614k 96400M 96400614k false 0 0 stripe ONLINE 2m22s
cstor-storage-lhgb nlgkube3 230k 96400M 96400230k false 0 0 stripe ONLINE 2m21s
It seems no replicas are provisioned ... wondering why?
I had a typo! The StorageClass was not linking to the disk pool: the cstorPoolCluster parameter pointed at "cstor-disk-pool" while the CSPC is named "cstor-storage". Duh.
Waiting for the v3.4.0 Helm charts, but is there any way to deploy 3.4.0 now?
No need to wait: https://github.com/openebs/velero-plugin/issues/183
Read that through and see the last comments for a cstor-operators setup that works :)
https://github.com/openebs/velero-plugin/issues/183#issuecomment-1317675988
Any plan to release Helm chart 3.4.0?