
cStor using the removed APIs in k8s 1.25 requires changes

Open Ab-hishek opened this issue 2 years ago • 7 comments

Problem Description

When creating an application with a cStor-provisioned volume (3 replicas), the application pod gets stuck in the ContainerCreating state.

Environment details: kubeadm-based 4-node cluster (1 master & 3 workers) running K8s 1.25:

[root@k8s-master-640 ~]# kubectl get nodes
NAME             STATUS   ROLES           AGE   VERSION
k8s-master-640   Ready    control-plane   20h   v1.25.0
k8s-node1-641    Ready    <none>          19h   v1.25.0
k8s-node2-642    Ready    <none>          19h   v1.25.0
k8s-node3-643    Ready    <none>          19h   v1.25.0

Each node has 3 disks attached to it.

Steps followed to create a cStor volume:

  1. Created a CSPC using the 3 disks on each of the 3 worker nodes.
  2. The CSPC was created successfully with provisioned == desired instances (CSPI), and the pool pods are in the Running state.
  3. Created a cStor volume with 3 replicas specified in the StorageClass.
  4. The PVC gets bound to its respective PV.
  5. CVRs are created and all are in a Healthy state.
  6. Deployed an application with the PVC created above; the kubectl checks for the above states are sketched below.
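
The state from these steps can be cross-checked with plain kubectl (a quick sketch; the openebs namespace and the cspc/cspi/cvc/cvr short names from a default cStor install are assumed):

kubectl get cspc -n openebs   # provisioned vs desired pool instances
kubectl get cspi -n openebs   # individual pool instances and their status
kubectl get pvc               # PVC should be Bound
kubectl get cvc -n openebs    # the CVC for the volume
kubectl get cvr -n openebs    # replicas, expected to be Healthy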

Describe of the application pod:

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  7m54s                  default-scheduler  0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
  Normal   Scheduled         7m52s                  default-scheduler  Successfully assigned default/wordpress-5fb7bff8dd-csqrb to k8s-node1-641
  Warning  FailedMount       2m3s (x10 over 7m43s)  kubelet            MountVolume.MountDevice failed for volume "pvc-14297415-5f2a-406f-bf8b-87a1a5006742" : rpc error: code = Internal desc = Waiting for pvc-14297415-5f2a-406f-bf8b-87a1a5006742's CVC to be bound
  Warning  FailedMount       77s (x3 over 5m50s)    kubelet            Unable to attach or mount volumes: unmounted volumes=[wordpress-persistent-storage], unattached volumes=[wordpress-persistent-storage kube-api-access-zwkx9]: timed out waiting for the condition

Describe of CVC:

Events:
  Type     Reason        Age                     From                         Message
  ----     ------        ----                    ----                         -------
  Warning  Provisioning  8m22s (x4 over 8m40s)   cstorvolumeclaim-controller  failed to create PDB for volume: pvc-14297415-5f2a-406f-bf8b-87a1a5006742: failed to list PDB belongs to pools with selector openebs.io/cstor-disk-pool-ffvp=true,openebs.io/cstor-disk-pool-l2fb=true,openebs.io/cstor-disk-pool-54zn=true: the server could not find the requested resource
  Warning  Provisioning  4m47s (x4 over 8m36s)   cstorvolumeclaim-controller  failed to create PDB for volume: pvc-14297415-5f2a-406f-bf8b-87a1a5006742: failed to list PDB belongs to pools with selector openebs.io/cstor-disk-pool-l2fb=true,openebs.io/cstor-disk-pool-54zn=true,openebs.io/cstor-disk-pool-ffvp=true: the server could not find the requested resource
  Warning  Provisioning  3m17s (x18 over 8m42s)  cstorvolumeclaim-controller  failed to create PDB for volume: pvc-14297415-5f2a-406f-bf8b-87a1a5006742: failed to list PDB belongs to pools with selector openebs.io/cstor-disk-pool-54zn=true,openebs.io/cstor-disk-pool-ffvp=true,openebs.io/cstor-disk-pool-l2fb=true: the server could not find the requested resource

Logs from one of the pool pods:

I0907 06:52:21.373440       8 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn", UID:"7f1d146f-4c2c-4a91-a3b0-9b0500867ce1", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"138978", FieldPath:""}): type: 'Normal' reason: 'Synced' Received Resource create event
I0907 06:52:21.389429       8 handler.go:226] will process add event for cvr {pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn} as volume {cstor-fb027a66-716a-4abf-b643-3a336cc3da6a/pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736}
I0907 06:52:21.393542       8 handler.go:572] cVR 'pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn': uid '7f1d146f-4c2c-4a91-a3b0-9b0500867ce1': phase 'Init': is_empty_status: false
I0907 06:52:21.393557       8 handler.go:584] cVR pending: 7f1d146f-4c2c-4a91-a3b0-9b0500867ce1
2022-09-07T06:52:21.527Z        INFO    volumereplica/volumereplica.go:308              {"eventcode": "cstor.volume.replica.create.success", "msg": "Successfully created CStor volume replica", "rname": "cstor-fb027a66-716a-4abf-b643-3a336cc3da6a/pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736"}
I0907 06:52:21.527245       8 handler.go:469] cVR creation successful: pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn, 7f1d146f-4c2c-4a91-a3b0-9b0500867ce1
I0907 06:52:21.527559       8 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn", UID:"7f1d146f-4c2c-4a91-a3b0-9b0500867ce1", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"138980", FieldPath:""}): type: 'Normal' reason: 'Created' Resource created successfully
I0907 06:52:21.538547       8 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn", UID:"7f1d146f-4c2c-4a91-a3b0-9b0500867ce1", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"138980", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-fb027a66-716a-4abf-b643-3a336cc3da6a/pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736 with err 11
Error: exit status 11
I0907 06:52:21.563031       8 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn", UID:"7f1d146f-4c2c-4a91-a3b0-9b0500867ce1", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"139013", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-fb027a66-716a-4abf-b643-3a336cc3da6a/pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736 with err 11
Error: exit status 11

How to solve

Upon debugging, we found that the cStor operators use the v1beta1 version of the PodDisruptionBudget object in the codebase; that API version was deprecated in K8s 1.21 and completely removed in K8s 1.25.

We need to upgrade the PodDisruptionBudget usage to the policy/v1 API in the codebase to enable cStor to work on K8s 1.25 and later versions.
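
To confirm the diagnosis on a given cluster, one can check which versions of the policy API group are still served (plain kubectl, nothing cStor-specific):

kubectl api-versions | grep '^policy/'
# on v1.25 this prints only policy/v1; policy/v1beta1 is no longer served,
# so a client still requesting v1beta1 PodDisruptionBudgets gets
# "the server could not find the requested resource"
kubectl api-resources --api-group=policy

In client-go terms the fix roughly means moving the PDB list/create calls from the clientset's PolicyV1beta1() interface to PolicyV1() (an assumption about the exact call sites; see the linked PR in the next comment for the actual change).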

Ab-hishek avatar Sep 09 '22 06:09 Ab-hishek

In case someone finds this issue: the fix is already merged and will be released with 3.4.0 (?) https://github.com/openebs/cstor-operators/pull/436

ThomasBuchinger avatar Oct 24 '22 12:10 ThomasBuchinger

I tested it. It is supported in v3.4.x.

Godfunc avatar Nov 05 '22 13:11 Godfunc

I installed the Helm chart from https://openebs.github.io/cstor-operators/

helm install openebs-cstor openebs-cstor/cstor -n openebs --create-namespace

It comes with version 3.3.0.

When I create it as below:

apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cstor-storage
  namespace: openebs
spec:
  pools:
    - nodeSelector:
        kubernetes.io/hostname: "nlgkube1"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-555b0dea91b5518752dcb2a682243507"
      poolConfig:
        dataRaidGroupType: "stripe"
    - nodeSelector:
        kubernetes.io/hostname: "nlgkube2"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-3f880e51eded0a6aa5a30196c90662cf"
      poolConfig:
        dataRaidGroupType: "stripe"
    - nodeSelector:
        kubernetes.io/hostname: "nlgkube3"
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: "blockdevice-3591f0cc0dee841d261f57b47135ff35"
      poolConfig:
        dataRaidGroupType: "stripe"

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cstor-csi-disk
provisioner: cstor.csi.openebs.io
allowVolumeExpansion: true
parameters:
  cas-type: cstor
  # cstorPoolCluster should have the name of the CSPC
  cstorPoolCluster: cstor-disk-pool
  # replicaCount should be <= no. of CSPIs created in the selected CSPC
  replicaCount: "3"

kubectl get cvc -A
NAMESPACE   NAME                                       CAPACITY   STATUS    AGE
openebs     pvc-a45c7e9b-bfd5-490a-8dd4-ba6462321a59              Pending   22m

Describe of the CVC shows:

  Type     Reason        Age                   From                         Message
  ----     ------        ----                  ----                         -------
  Warning  Provisioning  22m                   cstorvolumeclaim-controller  services "pvc-a45c7e9b-bfd5-490a-8dd4-ba6462321a59" already exists
  Warning  Provisioning  2m34s (x55 over 22m)  cstorvolumeclaim-controller  not enough pools are available of provided CSPC: "cstor-disk-pool", usable pool count: 0 pending replica count: 3
  Warning  Provisioning  2m19s (x57 over 22m)  cstorvolumeclaim-controller  not enough pools are available of provided CSPC: "cstor-disk-pool", usable pool count: 0 pending replica count: 3

The PVC shows OK:

kubectl get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
demo-cstor-vol   Bound    pvc-a45c7e9b-bfd5-490a-8dd4-ba6462321a59   5Gi        RWO            cstor-csi-disk   23m

and the CSPC:

kubectl get cspc -A
NAMESPACE   NAME            HEALTHYINSTANCES   PROVISIONEDINSTANCES   DESIREDINSTANCES   AGE
openebs     cstor-storage   3                  3                      3                  24m

kubectl get cspi -A
NAMESPACE   NAME                 HOSTNAME   FREE     CAPACITY    READONLY   PROVISIONEDREPLICAS   HEALTHYREPLICAS   STATUS   AGE
openebs     cstor-storage-b5n7   nlgkube3   96400M   96400086k   false      0                     0                 ONLINE   24m
openebs     cstor-storage-bk6t   nlgkube2   96400M   96400086k   false      0                     0                 ONLINE   24m
openebs     cstor-storage-mx7g   nlgkube1   96400M   96400086k   false      0                     0                 ONLINE   24m

Everything looks OK; what could be the issue? Thanks for any help.

jadsy2107 avatar Nov 08 '22 11:11 jadsy2107

I can confirm that the issue was resolved by using 3.4.0. 3.3.0 failed, and the minute I upgraded the deployments and every reference to the 3.3.0 image to 3.4.0, everything worked.
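
For anyone who wants to move an existing 3.3.0 install forward in place, a rough sketch of that image bump (the deployment and container names below are assumptions from a default Helm install; list yours first and substitute accordingly):

# see which cStor deployments exist and which images they currently run
kubectl -n openebs get deploy -o wide

# example only (assumed names): point one deployment at the 3.4.0 image
kubectl -n openebs set image deployment/openebs-cstor-cspc-operator cspc-operator=openebs/cspc-operator:3.4.0

Repeat for the remaining cStor control-plane deployments until no 3.3.0 image references are left.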

jadsy2107 avatar Nov 08 '22 12:11 jadsy2107

kubectl get cspi -o wide
NAME                 HOSTNAME   ALLOCATED   FREE     CAPACITY    READONLY   PROVISIONEDREPLICAS   HEALTHYREPLICAS   TYPE     STATUS   AGE
cstor-storage-drvm   nlgkube2   230k        96400M   96400230k   false      0                     0                 stripe   ONLINE   2m22s
cstor-storage-kztz   nlgkube1   614k        96400M   96400614k   false      0                     0                 stripe   ONLINE   2m22s
cstor-storage-lhgb   nlgkube3   230k        96400M   96400230k   false      0                     0                 stripe   ONLINE   2m21s

It seems no replicas are provisioned... wondering why?

jadsy2107 avatar Nov 08 '22 13:11 jadsy2107

I had a typo! The StorageClass was not linking to the disk pool: its cstorPoolCluster parameter did not match the CSPC name.
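
A quick way to catch that kind of mismatch (plain kubectl; the StorageClass and namespace names are the ones used in this thread):

# the cstorPoolCluster parameter in the StorageClass...
kubectl get sc cstor-csi-disk -o jsonpath='{.parameters.cstorPoolCluster}{"\n"}'

# ...must match the name of an existing CSPC
kubectl get cspc -n openebs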

jadsy2107 avatar Nov 09 '22 07:11 jadsy2107

Waiting for the v3.4.0 Helm charts; but is there any way to deploy 3.4.0 now?

willzhang avatar Nov 16 '22 16:11 willzhang

No need to wait, https://github.com/openebs/velero-plugin/issues/183

Read that through and see the last comments for a cstor-operator setup that works :)

jadsy2107 avatar Nov 17 '22 14:11 jadsy2107

https://github.com/openebs/velero-plugin/issues/183#issuecomment-1317675988

jadsy2107 avatar Nov 17 '22 14:11 jadsy2107

Any plan to release the Helm chart for 3.4.0?

mmelyp avatar Jan 04 '23 18:01 mmelyp