
[Bug]: Over-provisioning stops working when one of the PVCs is resized

Open handrea2009 opened this issue 1 year ago • 6 comments

Describe the bug

I am using Kadalu 0.9.1 in external native mode with Gluster 10.5 and K3s. I have created a Kadalu storage that uses an external 29 GB Gluster volume:

kubectl exec -it deploy/operator -n kadalu -- bash -c 'kubectl-kadalu storage-list --status'

Name               Type      Utilization        Pvs Count  Min PV Size  Avg PV Size  Max PV Size
kadalu-read-cache  External  0/29 Gi (0%)       0          0            0            0
kubectl get kadalustorage kadalu-read-cache -o jsonpath='{.spec.details.gluster_volname}'
read-cache
gluster volume list
read-cache

Even though the Gluster volume is only 29 GB, I can create three PVCs of 20 GB each, so over-provisioning works so far:

kubectl get pvc
NAME                                           STATUS   VOLUME                                     CAPACITY    ACCESS MODES   STORAGECLASS               AGE
test1                                          Bound    pvc-febb8a9c-785b-4911-9c0d-a3d1d7b3bca9   20Gi        RWX            kadalu.kadalu-read-cache   62s
test2                                          Bound    pvc-542924b4-a5f2-4a1e-8da7-6da887f3b564   20Gi        RWX            kadalu.kadalu-read-cache   43s
test3                                          Bound    pvc-36d84616-8a1a-4b03-85b4-203f18919daa   20Gi        RWX            kadalu.kadalu-read-cache   32s

However, it is odd that kubectl-kadalu storage-list --status shows no space used and no PVs:

kubectl exec -it deploy/operator -n kadalu -- bash -c 'kubectl-kadalu storage-list --status'

Name               Type      Utilization        Pvs Count  Min PV Size  Avg PV Size  Max PV Size
kadalu-read-cache  External  0/29 Gi (0%)       0          0            0            0

I resized one of the PVCs and the resize worked (from 20 GB to 23 GB):

kubectl get pvc
NAME                                           STATUS   VOLUME                                     CAPACITY    ACCESS MODES   STORAGECLASS               AGE
test1                                          Bound    pvc-febb8a9c-785b-4911-9c0d-a3d1d7b3bca9   23Gi        RWX            kadalu.kadalu-read-cache   90s
test2                                          Bound    pvc-542924b4-a5f2-4a1e-8da7-6da887f3b564   20Gi        RWX            kadalu.kadalu-read-cache   71s
test3                                          Bound    pvc-36d84616-8a1a-4b03-85b4-203f18919daa   20Gi        RWX            kadalu.kadalu-read-cache   60s

Now kubectl-kadalu storage-list --status takes into account only the PVC that has been resized:

kubectl exec -it deploy/operator -n kadalu -- bash -c 'kubectl-kadalu storage-list --status'

Name               Type      Utilization        Pvs Count  Min PV Size  Avg PV Size  Max PV Size
kadalu-read-cache  External  23 Gi/29 Gi (78%)  1          23 Gi        23 Gi        23 Gi

If I try to create another 20 GB PVC, it stays pending forever:

kubectl get pvc
test1                                          Bound     pvc-febb8a9c-785b-4911-9c0d-a3d1d7b3bca9   23Gi        RWX            kadalu.kadalu-read-cache   36m
test2                                          Bound     pvc-542924b4-a5f2-4a1e-8da7-6da887f3b564   20Gi        RWX            kadalu.kadalu-read-cache   35m
test3                                          Bound     pvc-36d84616-8a1a-4b03-85b4-203f18919daa   20Gi        RWX            kadalu.kadalu-read-cache   35m
test4                                          Pending                                                                         kadalu.kadalu-read-cache   34m
kubectl describe pvc test4
Name:          test4
Namespace:     default
StorageClass:  kadalu.kadalu-read-cache
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: kadalu
               volume.kubernetes.io/storage-provisioner: kadalu
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type     Reason                Age                 From                                                                  Message
  ----     ------                ----                ----                                                                  -------
  Normal   Provisioning          94s (x15 over 35m)  kadalu_kadalu-csi-provisioner-0_6e9906fe-7887-4836-bce7-173516e98dad  External provisioner is provisioning volume for claim "default/test4"
  Warning  ProvisioningFailed    94s (x15 over 35m)  kadalu_kadalu-csi-provisioner-0_6e9906fe-7887-4836-bce7-173516e98dad  failed to provision volume with StorageClass "kadalu.kadalu-read-cache": rpc error: code = ResourceExhausted desc = External resource is exhausted
  Normal   ExternalProvisioning  0s (x142 over 35m)  persistentvolume-controller                                           waiting for a volume to be created, either by external provisioner "kadalu" or manually created by system administrator

handrea2009 avatar Dec 15 '23 17:12 handrea2009

Debug logs:

[2023-12-15 17:30:38,414] DEBUG [controllerserver - 100:CreateVolume] - Create Volume request    request=name: "pvc-f3a76282-c6e0-42ab-8009-07985e51ed82"
capacity_range {
  required_bytes: 21474836480
}
volume_capabilities {
  mount {
  }
  access_mode {
    mode: MULTI_NODE_MULTI_WRITER
  }
}
parameters {
  key: "gluster_hosts"
  value: "cluster-node1"
}
parameters {
  key: "gluster_volname"
  value: "read-cache"
}
parameters {
  key: "hostvol_type"
  value: "External"
}
parameters {
  key: "single_pv_per_pool"
  value: "False"
}

[2023-12-15 17:30:38,420] DEBUG [volumeutils - 1175:mount_glusterfs] - Already mounted   mount=/mnt/kadalu-read-cache
[2023-12-15 17:30:38,435] DEBUG [volumeutils - 1175:mount_glusterfs] - Already mounted   mount=/mnt/kadalu-write-cache
[2023-12-15 17:30:38,441] DEBUG [controllerserver - 161:CreateVolume] - Found PV type    pvtype=subvol capabilities=[mount {
}
access_mode {
  mode: MULTI_NODE_MULTI_WRITER
}
]
[2023-12-15 17:30:38,441] DEBUG [controllerserver - 174:CreateVolume] - Filters applied to choose storage        hostvol_type=External gluster_hosts=cluster-node1 single_pv_per_pool=False gluster_volname=read-cache
[2023-12-15 17:30:38,442] DEBUG [controllerserver - 185:CreateVolume] - Got list of hosting Volumes      volumes=kadalu-read-cache,kadalu-write-cache
[2023-12-15 17:30:38,447] DEBUG [volumeutils - 1175:mount_glusterfs] - Already mounted   mount=/mnt/kadalu-read-cache
[2023-12-15 17:30:38,448] DEBUG [volumeutils - 1406:check_external_volume] - Mount successful    hvol={'name': 'kadalu-read-cache', 'type': 'External', 'g_volname': 'read-cache', 'g_host': 'cluster-node1', 'g_options': '', 'single_pv_per_pool': False}
[2023-12-15 17:30:38,530] DEBUG [volumeutils - 443:is_hosting_volume_free] - pv stats    hostvol=kadalu-read-cache total_size_bytes=31509606400 used_size_bytes=24696061952 free_size_bytes=6813544448 number_of_pvs=1 required_size=21474836480 reserved_size=681354444.8
[2023-12-15 17:30:38,530] ERROR [controllerserver - 262:CreateVolume] - Hosting volume is full. Add more storage         volume=kadalu-read-cache
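
The numbers in the "pv stats" line can be checked by hand. The following is a minimal sketch, not Kadalu's actual code: the `fits` helper and the 10% reserve fraction are assumptions inferred from the logged `reserved_size`, which equals 10% of `free_size_bytes`. It shows that the accounted usage is exactly the one resized 23 GiB PV, and that a 20 GiB request cannot fit in what the provisioner believes is left.

```python
# Values copied from the "pv stats" debug line above.
GIB = 1024 ** 3
total_size_bytes = 31509606400
used_size_bytes = 24696061952
free_size_bytes = 6813544448
required_size = 21474836480

# Assumption: 10% of the free space is held back; this reproduces the
# logged reserved_size of 681354444.8.
RESERVED_FRACTION = 0.10
reserved_size = free_size_bytes * RESERVED_FRACTION


def fits(required, free, reserved):
    """Hypothetical admission check: the request must fit in free minus reserved."""
    return required < free - reserved


used_gib = used_size_bytes / GIB          # only the resized PV is counted
required_gib = required_size / GIB        # size of the new request
usable_gib = (free_size_bytes - reserved_size) / GIB
utilization_pct = int(100 * used_size_bytes / total_size_bytes)
request_fits = fits(required_size, free_size_bytes, reserved_size)

print(f"used={used_gib} GiB, required={required_gib} GiB, "
      f"usable={usable_gib:.2f} GiB, utilization={utilization_pct}%, "
      f"fits={request_fits}")
```

The two untouched 20 GiB PVs do not appear in `used_size_bytes` at all, which is consistent with `number_of_pvs=1` in the log.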

handrea2009 avatar Dec 15 '23 17:12 handrea2009

The same issue is present in Kadalu 1.2.0. The issue is not present in Kadalu 0.8.14, although in that release the command kubectl-kadalu storage-list --status does not work:

# kubectl exec -it deploy/operator -n kadalu -- bash -c 'kubectl-kadalu storage-list --status'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/bin/kubectl-kadalu/__main__.py", line 117, in <module>
  File "/usr/bin/kubectl-kadalu/__main__.py", line 108, in main
  File "/usr/bin/kubectl-kadalu/storage_list.py", line 237, in run
  File "/usr/bin/kubectl-kadalu/storage_list.py", line 197, in fetch_status
IndexError: list index out of range

handrea2009 avatar Dec 18 '23 20:12 handrea2009

If the logic in "expansion" should be the same as in "create", then update_free_size() shouldn't be called for PV_TYPE_SUBVOL even during expansion. Currently, for PV_TYPE_SUBVOL, it is not called on create but it is called on expansion.

handrea2009 avatar Dec 20 '23 14:12 handrea2009

Is it possible to send a PR if the fixes in update_free_size() work?

amarts avatar Jan 16 '24 05:01 amarts

Before doing a PR I guess we have to establish whether Kadalu supports over-provisioning for External native mode or not. That's not clear to me because the code doesn't call update_free_size() during PVC creation (so you can create as many PVCs as you want, even beyond the space available in the external Gluster volume). However, when a PVC is expanded, update_free_size() is called to update the space available in the external Gluster volume. If we support over-provisioning, we should never verify the space available in the Gluster volume before creating or expanding a PVC. If we don't support over-provisioning then we should call update_free_size() both during creation and during expansion.
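
To make the two options concrete, here is a toy model of the accounting, replaying the steps from the report. It is only a sketch: `ToyPool`, `replay`, and the `account_on_create` flag are hypothetical names, the 10% reserve is ignored, and the free-space check on expansion is an assumption. It does not reflect how update_free_size() is actually implemented.

```python
GIB = 1024 ** 3


class ToyPool:
    """Toy stand-in for an external hosting volume (not Kadalu's data model)."""

    def __init__(self, total_bytes, account_on_create):
        self.total_bytes = total_bytes
        self.account_on_create = account_on_create
        self.pvs = {}        # PVs that actually exist: name -> size
        self.accounted = {}  # PVs the provisioner has recorded: name -> size

    def free_bytes(self):
        return self.total_bytes - sum(self.accounted.values())

    def create(self, name, size):
        if size > self.free_bytes():
            return False  # corresponds to "Hosting volume is full"
        self.pvs[name] = size
        if self.account_on_create:
            self.accounted[name] = size
        return True

    def expand(self, name, new_size):
        if name not in self.pvs:
            return False
        # Expansion always records the PV, with its full new size.
        extra = new_size - self.accounted.get(name, 0)
        if extra > self.free_bytes():
            return False
        self.pvs[name] = new_size
        self.accounted[name] = new_size
        return True


def replay(account_on_create):
    """Replay the steps from the report against a 29 GiB pool."""
    pool = ToyPool(29 * GIB, account_on_create)
    return [
        pool.create("test1", 20 * GIB),
        pool.create("test2", 20 * GIB),
        pool.create("test3", 20 * GIB),
        pool.expand("test1", 23 * GIB),
        pool.create("test4", 20 * GIB),
    ]


reported = replay(account_on_create=False)  # behaviour described in this issue
strict = replay(account_on_create=True)     # no over-provisioning at all
print("reported:", reported)
print("strict:  ", strict)
```

With accounting only on expansion, the first three creates succeed and the fourth fails after the resize, as in the report. With accounting on both paths, test2 and test3 would already be rejected, which is the consistent "no over-provisioning" behaviour.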

handrea2009 avatar Jan 18 '24 15:01 handrea2009

If we don't support over-provisioning then we should call update_free_size() both during creation and during expansion.

  • As commented in the PR, I believe this should be the fix, i.e., don't support over-provisioning.

leelavg avatar Apr 12 '24 13:04 leelavg