kadalu
[Bug]: Over-provisioning stops working when one of the PVCs is resized
Describe the bug
I am using Kadalu 0.9.1 in external native mode, with Gluster 10.5 and K3s. I have created a Kadalu storage that uses an external 29 GB Gluster volume:
kubectl exec -it deploy/operator -n kadalu -- bash -c 'kubectl-kadalu storage-list --status'
Name Type Utilization Pvs Count Min PV Size Avg PV Size Max PV Size
kadalu-read-cache External 0/29 Gi (0%) 0 0 0 0
kubectl get kadalustorage kadalu-read-cache -o jsonpath='{.spec.details.gluster_volname}'
read-cache
gluster volume list
read-cache
Even though the Gluster volume is only 29 GB, I can create three PVCs of 20 GB each, so over-provisioning works at this point:
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
test1 Bound pvc-febb8a9c-785b-4911-9c0d-a3d1d7b3bca9 20Gi RWX kadalu.kadalu-read-cache 62s
test2 Bound pvc-542924b4-a5f2-4a1e-8da7-6da887f3b564 20Gi RWX kadalu.kadalu-read-cache 43s
test3 Bound pvc-36d84616-8a1a-4b03-85b4-203f18919daa 20Gi RWX kadalu.kadalu-read-cache 32s
However, it is odd that kubectl-kadalu storage-list --status shows no space used and no PVCs:
kubectl exec -it deploy/operator -n kadalu -- bash -c 'kubectl-kadalu storage-list --status'
Name Type Utilization Pvs Count Min PV Size Avg PV Size Max PV Size
kadalu-read-cache External 0/29 Gi (0%) 0 0 0 0
I then resized one of the PVCs from 20 GB to 23 GB, and the resize worked:
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
test1 Bound pvc-febb8a9c-785b-4911-9c0d-a3d1d7b3bca9 23Gi RWX kadalu.kadalu-read-cache 90s
test2 Bound pvc-542924b4-a5f2-4a1e-8da7-6da887f3b564 20Gi RWX kadalu.kadalu-read-cache 71s
test3 Bound pvc-36d84616-8a1a-4b03-85b4-203f18919daa 20Gi RWX kadalu.kadalu-read-cache 60s
Now kubectl-kadalu storage-list --status takes into account only the PVC that was resized:
kubectl exec -it deploy/operator -n kadalu -- bash -c 'kubectl-kadalu storage-list --status'
Name Type Utilization Pvs Count Min PV Size Avg PV Size Max PV Size
kadalu-read-cache External 23 Gi/29 Gi (78%) 1 23 Gi 23 Gi 23 Gi
If I try to create another 20 GB PVC, it stays Pending forever:
kubectl get pvc
test1 Bound pvc-febb8a9c-785b-4911-9c0d-a3d1d7b3bca9 23Gi RWX kadalu.kadalu-read-cache 36m
test2 Bound pvc-542924b4-a5f2-4a1e-8da7-6da887f3b564 20Gi RWX kadalu.kadalu-read-cache 35m
test3 Bound pvc-36d84616-8a1a-4b03-85b4-203f18919daa 20Gi RWX kadalu.kadalu-read-cache 35m
test4 Pending kadalu.kadalu-read-cache 34m
kubectl describe pvc test4
Name: test4
Namespace: default
StorageClass: kadalu.kadalu-read-cache
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-provisioner: kadalu
volume.kubernetes.io/storage-provisioner: kadalu
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 94s (x15 over 35m) kadalu_kadalu-csi-provisioner-0_6e9906fe-7887-4836-bce7-173516e98dad External provisioner is provisioning volume for claim "default/test4"
Warning ProvisioningFailed 94s (x15 over 35m) kadalu_kadalu-csi-provisioner-0_6e9906fe-7887-4836-bce7-173516e98dad failed to provision volume with StorageClass "kadalu.kadalu-read-cache": rpc error: code = ResourceExhausted desc = External resource is exhausted
Normal ExternalProvisioning 0s (x142 over 35m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "kadalu" or manually created by system administrator
Debug logs:
[2023-12-15 17:30:38,414] DEBUG [controllerserver - 100:CreateVolume] - Create Volume request request=name: "pvc-f3a76282-c6e0-42ab-8009-07985e51ed82"
capacity_range {
required_bytes: 21474836480
}
volume_capabilities {
mount {
}
access_mode {
mode: MULTI_NODE_MULTI_WRITER
}
}
parameters {
key: "gluster_hosts"
value: "cluster-node1"
}
parameters {
key: "gluster_volname"
value: "read-cache"
}
parameters {
key: "hostvol_type"
value: "External"
}
parameters {
key: "single_pv_per_pool"
value: "False"
}
[2023-12-15 17:30:38,420] DEBUG [volumeutils - 1175:mount_glusterfs] - Already mounted mount=/mnt/kadalu-read-cache
[2023-12-15 17:30:38,435] DEBUG [volumeutils - 1175:mount_glusterfs] - Already mounted mount=/mnt/kadalu-write-cache
[2023-12-15 17:30:38,441] DEBUG [controllerserver - 161:CreateVolume] - Found PV type pvtype=subvol capabilities=[mount {
}
access_mode {
mode: MULTI_NODE_MULTI_WRITER
}
]
[2023-12-15 17:30:38,441] DEBUG [controllerserver - 174:CreateVolume] - Filters applied to choose storage hostvol_type=External gluster_hosts=cluster-node1 single_pv_per_pool=False gluster_volname=read-cache
[2023-12-15 17:30:38,442] DEBUG [controllerserver - 185:CreateVolume] - Got list of hosting Volumes volumes=kadalu-read-cache,kadalu-write-cache
[2023-12-15 17:30:38,447] DEBUG [volumeutils - 1175:mount_glusterfs] - Already mounted mount=/mnt/kadalu-read-cache
[2023-12-15 17:30:38,448] DEBUG [volumeutils - 1406:check_external_volume] - Mount successful hvol={'name': 'kadalu-read-cache', 'type': 'External', 'g_volname': 'read-cache', 'g_host': 'cluster-node1', 'g_options': '', 'single_pv_per_pool': False}
[2023-12-15 17:30:38,530] DEBUG [volumeutils - 443:is_hosting_volume_free] - pv stats hostvol=kadalu-read-cache total_size_bytes=31509606400 used_size_bytes=24696061952 free_size_bytes=6813544448 number_of_pvs=1 required_size=21474836480 reserved_size=681354444.8
[2023-12-15 17:30:38,530] ERROR [controllerserver - 262:CreateVolume] - Hosting volume is full. Add more storage volume=kadalu-read-cache
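As a sanity check, the figures in the "pv stats" line above can be recomputed by hand. The sketch below is not Kadalu's code: the 10% reserve is inferred from the logged reserved_size, and fits() is a hypothetical stand-in for whatever comparison is_hosting_volume_free actually performs.

```python
GIB = 1024 ** 3

# Values copied from the "pv stats" debug line.
total_size_bytes = 31509606400
used_size_bytes = 24696061952   # exactly 23 GiB: only the resized PVC is counted
required_size = 21474836480     # exactly 20 GiB: the new test4 claim


def fits(required: int, total: int, used: int) -> bool:
    """Hypothetical free-space check (assumed, not taken from Kadalu).

    Assumes 10% of the free space is held back as a reserve, which is
    what the logged reserved_size (681354444.8) corresponds to.
    """
    free = total - used
    reserved = free / 10
    return required < free - reserved


free_size_bytes = total_size_bytes - used_size_bytes
print(free_size_bytes)                                      # 6813544448, as logged
print(used_size_bytes / GIB, required_size / GIB)           # 23.0 20.0
print(fits(required_size, total_size_bytes, used_size_bytes))  # False
print(fits(required_size, total_size_bytes, 0))             # True
```

With the used size at 0 (the accounting state before the resize) the same check passes for a 20 GiB request, which is consistent with the first three PVCs being created. Once the resized PVC is recorded as 23 GiB used, only about 6.3 GiB is seen as free and the 20 GiB request is rejected.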
The same issue is present in Kadalu 1.2.0.
The issue is not present in Kadalu 0.8.14, although in that release the command kubectl-kadalu storage-list --status does not work:
# kubectl exec -it deploy/operator -n kadalu -- bash -c 'kubectl-kadalu storage-list --status'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/bin/kubectl-kadalu/__main__.py", line 117, in <module>
  File "/usr/bin/kubectl-kadalu/__main__.py", line 108, in main
  File "/usr/bin/kubectl-kadalu/storage_list.py", line 237, in run
  File "/usr/bin/kubectl-kadalu/storage_list.py", line 197, in fetch_status
IndexError: list index out of range
If the logic for expansion should be the same as for creation, then update_free_size() shouldn't be called for PV_TYPE_SUBVOL during expansion either. Currently, for PV_TYPE_SUBVOL it is not called on create but it is called on expansion.
Is it possible to send a PR if the fix in update_free_size() works?
Before opening a PR, I think we have to establish whether Kadalu supports over-provisioning in External native mode or not. That is not clear to me, because the code does not call update_free_size() during PVC creation (so you can create as many PVCs as you want, even beyond the space available in the external Gluster volume). However, when a PVC is expanded, update_free_size() is called to update the space available in the external Gluster volume. If we support over-provisioning, we should never verify the space available in the Gluster volume before creating or expanding a PVC. If we don't support over-provisioning, then we should call update_free_size() both during creation and during expansion.
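To make the three options concrete, here is a toy model of the accounting. It is only a sketch under stated assumptions: Pool, Policy, and scenario() are hypothetical names that do not exist in Kadalu, the 10% reserve is inferred from the debug log, and the REPORTED policy is a guess at the observed behaviour (nothing recorded on create, the full new size recorded on expansion).

```python
from dataclasses import dataclass
from enum import Enum

GIB = 1024 ** 3


class Policy(Enum):
    REPORTED = "reported"            # observed: account only on expansion
    STRICT = "strict"                # no over-provisioning: account on both
    OVERPROVISION = "overprovision"  # never check, never account


@dataclass
class Pool:
    """Hypothetical stand-in for the external hosting volume's accounting."""
    total: int
    policy: Policy
    accounted: int = 0  # bytes recorded as used by PVs

    def _fits(self, extra: int) -> bool:
        if self.policy is Policy.OVERPROVISION:
            return True
        free = self.total - self.accounted
        return extra < free - free / 10  # assumed 10% reserve

    def create(self, size: int) -> bool:
        if not self._fits(size):
            return False
        if self.policy is Policy.STRICT:
            self.accounted += size
        return True

    def expand(self, old_size: int, new_size: int) -> bool:
        if not self._fits(new_size - old_size):
            return False
        if self.policy is Policy.REPORTED:
            # The PV was never recorded on create, so its whole size shows up now.
            self.accounted += new_size
        elif self.policy is Policy.STRICT:
            self.accounted += new_size - old_size
        return True


def scenario(policy: Policy) -> list:
    """Replay the steps from the report: 3 creates, 1 resize, 1 more create."""
    pool = Pool(total=31509606400, policy=policy)
    results = [pool.create(20 * GIB) for _ in range(3)]
    results.append(pool.expand(20 * GIB, 23 * GIB))
    results.append(pool.create(20 * GIB))
    return results


for policy in Policy:
    print(policy.value, scenario(policy))
```

Under these assumptions, REPORTED gives [True, True, True, True, False], matching the report: everything succeeds until the resize is recorded, then test4 fails. STRICT rejects the second and third 20 GiB claims up front, and OVERPROVISION accepts every step. Either of the last two is self-consistent; the mixed behaviour is the surprising one.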
> If we don't support over-provisioning then we should call update_free_size() both during creation and during expansion.
As commented in the PR, I believe this should be the fix, i.e., don't support over-provisioning.