Node plugin is failing with NodeUnpublishVolume exception constantly
Describe the bug
The node plugin is constantly failing with a NodeUnpublishVolume exception.
To Reproduce
Steps to reproduce the behavior:
- Install kadalu 1.1.0 on K8s v1.27.2
Expected behavior
CSI driver to support PUBLISH_UNPUBLISH_VOLUME (?)
Actual behavior
$ kubectl logs kadalu-csi-provisioner-0
...
I0606 07:35:36.908846 1 common.go:111] Probing CSI driver for readiness
W0606 07:35:36.910957 1 metrics.go:142] metrics endpoint will not be started because metrics-address was not specified.
I0606 07:35:36.913309 1 csi-provisioner.go:210] CSI driver does not support PUBLISH_UNPUBLISH_VOLUME, not watching VolumeAttachments
...
$ kubectl logs kadalu-csi-nodeplugin-d9f7z -c kadalu-nodeplugin
...
[2023-06-06 07:56:59,718] ERROR [_server - 454:_call_behavior] - Exception calling application: [1]
Traceback (most recent call last):
  File "/kadalu/lib/python3.10/site-packages/grpc/_server.py", line 444, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/kadalu/nodeserver.py", line 145, in NodeUnpublishVolume
    unmount_volume(request.target_path)
  File "/kadalu/volumeutils.py", line 901, in unmount_volume
    device, _, _ = execute(*cmd)
  File "/kadalu/kadalulib.py", line 187, in execute
    raise CommandException(proc.returncode, out.strip(), err.strip())
kadalulib.CommandException: [1]
...
Environment:
- Kadalu Version: 1.1.0
- K8S_DIST: kubernetes v1.27.2
- seems similar to #948
CSI driver to support PUBLISH_UNPUBLISH_VOLUME
-
No, we don't support it, and this is expected.
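This matches how the driver registers with Kubernetes: a CSIDriver object with `attachRequired: false` tells the cluster not to create VolumeAttachment objects for its volumes, which is why the external-provisioner logs that it is not watching them. A quick way to verify (assuming the CSIDriver object is named `kadalu`; the name may differ on your cluster):

```
kubectl get csidriver kadalu -o jsonpath='{.spec.attachRequired}'
```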
-
Could you please mention the pool CR that you have used?
I just checked the logs and they are flooded all the time. I used:
$ kubectl kadalu storage-add replica3 --type=Replica3 --device node1:/dev/sda --device=node2:/dev/sda --device node3:/dev/sda
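For reference, the `storage-add` command above should correspond roughly to a pool CR like the following. This is a sketch based on kadalu's documented KadaluStorage format; field names and the apiVersion may differ between releases:

```yaml
apiVersion: kadalu-operator.storage/v1alpha1
kind: KadaluStorage
metadata:
  name: replica3
spec:
  type: Replica3
  storage:
    - node: node1
      device: /dev/sda
    - node: node2
      device: /dev/sda
    - node: node3
      device: /dev/sda
```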
Also, 7 PVCs (out of 9 total in my small k8s home cluster) are stuck needing a heal:
$ kubectl-kadalu healinfo
Giving heal summary of volume replica3:
Brick server-replica3-0-0.replica3:/bricks/replica3/data/brick
Status: Connected
Total Number of entries: 7
Number of entries in heal pending: 7
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick server-replica3-1-0.replica3:/bricks/replica3/data/brick
Status: Connected
Total Number of entries: 7
Number of entries in heal pending: 7
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick server-replica3-2-0.replica3:/bricks/replica3/data/brick
Status: Connected
Total Number of entries: 7
Number of entries in heal pending: 7
Number of entries in split-brain: 0
Number of entries possibly healing: 0
List of files needing a heal on replica3:
Brick server-replica3-0-0.replica3:/bricks/replica3/data/brick
/subvol/55/72/pvc-1401651f-6ca3-4dc4-8594-eddf581dd79f
/subvol/96/4c/pvc-05f7121d-39b9-42a8-8754-d95dd11c427c
/subvol/ad/52/pvc-907c2880-f424-4e0b-9b1a-d1c56dd8add5
/subvol/b0/c1/pvc-716143fa-5283-43dc-a5c7-85a359905911
/subvol/e9/2e/pvc-2e987831-f392-419c-a73b-5e9b659b64f9
/subvol/ed/96/pvc-7d3a7237-4452-4990-91c1-bb8eabf892bd
/subvol/82/39/pvc-b07e6d57-141c-4e16-b32a-bd4ea67d80e9
Status: Connected
Number of entries: 7
Brick server-replica3-1-0.replica3:/bricks/replica3/data/brick
/subvol/55/72/pvc-1401651f-6ca3-4dc4-8594-eddf581dd79f
/subvol/96/4c/pvc-05f7121d-39b9-42a8-8754-d95dd11c427c
/subvol/ad/52/pvc-907c2880-f424-4e0b-9b1a-d1c56dd8add5
/subvol/b0/c1/pvc-716143fa-5283-43dc-a5c7-85a359905911
/subvol/e9/2e/pvc-2e987831-f392-419c-a73b-5e9b659b64f9
/subvol/ed/96/pvc-7d3a7237-4452-4990-91c1-bb8eabf892bd
/subvol/82/39/pvc-b07e6d57-141c-4e16-b32a-bd4ea67d80e9
Status: Connected
Number of entries: 7
Brick server-replica3-2-0.replica3:/bricks/replica3/data/brick
/subvol/55/72/pvc-1401651f-6ca3-4dc4-8594-eddf581dd79f
/subvol/96/4c/pvc-05f7121d-39b9-42a8-8754-d95dd11c427c
/subvol/ad/52/pvc-907c2880-f424-4e0b-9b1a-d1c56dd8add5
/subvol/b0/c1/pvc-716143fa-5283-43dc-a5c7-85a359905911
/subvol/e9/2e/pvc-2e987831-f392-419c-a73b-5e9b659b64f9
/subvol/ed/96/pvc-7d3a7237-4452-4990-91c1-bb8eabf892bd
/subvol/82/39/pvc-b07e6d57-141c-4e16-b32a-bd4ea67d80e9
Status: Connected
Number of entries: 7
List of files in splitbrain on replica3:
Brick server-replica3-0-0.replica3:/bricks/replica3/data/brick
Status: Connected
Number of entries in split-brain: 0
Brick server-replica3-1-0.replica3:/bricks/replica3/data/brick
Status: Connected
Number of entries in split-brain: 0
Brick server-replica3-2-0.replica3:/bricks/replica3/data/brick
Status: Connected
Number of entries in split-brain: 0
Will track the nodeplugin error in #948.
Re: "Also 7 pvcs (out of all 9 in my small k8s home cluster) are stuck in needing a heal"
- As discovered in #952, the heal info surfaced to the user is not accurate in some of these cases.
- We are awaiting a fix at the gluster layer.