csi-driver
Volume detach race condition
Is this a BUG REPORT or FEATURE REQUEST?: /kind bug
What happened: While draining KubeVirt infrastructure nodes (bare-metal nodes), we evict virtual machine workloads to other VM nodes. Sometimes, during the eviction process, the pod recreated on a different VM node gets this error (not always):
Warning FailedAttachVolume 3m40s attachdetach-controller Multi-Attach error for volume "pvc-04bf24ee-a755-4bee-bbcb-559aca75d862" Volume is already exclusively attached to one node and can't be attached to another
Further investigation showed that this happens because a VolumeAttachment resource is not being deleted, due to an error coming from the KubeVirt CSI driver:
Detach Error: Message: rpc error: code = NotFound desc = failed to find VM with domain.firmware.uuid e08f36d8-de8d-5365-a683-5f43a5be323a Time: 2023-02-09T15:45:00Z
There is a race condition between VM and volume attachment deletion.
What you expected to happen: VolumeAttachment resources for non-existing VMs are deleted.
How to reproduce it (as minimally and precisely as possible):
- Create a simple pod with a PVC that uses a storage class backed by the KubeVirt CSI driver.
- Drain the KubeVirt infrastructure node.
- Immediately after the previous pod has been deleted, create another one with the same PVC.
That's the easiest way to reproduce it (with a Deployment it's harder to spot the bug, as it depends on reconciliation timing).
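The steps above can be sketched with kubectl; the manifest, storage class name, and node name below are hypothetical placeholders, not taken from the affected cluster:

```shell
# Create a pod bound to a PVC on the KubeVirt CSI storage class
# (names and storage class here are illustrative).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 1Gi
  storageClassName: kubevirt-csi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc
EOF

# Drain the infrastructure node hosting the VM node.
kubectl drain <infra-node> --ignore-daemonsets --delete-emptydir-data

# As soon as test-pod is deleted, recreate it with the same PVC.
# The new pod should then show the Multi-Attach warning while the
# stale VolumeAttachment lingers:
kubectl get volumeattachments
kubectl describe pod test-pod
```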
Anything else we need to know?: I think this is the problematic part of the code: https://github.com/kubevirt/csi-driver/blob/main/pkg/service/controller.go#L320-L323
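To illustrate the expected behavior: the CSI spec requires ControllerUnpublishVolume to be idempotent, so a detach for a VM that no longer exists should return success rather than NotFound. The following is a minimal self-contained sketch of that idea, not the driver's actual code; `errVMNotFound`, `detachVolume`, and the in-memory VM map are hypothetical stand-ins:

```go
package main

import (
	"errors"
	"fmt"
)

// errVMNotFound is a hypothetical sentinel; the real driver returns a
// gRPC NotFound status when the lookup by domain.firmware.uuid fails.
var errVMNotFound = errors.New("failed to find VM")

// vms is a toy in-memory model: firmware UUID -> attached volumes.
var vms = map[string][]string{}

// detachVolume stands in for the driver's hot-unplug logic.
func detachVolume(vmUUID, volume string) error {
	if _, ok := vms[vmUUID]; !ok {
		return errVMNotFound
	}
	// ... hot-unplug the disk from the running VM ...
	return nil
}

// controllerUnpublish sketches the fix: if the VM is already gone,
// the volume cannot still be attached to it, so treat the detach as
// already done and succeed, letting the VolumeAttachment be deleted.
func controllerUnpublish(vmUUID, volume string) error {
	if err := detachVolume(vmUUID, volume); err != nil {
		if errors.Is(err, errVMNotFound) {
			return nil // VM deleted during drain: nothing to detach
		}
		return err
	}
	return nil
}

func main() {
	// The VM was deleted while the node was drained; the detach
	// should still succeed instead of failing forever.
	err := controllerUnpublish("e08f36d8-de8d-5365-a683-5f43a5be323a", "pvc-04bf24ee")
	fmt.Println(err) // prints <nil>
}
```

With this behavior the attach-detach controller can delete the stale VolumeAttachment, and the recreated pod no longer hits the Multi-Attach error.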
Environment:
- KubeVirt CSI version: commit cc71b72b8d5a205685985244c61707c5e40c9d5f
- Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-05-19T19:39:28Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration:
- Install tools: Kubermatic platform; however, the issue is not platform-related.
- Others: