
Volume detach race condition

Open mfranczy opened this issue 2 years ago • 24 comments

Is this a BUG REPORT or FEATURE REQUEST?: /kind bug

What happened: While draining KubeVirt infrastructure nodes (bare-metal nodes), we evict virtual machine workloads to other VM nodes. Sometimes (not always) during the eviction, the pod recreated on a different VM gets this error:

Warning FailedAttachVolume 3m40s attachdetach-controller Multi-Attach error for volume "pvc-04bf24ee-a755-4bee-bbcb-559aca75d862" Volume is already exclusively attached to one node and can't be attached to another

Further investigation showed that this happens because a VolumeAttachment resource is not deleted, due to an error returned by the KubeVirt CSI driver:

Detach Error: Message: rpc error: code = NotFound desc = failed to find VM with domain.firmware.uuid e08f36d8-de8d-5365-a683-5f43a5be323a Time: 2023-02-09T15:45:00Z

There is a race condition between VM deletion and volume attachment deletion.

What you expected to happen: VolumeAttachment resources for VMs that no longer exist are deleted.
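
To illustrate the expected behavior, here is a minimal, self-contained sketch of an idempotent detach. It is not the driver's actual code: `vmStore`, `fakeStore`, `detachVolume`, and `errVMNotFound` are hypothetical names made up for this example, and the sentinel error only stands in for the gRPC NotFound status shown in the detach error above. It assumes that a VM which no longer exists cannot be holding the volume, so the detach can be reported as successful and the volume attachment can be cleaned up.

```go
package main

import (
	"errors"
	"fmt"
)

// errVMNotFound is a hypothetical sentinel standing in for the NotFound
// status the driver returns when the VM lookup fails.
var errVMNotFound = errors.New("vm not found")

// vmStore is a hypothetical abstraction over the VM lookup and volume
// hot-unplug calls; it is not the driver's real interface.
type vmStore interface {
	FindVM(nodeID string) (string, error)
	RemoveVolume(vmName, volumeID string) error
}

// detachVolume sketches an idempotent detach: if the VM is already gone,
// there is nothing left to detach from, so it reports success instead of
// propagating the lookup error.
func detachVolume(s vmStore, nodeID, volumeID string) error {
	vmName, err := s.FindVM(nodeID)
	if errors.Is(err, errVMNotFound) {
		return nil // assumption: a deleted VM cannot still hold the volume
	}
	if err != nil {
		return fmt.Errorf("looking up VM for node %s: %w", nodeID, err)
	}
	return s.RemoveVolume(vmName, volumeID)
}

// fakeStore is an in-memory vmStore used only for the demonstration.
type fakeStore struct {
	vms      map[string]string // node ID -> VM name
	attached map[string]bool   // "vm/volume" -> currently attached
}

func (f *fakeStore) FindVM(nodeID string) (string, error) {
	name, ok := f.vms[nodeID]
	if !ok {
		return "", errVMNotFound
	}
	return name, nil
}

func (f *fakeStore) RemoveVolume(vmName, volumeID string) error {
	delete(f.attached, vmName+"/"+volumeID)
	return nil
}

func main() {
	s := &fakeStore{
		vms:      map[string]string{"node-b": "vm-b"},
		attached: map[string]bool{"vm-b/vol-1": true},
	}
	// "node-a" has no VM anymore, mimicking the drained node.
	fmt.Println("detach from deleted VM:", detachVolume(s, "node-a", "vol-1"))
	fmt.Println("detach from existing VM:", detachVolume(s, "node-b", "vol-1"))
}
```

Whether swallowing the NotFound is safe in the real driver depends on how it tells "VM deleted" apart from "VM lookup temporarily failing"; the sketch only treats the explicit not-found case as success and still returns every other error.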

How to reproduce it (as minimally and precisely as possible):

  1. Create a simple pod with a PVC that uses a storage class backed by the KubeVirt CSI driver.
  2. Drain a KubeVirt infrastructure node.
  3. Immediately after the previous pod has been deleted, create another one with the same PVC.

That's the easiest way (with a Deployment it's harder to spot the bug, as it depends on reconciliation timing).

Anything else we need to know?: I think this is a problematic part of the code: https://github.com/kubevirt/csi-driver/blob/main/pkg/service/controller.go#L320-L323

Environment:

  • KubeVirt CSI version: commit cc71b72b8d5a205685985244c61707c5e40c9d5f
  • Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-05-19T19:39:28Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
  • Install tools: Kubermatic platform, however, it's not platform related.
  • Others:

mfranczy, Feb 13 '23 14:02