
BlockDevices are shown active even after node goes down

Open akhilerm opened this issue 5 years ago • 2 comments

What happened: I had a 3-node GKE cluster with a single GPD attached to each node. I installed NDM from the ndm-operator.yaml in the v0.4.0 tag, and Disk and BlockDevice resources were created for all 3 disks. I then deleted the node pool containing the 3 nodes without detaching the disks; the nodes were deleted and the disks were shown as disconnected. I then created a new node pool and did not attach any disks. But the Disk and BlockDevice resources were still shown as Active, even though the disks were no longer connected.

What you expected to happen: The Disk and BlockDevice resources should move to the Inactive / Unknown state.

How to reproduce it (as minimally and precisely as possible): Create a GKE cluster, attach GPDs, and install NDM. Delete the node pool without detaching the disks. The Disks and BDs will remain in the Active state.

Anything else we need to know?: This happens because the node is not shut down gracefully.

akhilerm avatar Jul 25 '19 13:07 akhilerm

The Disks and BDs associated with a node are marked as Unknown only when that node is shut down gracefully. When a node goes down abruptly, the daemonset pod on it can no longer manage its devices, so a higher-level operator should take care of marking those devices as Unknown.

akhilerm avatar Jul 26 '19 10:07 akhilerm

The implementation of this feature should consider enhancing the NDM Operator to identify stale BDs and take corrective actions.

kmova avatar Aug 29 '21 01:08 kmova