node-disk-manager
BlockDevices are shown active even after node goes down
What happened:
I had a 3-node GKE cluster with a single GPD attached to each node. I installed NDM from the ndm-operator.yaml in the v0.4.0 tag. Disk and BlockDevice resources were created for all 3 disks. I then deleted the node pool containing the 3 nodes without detaching the disks. The nodes were deleted and the disks were shown as disconnected. I created a new node pool and did not attach any disks, but the Disk and BlockDevice resources were still shown as Active, even though the disks were no longer connected.
What you expected to happen: The Disk and BlockDevice resources should be in an Inactive / Unknown state.
How to reproduce it (as minimally and precisely as possible): Create a GKE cluster, attach GPDs, and install NDM. Delete the nodes without detaching the disks. The Disks and BDs will remain in the Active state.
Anything else we need to know?: This happens because the node is not shut down gracefully.
Only when a node is shut down gracefully are the Disks and BDs associated with it marked as Unknown. When a node goes down abruptly, its NDM daemonset pod cannot manage its devices, so a higher-level operator should take care of marking those devices as Unknown.
The implementation of this feature should consider enhancing the NDM Operator to identify stale BDs and take corrective actions.
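One possible direction, sketched below, is a cluster-level pass that compares the node recorded on each BlockDevice against the nodes currently registered with the API server and marks orphaned BDs as Unknown. This is only an illustrative sketch, not the NDM Operator's actual code: the openebs.io/v1alpha1 blockdevices GVR, the kubernetes.io/hostname label, and the status.state field are assumptions about the BlockDevice CRD and may differ between NDM releases, and comparing the hostname label with the node name assumes the two match (as they typically do on GKE).

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// GVR for the BlockDevice custom resource (assumed group/version/resource).
var blockDeviceGVR = schema.GroupVersionResource{
	Group:    "openebs.io",
	Version:  "v1alpha1",
	Resource: "blockdevices",
}

// hostnameLabel is assumed to be the label NDM puts on a BlockDevice to
// record which node owns it; the exact label may differ between releases.
const hostnameLabel = "kubernetes.io/hostname"

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	kube := kubernetes.NewForConfigOrDie(cfg)
	dyn := dynamic.NewForConfigOrDie(cfg)

	if err := markStaleBlockDevices(context.Background(), kube, dyn); err != nil {
		log.Fatal(err)
	}
}

// markStaleBlockDevices lists every BlockDevice and, for each one whose
// owning node no longer exists in the cluster, sets status.state to Unknown.
func markStaleBlockDevices(ctx context.Context, kube kubernetes.Interface, dyn dynamic.Interface) error {
	// Build a set of node names currently registered with the API server.
	nodes, err := kube.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	liveNodes := make(map[string]bool, len(nodes.Items))
	for _, n := range nodes.Items {
		liveNodes[n.Name] = true
	}

	bds, err := dyn.Resource(blockDeviceGVR).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for i := range bds.Items {
		bd := bds.Items[i]
		// Assumes the hostname label matches the node name, as on GKE.
		owner := bd.GetLabels()[hostnameLabel]
		if owner == "" || liveNodes[owner] {
			continue // node still present, or ownership unknown; leave the BD alone
		}
		// The owning node is gone: mark the BlockDevice Unknown so that
		// consumers do not treat it as a usable, connected device.
		if err := unstructured.SetNestedField(bd.Object, "Unknown", "status", "state"); err != nil {
			return err
		}
		if _, err := dyn.Resource(blockDeviceGVR).Namespace(bd.GetNamespace()).Update(ctx, &bd, metav1.UpdateOptions{}); err != nil {
			log.Printf("failed to update BlockDevice %s: %v", bd.GetName(), err)
		}
	}
	return nil
}
```

The same check could run periodically inside the operator's reconcile loop instead of as a one-shot program, and an equivalent pass could mark the corresponding Disk resources as well.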