flatcar-linux-update-operator

WIP: pkg/agent: wait for all volumes to be detached before rebooting

Open invidian opened this issue 3 years ago • 3 comments

This commit provides a PoC implementation of the agent waiting for all volumes attached to the node to be detached, as a step after draining the node. Shutting down a Pod does not mean its volume has been detached: usually the CSI agent runs as a DaemonSet on the node and takes care of detaching the volume from the node once the Pod shuts down.

This commit improves the rebooting experience. Right now, if the CSI agent does not have enough time to detach the volumes from the node, the node gets rebooted anyway and pods using the still-attached volumes cannot be started on other nodes, which effectively increases the downtime for stateful workloads.

This commit still requires tests and a better interface for users.

If someone wants to try this feature on their own cluster, I've published the following image I've been testing with:

quay.io/invidian/flatcar-linux-update-operator:97c0dee50c807dbba7d2debc59b369f84002797e

Closes #30

Signed-off-by: Mateusz Gozdek [email protected]

invidian, Jun 13 '22 17:06

We should also consider compatibility with k8s versions before merging.

invidian, Jun 13 '22 17:06

Just hit an issue with this code:

  1. Draining failed because one workload couldn't satisfy the PDB.
  2. Waiting for volume detachment never finished.

Perhaps we should also have some timeout while waiting for volumes to be detached.

invidian, Jun 26 '22 15:06

I also found a bug with RBAC which is now fixed.

invidian, Jan 11 '23 18:01