redpanda
k8s: integrate maintenance mode with lifecycle hooks
At a minimum, shutting down a node should be graceful (https://github.com/vectorizedio/redpanda/issues/3020) and rolling restarts should be limited to one node at a time.
To the extent that lifecycle hooks can invoke cluster-level API endpoints, additional safety can be added, such as waiting for the cluster to become healthy before moving on to upgrade the next node.
There are likely to be more trade-offs for k8s upgrades that are not driven by the operator, because those upgrades are limited to what is possible with lifecycle hooks.
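As a rough illustration of the "wait for the cluster to become healthy" step, here is a minimal Python sketch that a hook or upgrade script could run between nodes. It assumes the admin API is reachable at `http://localhost:9644` and exposes `/v1/cluster/health_overview` returning JSON with an `is_healthy` field; the address, the timeout values, and the helper names (`fetch_health`, `wait_for_healthy`) are illustrative assumptions, not something this issue specifies.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed admin API address and path; adjust for the actual deployment.
ADMIN_URL = "http://localhost:9644"
HEALTH_PATH = "/v1/cluster/health_overview"


def fetch_health(base_url: str = ADMIN_URL) -> dict:
    """Fetch the cluster health overview as a dict (assumed JSON response)."""
    with urllib.request.urlopen(base_url + HEALTH_PATH, timeout=5) as resp:
        return json.load(resp)


def wait_for_healthy(
    fetch: Callable[[], dict] = fetch_health,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the cluster reports healthy; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        try:
            if fetch().get("is_healthy") is True:
                return True
        except (OSError, ValueError):
            # Network errors or malformed JSON: treat as "not healthy yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would proceed to the next node only when `wait_for_healthy()` returns `True`, and abort or alert otherwise. The `fetch`, `sleep`, and `clock` parameters exist only so the loop can be exercised without a live cluster.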
/backport v22.1.x
@joejulian @dotnwat is this ticket still relevant?
i suppose it might be, since we have k8s deployments that don't have ephemeral disks. on those systems, we don't need the full decommission/node-add procedure for rolling upgrades. it's also probably relevant for normal RP upgrades on ephemeral disk systems.
"integrate maintenance mode with lifecycle hooks" is implemented if that only means adding the postStart and preStop scripts to put a broker into maintenance mode when shutting down and bringing it out of maintenance mode when coming up.
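
For readers unfamiliar with what such hook scripts do, the following Python sketch shows the general shape. It is not the actual scripts referred to above. It assumes the admin API listens on `http://localhost:9644` and that maintenance mode is toggled with `PUT` and `DELETE` on `/v1/brokers/{node_id}/maintenance`; the function names and the way the node ID is supplied are hypothetical.

```python
import urllib.request
from typing import Callable

# Assumed admin API address; adjust for the actual deployment.
ADMIN_URL = "http://localhost:9644"


def maintenance_request(
    node_id: int, enable: bool, base_url: str = ADMIN_URL
) -> urllib.request.Request:
    """Build the request that enables (PUT) or disables (DELETE) maintenance mode."""
    method = "PUT" if enable else "DELETE"
    url = f"{base_url}/v1/brokers/{node_id}/maintenance"
    return urllib.request.Request(url, method=method)


def pre_stop(node_id: int, send: Callable = urllib.request.urlopen) -> None:
    """preStop hook body: put the broker into maintenance mode before shutdown."""
    send(maintenance_request(node_id, enable=True))


def post_start(node_id: int, send: Callable = urllib.request.urlopen) -> None:
    """postStart hook body: take the broker out of maintenance mode after startup."""
    send(maintenance_request(node_id, enable=False))
```

In a pod spec, the `preStop` and `postStart` lifecycle handlers would each invoke the matching function with the broker's own node ID. The `send` parameter is only there so the request construction can be checked without a running broker.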