
k8s: integrate maintenance mode with lifecycle hooks

Open dotnwat opened this issue 3 years ago • 4 comments

At a minimum, shutting down a node should be graceful (https://github.com/vectorizedio/redpanda/issues/3020) and rolling restarts should be limited to one node at a time.

To the extent that lifecycle hooks can invoke cluster-level API endpoints, additional safety can be added, such as waiting for the cluster to become healthy before moving on to the upgrade of the next node.
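
A minimal sketch of the "wait for the cluster to become healthy" step, for illustration only. It assumes the Admin API is reachable at a `base_url` such as `http://localhost:9644` and exposes `GET /v1/cluster/health_overview` with an `is_healthy` field; both are assumptions to verify against the Redpanda version in use. The function names `wait_until_healthy` and `admin_health_check` are hypothetical and do not come from the operator or chart.

```python
import json
import time
import urllib.request
from typing import Callable


def admin_health_check(base_url: str) -> Callable[[], bool]:
    """Return a callable that asks the Admin API whether the cluster is healthy.

    Assumption: GET {base_url}/v1/cluster/health_overview returns JSON
    containing a boolean "is_healthy" field.
    """
    def check() -> bool:
        url = f"{base_url}/v1/cluster/health_overview"
        with urllib.request.urlopen(url, timeout=5) as resp:
            return bool(json.load(resp).get("is_healthy", False))
    return check


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_healthy() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        try:
            if is_healthy():
                return True
        except OSError:
            # Connection errors (including urllib's URLError) count as
            # "not healthy yet" so a restarting broker does not abort the wait.
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A hook or rollout script could call `wait_until_healthy(admin_health_check("http://localhost:9644"))` and refuse to move on to the next node when it returns `False`.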

There are likely to be more trade-offs for k8s upgrades that are not driven by the operator, because those upgrades are limited to what is possible with lifecycle hooks.

dotnwat avatar Nov 19 '21 04:11 dotnwat

/backport v22.1.x

nicolaferraro avatar Apr 23 '22 12:04 nicolaferraro

@joejulian @dotnwat is this ticket still relevant?

jcsp avatar Nov 03 '22 15:11 jcsp

i suppose it might be, since we have k8s deployments that don't have ephemeral disks. on those systems, we don't need the full decommission/node-add procedure for rolling upgrades. also, probably relevant for normal RP upgrades on ephemeral disk systems.

dotnwat avatar Nov 16 '22 23:11 dotnwat

"integrate maintenance mode with lifecycle hooks" is implemented if that only means adding the postStart and preStop scripts to put a broker into maintenance mode when shutting down and bringing it out of maintenance mode when coming up.

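The preStop/postStart behavior described above can be sketched roughly as follows. This is not the actual hook script from the operator or chart, which the thread does not show. It assumes the Admin API accepts `PUT /v1/brokers/{node_id}/maintenance` to enable maintenance mode and `DELETE` on the same path to disable it; the helper names `maintenance_request`, `pre_stop`, and `post_start` are hypothetical. The `send` parameter is injectable so the request construction can be checked without a running broker.

```python
import urllib.request
from typing import Callable


def maintenance_request(base_url: str, node_id: int, enable: bool) -> urllib.request.Request:
    """Build the Admin API request that toggles maintenance mode for one broker.

    Assumption: PUT enables and DELETE disables maintenance mode at
    {base_url}/v1/brokers/{node_id}/maintenance.
    """
    method = "PUT" if enable else "DELETE"
    url = f"{base_url}/v1/brokers/{node_id}/maintenance"
    return urllib.request.Request(url, method=method)


def _send(req: urllib.request.Request) -> None:
    # Issue the request and close the response; HTTP errors raise an exception.
    with urllib.request.urlopen(req, timeout=10):
        pass


def pre_stop(base_url: str, node_id: int,
             send: Callable[[urllib.request.Request], None] = _send) -> None:
    """preStop: put the broker into maintenance mode before it shuts down."""
    send(maintenance_request(base_url, node_id, enable=True))


def post_start(base_url: str, node_id: int,
               send: Callable[[urllib.request.Request], None] = _send) -> None:
    """postStart: take the broker out of maintenance mode once it is up."""
    send(maintenance_request(base_url, node_id, enable=False))
```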
joejulian avatar Nov 17 '22 03:11 joejulian