Enable control plane maintenance during reboot

Open zoetrope opened this issue 3 years ago • 0 comments

What

Maintenance operations do not run while a reboot operation is running. As a result:

  • Endpoints are not updated if one control plane node is down.
  • controller-manager and scheduler do not recover if they crash.

How

Run the reboot operation in a separate goroutine.

  • Modify rebootOp as follows
    • after rebooting, remove the cke.cybozu.com/reboot annotation
    • wait until the node is schedulable
    • remove the rebooted node from the reboot queue
  • Modify rebootUncordonOp as follows
    • uncordon only nodes that are unschedulable and have no cke.cybozu.com/reboot annotation
  • Register the records of the reboot operation and run the operation in a goroutine, so that record registration does not conflict with other operations.
  • Disable sabakan integration for a certain period during a reboot operation, to avoid frequent master node changes.

Checklist

  • [ ] Finish implementation of the issue
  • [ ] Test all functions
  • [ ] Have enough logs to trace activities
  • [ ] Notify developers of necessary actions

zoetrope • May 27 '21 02:05