machine-controller
Proposal for user cluster nodes remediation
As a cluster admin, I would like a mechanism that allows me to safely drain bare-metal nodes while respecting the user cluster nodes (virtual machines) running on top of them.
Things we should check/change:
- The eviction strategy of the VMs should be set to External
- We should respect the Manual run strategy (in that case, do nothing and leave it to the users)
- Check whether live migration is possible; we will probably have to follow a strategy of re-creating the VM nodes instead
- Allow triggering user cluster node remediation by setting an annotation (mostly for tests)
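The checks above could be combined into one decision made per VM by a remediation controller. The following is a minimal Go sketch under stated assumptions, not a design: the types are simplified, hypothetical stand-ins for KubeVirt VirtualMachine/VirtualMachineInstance fields (not the real kubevirt.io/api types), the annotation key `example.com/remediate-node` is a placeholder, and `EvacuationRequested` is an assumed flag meaning "the bare-metal host of this VM is being drained". It encodes one possible order of checks: skip VMs with the Manual run strategy, require the External eviction strategy, then live-migrate if possible and otherwise fall back to re-creating the VM node.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the KubeVirt fields this proposal
// cares about; these are NOT the real kubevirt.io/api types.
type RunStrategy string
type EvictionStrategy string

const (
	RunStrategyAlways RunStrategy = "Always"
	RunStrategyManual RunStrategy = "Manual"

	EvictionStrategyExternal EvictionStrategy = "External"
)

// Hypothetical annotation key used to force remediation (e.g. in e2e tests).
const remediateAnnotation = "example.com/remediate-node"

type VM struct {
	Name             string
	RunStrategy      RunStrategy
	EvictionStrategy EvictionStrategy
	Annotations      map[string]string
	// Assumed flag: the VM's bare-metal host is being drained.
	EvacuationRequested bool
	// Assumed flag: the VM can be live-migrated to another host.
	LiveMigratable bool
}

type Action string

const (
	ActionNone        Action = "none"
	ActionLiveMigrate Action = "live-migrate"
	ActionRecreate    Action = "recreate"
)

// decide returns what a remediation controller might do with a user cluster
// node VM, following the checklist in this proposal.
func decide(vm VM) (Action, error) {
	// Respect the Manual run strategy: leave the VM to its users.
	if vm.RunStrategy == RunStrategyManual {
		return ActionNone, nil
	}
	// Eviction must be delegated to an external controller; otherwise report
	// the misconfiguration and do not interfere.
	if vm.EvictionStrategy != EvictionStrategyExternal {
		return ActionNone, fmt.Errorf("vm %s: eviction strategy %q, expected %q",
			vm.Name, vm.EvictionStrategy, EvictionStrategyExternal)
	}
	// Remediate only on a host drain or when forced via the annotation.
	// Reading from a nil map is safe in Go and reports "not present".
	_, forced := vm.Annotations[remediateAnnotation]
	if !vm.EvacuationRequested && !forced {
		return ActionNone, nil
	}
	if vm.LiveMigratable {
		return ActionLiveMigrate, nil
	}
	// Fallback: re-create the VM (and thus the user cluster node) elsewhere.
	return ActionRecreate, nil
}

func main() {
	vm := VM{
		Name:             "worker-1",
		RunStrategy:      RunStrategyAlways,
		EvictionStrategy: EvictionStrategyExternal,
		Annotations:      map[string]string{remediateAnnotation: ""},
	}
	action, err := decide(vm)
	fmt.Println(action, err) // prints "recreate <nil>"
}
```

Returning an error for a non-External eviction strategy, instead of silently skipping, is one way to surface misconfigured VMs; whether the real controller should report or fix this is left to the design document.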
Acceptance criteria:
- A document that describes how to:
  - Support user cluster workload eviction caused by bare-metal node draining
  - Integrate the mechanism with machine-controller's node eviction
  - Design e2e tests that cover the feature
- Issues in the epic will be created based on the document