machine-controller Proposal for user cluster nodes remediation

Open mfranczy opened this issue 2 years ago • 0 comments

As a cluster admin, I would like a mechanism that lets me safely drain bare-metal nodes while respecting the user cluster nodes (virtual machines) running on top of them.

Things we should check/change:

- Set the eviction strategy of the VMs to External.
- Respect the Manual run strategy: in that case, do nothing and leave remediation to the users.
- Check whether live migration is possible. Most likely we will have to follow a strategy of re-creating the VM nodes instead.
- Allow triggering remediation of a user cluster node by setting an annotation (mostly for tests).
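
The checks above could be combined into a single decision function. Below is a minimal Go sketch, not the machine-controller implementation. The types, the `Decide` function, and the annotation key `example.invalid/remediate` are all hypothetical; only the strategy names External and Manual come from the list above. The sketch also assumes that the Manual run strategy takes precedence over the annotation, which this issue does not specify.

```go
package main

import "fmt"

// Hypothetical local types; the string values mirror the strategy
// names mentioned in this issue (External, Manual).
type EvictionStrategy string
type RunStrategy string

const (
	EvictionExternal    EvictionStrategy = "External"
	EvictionLiveMigrate EvictionStrategy = "LiveMigrate"
	RunManual           RunStrategy      = "Manual"
	RunAlways           RunStrategy      = "Always"
)

// RemediateAnnotation is a made-up key used only for illustration.
const RemediateAnnotation = "example.invalid/remediate"

// Action is the outcome of the remediation decision.
type Action string

const (
	ActionNone       Action = "none"        // nothing to do
	ActionSkipManual Action = "skip-manual" // leave the VM to its owner
	ActionRecreate   Action = "recreate"    // re-create the VM node elsewhere
)

// VMNode is a simplified view of a user cluster node backed by a VM.
type VMNode struct {
	Name        string
	Eviction    EvictionStrategy
	Run         RunStrategy
	Draining    bool // true when the underlying bare-metal node is being drained
	Annotations map[string]string
}

// Decide picks a remediation action for one VM-backed node.
func Decide(n VMNode) Action {
	// Assumption: Manual always wins, even over the annotation.
	if n.Run == RunManual {
		return ActionSkipManual
	}
	// The annotation forces remediation (intended mostly for tests).
	if _, forced := n.Annotations[RemediateAnnotation]; forced {
		return ActionRecreate
	}
	// A drain only triggers remediation when eviction is delegated
	// to an external controller.
	if n.Draining && n.Eviction == EvictionExternal {
		return ActionRecreate
	}
	return ActionNone
}

func main() {
	nodes := []VMNode{
		{Name: "worker-a", Eviction: EvictionExternal, Run: RunAlways, Draining: true},
		{Name: "worker-b", Eviction: EvictionExternal, Run: RunManual, Draining: true},
		{Name: "worker-c", Eviction: EvictionExternal, Run: RunAlways,
			Annotations: map[string]string{RemediateAnnotation: "true"}},
	}
	for _, n := range nodes {
		fmt.Printf("%s: %s\n", n.Name, Decide(n))
	}
}
```

Keeping the decision in a pure function like this would let the e2e design exercise the annotation path separately from a real bare-metal drain.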

Acceptance criteria:

A document that describes:
- How to support user cluster workload eviction caused by bare-metal node draining
- How to integrate the mechanism with machine-controller's node eviction
- The design of e2e tests that cover the feature

Issues in the epic will be created based on the document.

Sep 12 '22 13:09 mfranczy
