
Proposal for user cluster nodes remediation

Open mfranczy opened this issue 2 years ago • 0 comments

As a cluster admin, I would like a mechanism that lets me safely drain bare-metal nodes while respecting the user cluster nodes (virtual machines) running on top of them.

Things we should check/change:

  • The eviction strategy of the VMs should be set to External
  • We should respect the Manual run strategy (in that case, do nothing and leave it to the users)
  • Check whether live migration is possible; we will probably have to follow a strategy of re-creating the VM nodes instead
  • Allow triggering a user cluster node remediation by setting an annotation (mostly for tests)
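
To make the points above easier to discuss, here is a minimal sketch of the decision logic they imply. It is not the machine-controller implementation. The `VMNode` type, the `Decide` function, the `EvacuationRequested` flag, and the annotation key `example.com/remediate` are all hypothetical names used for illustration. The precedence between the checks (Manual run strategy wins over the annotation) is also an assumption the design document would have to confirm.

```go
package main

import "fmt"

// Action is what a remediation controller would do for one VM-backed node.
type Action string

const (
	ActionSkip     Action = "skip"     // leave the VM alone
	ActionRecreate Action = "recreate" // drain the user cluster node and re-create the VM
)

// remediateAnnotation is a hypothetical annotation key, used here only
// to illustrate the "trigger by annotation" idea from the list above.
const remediateAnnotation = "example.com/remediate"

// VMNode is a simplified, hypothetical view of a VM that backs a user cluster node.
type VMNode struct {
	Name                string
	EvictionStrategy    string            // e.g. "External" or "LiveMigrate"
	RunStrategy         string            // e.g. "Always" or "Manual"
	EvacuationRequested bool              // assumed signal that the bare-metal node is being drained
	Annotations         map[string]string // annotations set on the VM or machine
}

// Decide returns the action for a single node under the assumptions above.
func Decide(n VMNode) Action {
	// Manual run strategy: do nothing and leave the VM to its users.
	if n.RunStrategy == "Manual" {
		return ActionSkip
	}
	// Explicit trigger through the annotation (mostly useful for tests).
	if n.Annotations[remediateAnnotation] == "true" {
		return ActionRecreate
	}
	// External eviction strategy: an outside controller handles the drain.
	if n.EvictionStrategy == "External" && n.EvacuationRequested {
		return ActionRecreate
	}
	return ActionSkip
}

func main() {
	nodes := []VMNode{
		{Name: "worker-a", EvictionStrategy: "External", RunStrategy: "Always", EvacuationRequested: true},
		{Name: "worker-b", EvictionStrategy: "External", RunStrategy: "Manual", EvacuationRequested: true},
		{Name: "worker-c", EvictionStrategy: "External", RunStrategy: "Always",
			Annotations: map[string]string{remediateAnnotation: "true"}},
	}
	for _, n := range nodes {
		fmt.Printf("%s: %s\n", n.Name, Decide(n))
	}
}
```

The sketch only covers the re-create path; if live migration turns out to be feasible, it would add a third action rather than change the checks shown here.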

Acceptance criteria:

  • A document that describes:
    • How to support user cluster workload eviction caused by bare-metal node draining
    • How to integrate the mechanism with machine-controller's node eviction
    • The design of e2e tests that cover the feature
  • Issues in the epic will be created based on the document

mfranczy avatar Sep 12 '22 13:09 mfranczy