CAPI controller should taint outdated nodes with PreferNoSchedule

Open cnmcavoy opened this issue 2 years ago • 2 comments

User Story

As an operator of Cluster API, when I make changes to an existing MachineDeployment or MachineSet infrastructure template, any existing nodes previously managed are reconciled, replaced, and then drained by the Cluster API controllers.

If the MachineSet or MachineDeployment has many replicas, and each node has many pods, this can result in unnecessary pod churn. As the first node is drained, pods previously running on that node may be scheduled onto nodes that have yet to be replaced but will be torn down soon. When the Cluster API controller finally drains those nodes, those pods are evicted again and rescheduled. In sufficiently large clusters, this can result in workloads being evicted and restarted on doomed nodes many times, unnecessarily.

I would like the Cluster API controller to taint all of the nodes it will be replacing with PreferNoSchedule, so that pods prefer scheduling on newer nodes and only fall back to scheduling on older, outdated nodes if the cluster has no alternative capacity.

Detailed Description

The Cluster API controller should taint all of the nodes in outdated MachineDeployments or MachineSets with PreferNoSchedule.
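
For illustration only, a minimal sketch in Go of what such a soft taint could look like; the taint key is an assumption for this example, not something decided in this issue:

```go
package taints

import corev1 "k8s.io/api/core/v1"

// OutdatedTaint is a hypothetical taint the controller could add to Nodes
// backed by Machines of outdated MachineSets. PreferNoSchedule is a "soft"
// effect: the scheduler tries to avoid these Nodes but still uses them if no
// other capacity exists, and pods already running on them are not evicted.
var OutdatedTaint = corev1.Taint{
	Key:    "node.cluster.x-k8s.io/outdated-revision", // assumed key, for illustration
	Effect: corev1.TaintEffectPreferNoSchedule,
}
```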

Anything else you would like to add:

N/A

/kind feature

cnmcavoy avatar Aug 09 '22 23:08 cnmcavoy

Sounds like a good idea to me. We used something like this before in another project, and it also generally speeds up upgrades when PDBs are used, since fewer Pod drains are required.

sbueringer avatar Aug 16 '22 12:08 sbueringer

/triage accepted

fabriziopandini avatar Aug 26 '22 12:08 fabriziopandini

I'm +1; TBD whether to implement this as a default behaviour or behind some feature flag/annotation. @enxebre @vincepri @CecileRobertMichon opinions?

fabriziopandini avatar Aug 26 '22 12:08 fabriziopandini

+1 to a soft taint feature during MachineDeployment rolling upgrades. I'd expect tainting to be driven by the MachineDeployment controller as it reconciles old MachineSets. I'm +1 to implementing it as default behaviour as long as it is covered by unit, e2e, and conformance testing.

enxebre avatar Aug 29 '22 10:08 enxebre
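
For illustration, a rough sketch of the direction suggested above: how a controller could idempotently add such a soft taint to the Node behind a Machine of an old MachineSet using the controller-runtime client. The helper name and patch strategy are assumptions, not the actual Cluster API implementation.

```go
package taints

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// taintNodePreferNoSchedule adds the given PreferNoSchedule taint to a Node
// if it is not already present. Using a merge patch keeps the update small
// and avoids clobbering concurrent changes to the rest of the Node object.
func taintNodePreferNoSchedule(ctx context.Context, c client.Client, node *corev1.Node, taint corev1.Taint) error {
	for _, t := range node.Spec.Taints {
		if t.Key == taint.Key && t.Effect == taint.Effect {
			return nil // already tainted; nothing to do
		}
	}
	patchBase := client.MergeFrom(node.DeepCopy())
	node.Spec.Taints = append(node.Spec.Taints, taint)
	return c.Patch(ctx, node, patchBase)
}
```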

Hi, if there's no assignee I can handle it based on the idea described in https://github.com/kubernetes-sigs/cluster-api/issues/7043#issue-1333875124. Could I work on it (would you allow me to do /lifecycle active)?

hiromu-a5a avatar Nov 04 '22 12:11 hiromu-a5a

Hi @hiromu-a5a, feel free to comment /assign if you'd like to work on it

/lifecycle active

CecileRobertMichon avatar Nov 04 '22 17:11 CecileRobertMichon

/assign

hiromu-a5a avatar Nov 07 '22 06:11 hiromu-a5a

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 05 '23 06:02 k8s-triage-robot

/remove-lifecycle stale

sbueringer avatar Feb 06 '23 07:02 sbueringer

/lifecycle frozen

vincepri avatar Jul 10 '23 15:07 vincepri

/lifecycle active

vincepri avatar Jul 10 '23 15:07 vincepri