CAPI controller should taint outdated nodes with PreferNoSchedule

Open cnmcavoy opened this issue 2 years ago • 2 comments

User Story

As an operator of Cluster API, when I make changes to an existing MachineDeployment or MachineSet infrastructure template, any existing nodes previously managed are reconciled, replaced, and then drained by the Cluster API controllers.

If the MachineSet or MachineDeployment has many replicas, and each node has many pods, this can result in unnecessary pod churn. As the first node is drained, pods previously running on that node may be scheduled onto nodes that have yet to be replaced but will be torn down soon. When the Cluster API controller finally drains those nodes, those pods are evicted again and rescheduled. In sufficiently large clusters, this can result in workloads being evicted and restarted on doomed nodes many times, unnecessarily.

I would like the Cluster API controller to taint all of the nodes it will be replacing with PreferNoSchedule, so that pods prefer scheduling on newer nodes and only fall back to scheduling on older, outdated nodes if the cluster has no alternative capacity.

Detailed Description

The Cluster API controller should taint all of the nodes in outdated MachineDeployments or MachineSets with PreferNoSchedule.
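
For illustration only, a minimal sketch in Go of what such a soft taint could look like; the taint key is an assumption for this example, not something decided in this issue:

```go
package taints

import corev1 "k8s.io/api/core/v1"

// OutdatedTaint is a hypothetical taint the controller could add to Nodes
// backed by Machines of outdated MachineSets. PreferNoSchedule is a "soft"
// effect: the scheduler tries to avoid these Nodes but still uses them if no
// other capacity exists, and pods already running on them are not evicted.
var OutdatedTaint = corev1.Taint{
	Key:    "node.cluster.x-k8s.io/outdated-revision", // assumed key, for illustration
	Effect: corev1.TaintEffectPreferNoSchedule,
}
```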

Anything else you would like to add:

N/A

/kind feature

cnmcavoy avatar Aug 09 '22 23:08 cnmcavoy

Sounds like a good idea to me. We used something like this before in another project, and it also generally speeds up upgrades when PDBs are used, since fewer Pod drains are required.

sbueringer avatar Aug 16 '22 12:08 sbueringer

/triage accepted

fabriziopandini avatar Aug 26 '22 12:08 fabriziopandini

I'm +1; TBD whether to implement this as a default behaviour or behind some feature flag/annotation. @enxebre @vincepri @CecileRobertMichon opinions?

fabriziopandini avatar Aug 26 '22 12:08 fabriziopandini

+1 to a soft taint feature during MachineDeployment rolling upgrades. I'd expect tainting to be driven by the MachineDeployment controller as it reconciles old MachineSets. I'm +1 to implementing it as default behaviour as long as it is covered by unit, e2e, and conformance testing.

enxebre avatar Aug 29 '22 10:08 enxebre
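
For illustration, a rough sketch of the direction suggested above: how a controller could idempotently add such a soft taint to the Node behind a Machine of an old MachineSet using the controller-runtime client. The helper name and patch strategy are assumptions, not the actual Cluster API implementation.

```go
package taints

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// taintNodePreferNoSchedule adds the given PreferNoSchedule taint to a Node
// if it is not already present. Using a merge patch keeps the update small
// and avoids clobbering concurrent changes to the rest of the Node object.
func taintNodePreferNoSchedule(ctx context.Context, c client.Client, node *corev1.Node, taint corev1.Taint) error {
	for _, t := range node.Spec.Taints {
		if t.Key == taint.Key && t.Effect == taint.Effect {
			return nil // already tainted; nothing to do
		}
	}
	patchBase := client.MergeFrom(node.DeepCopy())
	node.Spec.Taints = append(node.Spec.Taints, taint)
	return c.Patch(ctx, node, patchBase)
}
```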

Hi, if there's no assignee I can handle it based on the idea described in https://github.com/kubernetes-sigs/cluster-api/issues/7043#issue-1333875124. Could I work on it (would you allow me to do /lifecycle active)?

hiromu-a5a avatar Nov 04 '22 12:11 hiromu-a5a

Hi @hiromu-a5a, feel free to comment /assign if you'd like to work on it

/lifecycle active

CecileRobertMichon avatar Nov 04 '22 17:11 CecileRobertMichon

/assign

hiromu-a5a avatar Nov 07 '22 06:11 hiromu-a5a

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 05 '23 06:02 k8s-triage-robot

/remove-lifecycle stale

sbueringer avatar Feb 06 '23 07:02 sbueringer

/lifecycle frozen

vincepri avatar Jul 10 '23 15:07 vincepri

/lifecycle active

vincepri avatar Jul 10 '23 15:07 vincepri