cluster-api

Support defining deletion priority for Machine

Open haijianyang opened this issue 6 months ago • 2 comments

What would you like to be added (User Story)?

As a developer, I would like to be able to define a deletion priority for Machines.

Detailed Description

For high availability, we deploy the k8s cluster in a Stretched Cluster, which means that the machines managed by KCP/MD will be distributed in two different data centers (Preferred Fault Domain and Secondary Fault Domain).

When the k8s cluster undergoes a rolling update or is scaled down, the characteristics of a stretched cluster mean that we want the machines in the secondary fault domain to be deleted first.

However, the scale-down behavior currently provided by KCP/MD cannot meet this requirement.

Is it possible to add a machine scoring mechanism to KCP/MD, so that the infra provider can set a deletion score for each machine based on its own situation, and KCP/MD can select the machines to delete based on that score when scaling down?

For example, add a cluster.x-k8s.io/delete-priority annotation to the Machine.

The delete-priority value would be similar to the existing deletePriority constants:

const (
	mustDelete    deletePriority = 100.0
	shouldDelete  deletePriority = 75.0
	betterDelete  deletePriority = 50.0
	couldDelete   deletePriority = 20.0
	mustNotDelete deletePriority = 0.0
)
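
To make the request concrete, here is a minimal sketch of how an annotation-based score could drive scale-down selection. It is not existing Cluster API code: the annotation is only the one proposed above, and the Machine struct, priorityOf, and selectForDeletion are hypothetical stand-ins introduced for illustration. It assumes the annotation value is a float on the same 0 to 100 scale as the constants above.

```go
package main

import (
	"math"
	"sort"
	"strconv"
)

// deletePriority mirrors the float-based scale quoted in the issue.
type deletePriority float64

const (
	mustDelete    deletePriority = 100.0
	mustNotDelete deletePriority = 0.0
)

// DeletePriorityAnnotation is the annotation proposed in this issue.
// Assumption: it is a proposal, not an annotation Cluster API defines today.
const DeletePriorityAnnotation = "cluster.x-k8s.io/delete-priority"

// Machine is a hypothetical, simplified stand-in for the real Machine type.
type Machine struct {
	Name        string
	Annotations map[string]string
}

// priorityOf reads the proposed annotation and clamps it to the
// [mustNotDelete, mustDelete] range. A missing or unparsable value
// yields the caller-supplied fallback.
func priorityOf(m Machine, fallback deletePriority) deletePriority {
	raw, ok := m.Annotations[DeletePriorityAnnotation]
	if !ok {
		return fallback
	}
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil || math.IsNaN(v) {
		return fallback
	}
	p := deletePriority(v)
	if p > mustDelete {
		return mustDelete
	}
	if p < mustNotDelete {
		return mustNotDelete
	}
	return p
}

// selectForDeletion returns up to n machines, highest priority first.
// The sort is stable, so machines with equal priority keep their input order.
func selectForDeletion(machines []Machine, n int, fallback deletePriority) []Machine {
	sorted := make([]Machine, len(machines))
	copy(sorted, machines)
	sort.SliceStable(sorted, func(i, j int) bool {
		return priorityOf(sorted[i], fallback) > priorityOf(sorted[j], fallback)
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}
```

With this sketch, an infra provider would annotate machines in the secondary fault domain with a higher value (for example 75) than those in the preferred fault domain (for example 20), and a scale-down by one would pick a secondary-domain machine first.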

Anything else you would like to add?

No response

Label(s) to be applied

/kind feature

haijianyang, May 15 '25 06:05

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot, May 15 '25 06:05

Question: Are you using failure domains in your implementation?

chrischdi, May 28 '25 14:05

No response within 1-2 months so

/close

sbueringer, Jul 25 '25 14:07

@sbueringer: Closing this issue.

k8s-ci-robot, Jul 25 '25 14:07