MachineSet with wrong MachineSetNameLabel value in .spec.selector spawns infinite Machines
What steps did you take and what happened?
Scaling up a CAPI MachineSet with a .spec.selector.matchLabels that has a k/v clusterv1.MachineSetNameLabel (cluster.x-k8s.io/set-name) with value that doesn't match the .metadata.name of the MachineSet, causes a infinite-loop of child Machines being created.
Reproducing steps:
- a CAPI MachineSet is created with matching
.spec.selector.matchLabelsand.spec.template.metadata.labels'scluster.x-k8s.io/set-namebut with its.metadata.namediffering from them - the CAPI MachineSet is scaled up to 1 replica by the user
- the CAPI MachineSet controller goes to create a machine, it sets the
.metadata.labelsvalue for the keyclusterv1.MachineSetNameLabelin the Machine by setting it to the parent MachineSet.metadata.name, ignoring what was specified in the.spec.template.metadata.labelsof the parent MachineSet - the CAPI Machine that gets created from the scale up has
.metadata.labelsdiffering from the one its parent MachineSet uses to select/find its children in the.spec.selector.matchingLabels - the CAPI machineset controller as such is unable to keep track if its children, and keeps creating new ones as the diff between owned machines and spec.replicas is always < 0
What did you expect to happen?
Only one Machine being created.
Cluster API version
Reproduced in v1.9.z, might be in all supported versions.
Kubernetes version
v1.32
Anything else you would like to add?
No response
Label(s) to be applied
/kind bug
/area machineset /area machine
The MachineSet webhook defaults an empty selector to include the MachineSetNameLabel with the m.Name value, but there is currently no validation that the value of that label matches m.Name
I think the fix for this issue is just to check that any label that the MachineSet would override to a specific value when creating Machines, is only allowed to be set to that specific value in the selector.
@chrischdi @sbueringer who should I tag for triaging this? Thanks!
The MachineSet webhook defaults an empty selector to include the
MachineSetNameLabelwith them.Namevalue, but there is currently no validation that the value of that label matchesm.NameI think the fix for this issue is just to check that any label that the MachineSet would override to a specific value when creating Machines, is only allowed to be set to that specific value in the selector.
That sounds reasonable.
We should then also add a corresponding validation for the MachineDeployments. As addition: we should propably also check the .metadata.labels on the machines are properly enforced (for MD, MS, KCP machines)
/priority important-longterm /triage accepted
Wondering if we should move a step further and do not expose selectors in the API 🤔
Wondering if we should move a step further and do not expose selectors in the API 🤔
Would that mean relying solely on owner references to find the Machines owned by the MachineSet?
/help
@chrischdi: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
/help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Wondering if we should move a step further and do not expose selectors in the API 🤔
Would that mean relying solely on owner references to find the Machines owned by the MachineSet?
I was thinking about only relying on the labels that we always add on the Machines
We have similar problem. If we create MachineDeployment with empty selector, and mutating webhook does not work (for example, someone deleted mutatingwebhookconfigurations.admissionregistration.k8s.io), then CAPI starts to create infinite number of MachineSet.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
namespace: d8-cloud-instance-manager
name: "bob-worker"
spec:
clusterName: "bob"
selector: {}
minReadySeconds: 0
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 1
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
labels:
node-group: "worker"
spec:
clusterName: "bob"
bootstrap:
dataSecretName: "worker-99612327"
infrastructureRef:
apiVersion: "infrastructure.cluster.x-k8s.io/v1beta2"
kind: "VCDMachineTemplate"
name: worker-99612327
nodeDrainTimeout: 10m
nodeDeletionTimeout: 10m
nodeVolumeDetachTimeout: 10m
and mutating webhook does not work (for example, someone deleted mutatingwebhookconfigurations.admissionregistration.k8s.io)
This is something that we do not support