Cluster autoscaler needs to respect topologySpreadConstraints
Which component are you using?:
cluster-autoscaler
What version of the component are you using?:
Component version: v1.25.0
What k8s version are you using (kubectl version)?:
```console
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.8-eks-ffeb93d", GitCommit:"abb98ec0631dfe573ec5eae40dc48fd8f2017424", GitTreeState:"clean", BuildDate:"2022-11-29T18:45:03Z", GoVersion:"go1.18.8", Compiler:"gc", Platform:"linux/amd64"}
```
What environment is this in?:
EKS
What did you expect to happen?:
We've set topology spread constraints on our workload, and we expect nodes to be created across multiple AZs.
What happened instead?:
Nodes are being created under a single AZ.
You can see from the snapshot above that workloads are all scheduled to nodegroup m6g-2xlarge-tidbcloud-system-eks-us-west-2-4b35b408-us-west-2c. No nodes are created under m6g-2xlarge-tidbcloud-system-eks-us-west-2-4b35b408-us-west-2a and m6g-2xlarge-tidbcloud-system-eks-us-west-2-4b35b408-us-west-2b.
How to reproduce it (as minimally and precisely as possible):
Create three nodegroups that span three AZs, with minimum node count set to 0, desired node count set to 0, and maximum node count set to 400. Create a workload with the following topology spread constraint:
```yaml
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/component: tikv
        app.kubernetes.io/instance: db
        app.kubernetes.io/managed-by: tidb-operator
        app.kubernetes.io/name: tidb-cluster
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
```
And what we found is that workloads are being created under a single AZ.
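For reference, a hypothetical eksctl sketch of the node-group layout described above (cluster name, node-group names, and instance type are placeholders):
```yaml
# Hypothetical eksctl ClusterConfig excerpt illustrating the reproduction setup:
# three single-AZ node groups, each allowed to scale from 0 up to 400 nodes.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: repro-cluster              # placeholder
  region: us-west-2
nodeGroups:
  - name: m6g-2xlarge-us-west-2a   # placeholder names
    instanceType: m6g.2xlarge
    availabilityZones: ["us-west-2a"]
    minSize: 0
    desiredCapacity: 0
    maxSize: 400
  - name: m6g-2xlarge-us-west-2b
    instanceType: m6g.2xlarge
    availabilityZones: ["us-west-2b"]
    minSize: 0
    desiredCapacity: 0
    maxSize: 400
  - name: m6g-2xlarge-us-west-2c
    instanceType: m6g.2xlarge
    availabilityZones: ["us-west-2c"]
    minSize: 0
    desiredCapacity: 0
    maxSize: 400
```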
Anything else we need to know?:
We suspect that it's related to the fact that we're scaling all node groups from zero.
You need to specify the label topology.kubernetes.io/zone as a tag on your VMSS: k8s.io_cluster-autoscaler_node-template_label_topology.kubernetes.io_zone.
I think we are seeing this as well on EKS.
The screenshot above is also from AWS, so we are not using VMSS but an ASG instead.
During scheduling, I see this:
```
Events:
  Type     Reason             Age    From                Message
  ----     ------             ----   ----                -------
  Warning  FailedScheduling   7m17s  default-scheduler   0/4 nodes are available: 4 node(s) didn't satisfy existing pods anti-affinity rules.
  Normal   NotTriggerScaleUp  7m15s  cluster-autoscaler  pod didn't trigger scale-up: 1 node(s) didn't satisfy existing pods anti-affinity rules
  Normal   Scheduled          7m1s   default-scheduler   Successfully assigned namespace/pod-6cc7699fdd-dxfxs to ip-1-1-1-1.region.compute.internal
```
I have been scratching my head as to why topologySpreadConstraints isn't working on our cluster. Is this the reason?
You currently have 0 nodes in some of your node groups, which means that the autoscaler has no idea how many zones exist in your cluster. In your case, it only sees 1 zone and keeps spinning up all nodes in a single zone to satisfy the constraint.
Until EKS supports minDomains, you should ensure that at least 1 node is running in each zone so that the autoscaler is aware of the number of zones and can scale properly.
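As a workaround for scaling from zero, the cluster-autoscaler FAQ also documents a node-template ASG tag that tells the autoscaler which zone label nodes from an empty group will carry. A hypothetical eksctl excerpt (node-group name and zone are placeholders; on Azure the equivalent VMSS tag uses underscores, as noted above):
```yaml
# Hypothetical node-group excerpt: the ASG tag below lets the autoscaler build a
# node template with topology.kubernetes.io/zone even while the group has 0 nodes.
nodeGroups:
  - name: m6g-2xlarge-us-west-2a       # placeholder
    availabilityZones: ["us-west-2a"]
    minSize: 0
    maxSize: 400
    tags:
      k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone: us-west-2a
```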
The minDomains field is enabled by default in 1.27, but CA (tested with cluster-autoscaler:v1.27.2) does not spin up a new node.
Spread constraint definition:
```yaml
topologySpreadConstraints:
  - maxSkew: 1
    minDomains: 2
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: nginx
```
CA logs:
```
I0607 19:39:51.897194 1 static_autoscaler.go:289] Starting main loop
I0607 19:39:51.897545 1 aws_manager.go:185] Found multiple availability zones for ASG "eks-managed_ng_with_lt-62c44bae-1bd0-05aa-d82d-7f4643c608a3"; using eu-central-1a for failure-domain.beta.kubernetes.io/zone label
I0607 19:39:51.898019 1 aws_manager.go:185] Found multiple availability zones for ASG "eks-managed_ng_with_lt-62c44bae-1bd0-05aa-d82d-7f4643c608a3"; using eu-central-1a for failure-domain.beta.kubernetes.io/zone label
I0607 19:39:51.898205 1 filter_out_schedulable.go:63] Filtering out schedulables
I0607 19:39:51.898311 1 klogx.go:87] failed to find place for smash/nginx-deployment-66fb68bbf8-pjsxp: cannot put pod nginx-deployment-66fb68bbf8-pjsxp on any node
I0607 19:39:51.898322 1 filter_out_schedulable.go:120] 0 pods marked as unschedulable can be scheduled.
I0607 19:39:51.898333 1 filter_out_schedulable.go:83] No schedulable pods
I0607 19:39:51.898338 1 filter_out_daemon_sets.go:40] Filtering out daemon set pods
I0607 19:39:51.898343 1 filter_out_daemon_sets.go:49] Filtered out 0 daemon set pods, 1 unschedulable pods left
I0607 19:39:51.898355 1 klogx.go:87] Pod smash/nginx-deployment-66fb68bbf8-pjsxp is unschedulable
I0607 19:39:51.898370 1 orchestrator.go:109] Upcoming 0 nodes
I0607 19:39:51.898473 1 orchestrator.go:466] Pod nginx-deployment-66fb68bbf8-pjsxp can't be scheduled on eks-managed_ng_with_lt-62c44bae-1bd0-05aa-d82d-7f4643c608a3, predicate checking error: node(s) didn't match pod topology spread constraints; predicateName=PodTopologySpread; reasons: node(s) didn't match pod topology spread constraints; debugInfo=
I0607 19:39:51.898489 1 orchestrator.go:167] No pod can fit to eks-managed_ng_with_lt-62c44bae-1bd0-05aa-d82d-7f4643c608a3
I0607 19:39:51.898500 1 orchestrator.go:172] No expansion options
I0607 19:39:51.898532 1 static_autoscaler.go:575] Calculating unneeded nodes
I0607 19:39:51.898571 1 eligibility.go:154] Node ip-10-0-1-17.eu-central-1.compute.internal is not suitable for removal - memory utilization too big (0.582941)
I0607 19:39:51.898607 1 static_autoscaler.go:623] Scale down status: lastScaleUpTime=2023-06-07 18:39:31.714030436 +0000 UTC m=-3576.680943426 lastScaleDownDeleteTime=2023-06-07 18:39:31.714030436 +0000 UTC m=-3576.680943426 lastScaleDownFailTime=2023-06-07 18:39:31.714030436 +0000 UTC m=-3576.680943426 scaleDownForbidden=false scaleDownInCooldown=false
I0607 19:39:51.898638 1 static_autoscaler.go:632] Starting scale down
I0607 19:39:51.898657 1 legacy.go:296] No candidates for scale down
```
The ASG currently has only one node in the eu-central-1a AZ but can add instances in three AZs (eu-central-1a, eu-central-1c and eu-central-1b), and "Capacity rebalance" is also enabled for the ASG.
Pretty sure you have to create a managed nodegroup/ASG per zone. Don't assign multiple zones to an ASG. It is documented in the autoscaler FAQ.
It works with one nodegroup per zone.
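When running one node group per zone, the autoscaler is typically also started with --balance-similar-node-groups so that otherwise-identical per-zone groups are scaled evenly. A minimal sketch of the relevant container args, assuming AWS ASG auto-discovery (image tag and cluster name are placeholders):
```yaml
# Hypothetical excerpt of the cluster-autoscaler Deployment: with one node group per
# zone, --balance-similar-node-groups keeps the per-zone groups at similar sizes.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.2   # placeholder tag
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --balance-similar-node-groups=true
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>
```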
Running into the same issue with AKS. But as others have said, it's probably because I use multiple zones (3) within one node pool. Too bad this doesn't work, but it's actually logical, as the autoscaler can't guarantee that a node from a specific zone will be spun up.
https://github.com/kubernetes/autoscaler/blob/cluster-autoscaler-1.27.1/cluster-autoscaler/FAQ.md#how-can-i-scale-a-node-group-to-0
You might need to attach a tag/label to your node pool as instructed ^
I have the same issue, but I'm not using the zone as the topology key; I'm using the hostname of the node:
```yaml
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/instance: nginx
    maxSkew: 1
    minDomains: 2
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
```
@rasta-rocket I believe you should change your maxSkew to more than 1. Using kubernetes.io/hostname as the topologyKey means each node may only hold up to maxSkew more matching pods than the least-loaded node. With maxSkew set to, say, 3, each node will take a maximum of 3 such pods; otherwise the pod is not scheduled and the autoscaler is triggered.
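For illustration, the suggestion above would look like this (purely an example; whether a larger maxSkew is actually desirable depends on how many replicas per node you can tolerate):
```yaml
# Example of the suggestion above: with kubernetes.io/hostname as the topology key,
# the skew between the most- and least-loaded nodes may reach 3 before the
# constraint blocks scheduling.
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/instance: nginx
    maxSkew: 3
    minDomains: 2
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
```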
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.