Cluster Autoscaler does not work with mixed instance weights
Which component are you using?: cluster-autoscaler
What version of the component are you using?: 1.23.0
What k8s version are you using (kubectl version)?: 1.21.5
What environment is this in?: AWS
We have an EKS ASG with mixed instance types and different weights:
asg = {
  general = {
    instance_type           = "c5.2xlarge"
    desired_capacity        = null
    max_size                = 10
    min_size                = 2
    on_demand_base_capacity = 2
    spot_instances          = true

    mixed_instances = [
      {
        type   = "c5.2xlarge"
        weight = 1
      },
      {
        type   = "c5a.2xlarge"
        weight = 1
      },
      {
        type   = "c5.4xlarge"
        weight = 2
      },
      {
        type   = "c5a.4xlarge"
        weight = 2
      },
    ]

    labels = {
      "node.mycompany.com/local-gpu"  = false
      "node.mycompany.com/local-nvme" = false
      "node.mycompany.com/nodegroup"  = "general"
    }
  }
}
What did you expect to happen?:
CA would launch the necessary nodes.
What happened instead?:
CA did not launch additional nodes because of the instance weights: one instance with a weight of 2 plus two instances with a weight of 1 are reported as a capacity of 4, so the autoscaler skips the scale-up (see the sketch after the status lines below).
ScaleUp: NoActivity (ready=3 cloudProviderTarget=4)
LastProbeTime: 2022-05-04 10:28:04.758669894 +0000 UTC m=+134.153075785
LastTransitionTime: 2022-05-04 10:26:24.013344295 +0000 UTC m=+33.407750119
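For illustration, here is a minimal Go sketch (not actual cluster-autoscaler code; the fleet and weights are the ones reported above, everything else is made up for the example) of why the ready count and the cloud-provider target disagree:

package main

import "fmt"

// Illustrative sketch only: with a weighted mixed-instances policy the ASG
// reports its desired capacity in weighted units, while Kubernetes only ever
// registers one node per real instance, so the two counts drift apart.
type instance struct {
	instanceType string
	weight       int // capacity weight from the mixed-instances policy above
}

func main() {
	// Fleet from the report: one weight-2 instance plus two weight-1 instances.
	running := []instance{
		{"c5.4xlarge", 2},
		{"c5.2xlarge", 1},
		{"c5.2xlarge", 1},
	}

	// Weighted capacity the ASG reports as its target: 2 + 1 + 1 = 4.
	target := 0
	for _, inst := range running {
		target += inst.weight
	}

	// Nodes that actually register with Kubernetes: one per instance = 3.
	ready := len(running)

	fmt.Printf("ScaleUp: NoActivity (ready=%d cloudProviderTarget=%d)\n", ready, target)
	// The autoscaler reads the gap as one "upcoming" node that is still booting,
	// schedules the pending pod onto that phantom node's template, and never
	// launches anything ("1 unregistered nodes present" in the log below).
	fmt.Printf("%d unregistered nodes present\n", target-ready)
}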
Log:
I0504 10:13:15.501163 1 aws_manager.go:269] Refreshed ASG list, next refresh after 2022-05-04 10:14:15.501159899 +0000 UTC m=+3798493.211866322
I0504 10:13:15.501924 1 static_autoscaler.go:319] 1 unregistered nodes present
I0504 10:13:15.502103 1 filter_out_schedulable.go:65] Filtering out schedulables
I0504 10:13:15.502122 1 filter_out_schedulable.go:132] Filtered out 0 pods using hints
I0504 10:13:15.502169 1 filter_out_schedulable.go:157] Pod logstash-logging.logstash-logging-8 marked as unschedulable can be scheduled on node template-node-for-eks-general01-ops-stg-us-east-1-<MY_ASG>-upcoming-0. Ignoring in scale up.
I0504 10:13:15.502188 1 filter_out_schedulable.go:170] 0 pods were kept as unschedulable based on caching
I0504 10:13:15.502199 1 filter_out_schedulable.go:171] 1 pods marked as unschedulable can be scheduled.
I0504 10:13:15.502210 1 filter_out_schedulable.go:79] Schedulable pods present
I0504 10:13:15.502233 1 static_autoscaler.go:401] No unschedulable pods
I0504 10:13:15.502251 1 static_autoscaler.go:448] Calculating unneeded nodes
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 26s (x2 over 28s) default-scheduler 0/3 nodes are available: 3 Insufficient memory.
Normal Synced 16s (x3 over 28s) ops-stg-aws-us-east-1-eks-general01 Pod synced successfully
Normal Synced 14s (x3 over 28s) ops-stg-aws-us-east-1-eks-general01 Pod synced successfully
This is working as designed. Cluster Autoscaler expects every node in a node group (an ASG, in this case) to be identical, so using different weights (or, more fundamentally, different instance types) in a single ASG will not work.
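To make that assumption concrete, here is a rough Go sketch (illustrative only, not actual cluster-autoscaler code; the template fields and instance sizes are assumptions for this example) of how a single per-group node template drives the scale-up estimate, and why mixing instance sizes in one group breaks it:

package main

import "fmt"

// nodeTemplate is a stand-in for the single per-node-group template the
// autoscaler simulates scale-ups with (the "template-node-for-..." node seen
// in the log above); the field names here are illustrative.
type nodeTemplate struct {
	cpuMilli  int64
	memoryMiB int64
}

// capacityAfterScaleUp shows the core assumption: every new node is expected
// to look exactly like the template, so mixing c5.2xlarge and c5.4xlarge
// (or different weights) in one ASG makes this estimate wrong.
func capacityAfterScaleUp(t nodeTemplate, newNodes int64) (int64, int64) {
	return t.cpuMilli * newNodes, t.memoryMiB * newNodes
}

func main() {
	// Template built from a c5.2xlarge-shaped node (8 vCPU, 16 GiB).
	c52xlarge := nodeTemplate{cpuMilli: 8000, memoryMiB: 16384}
	cpu, mem := capacityAfterScaleUp(c52xlarge, 2)
	fmt.Printf("predicted extra capacity: %dm CPU, %dMiB memory\n", cpu, mem)
	// If the ASG actually launches a c5.4xlarge instead, the prediction is
	// off by a factor of two in both dimensions.
}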
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.