karpenter-provider-aws
Spot instances are not being consolidated.
Description
There are 2 On-Demand instances that host Karpenter's own Pods and 3 Spot instance nodes that schedule the workload Pods. Utilization on the Spot nodes looks low to me, so why aren't they being replaced with smaller instance types?
Spot1.
kubectl describe node ip-10-219-212-23.ap-northeast-1.compute.internal | grep -A 30 "Events:"
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 28m kube-proxy
Normal NodeAllocatableEnforced 28m kubelet Updated Node Allocatable limit across pods
Normal Starting 28m kubelet Starting kubelet.
Warning InvalidDiskCapacity 28m kubelet invalid capacity 0 on image filesystem
Normal NodeHasSufficientMemory 28m (x2 over 28m) kubelet Node ip-10-219-212-23.ap-northeast-1.compute.internal status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 28m (x2 over 28m) kubelet Node ip-10-219-212-23.ap-northeast-1.compute.internal status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 28m (x2 over 28m) kubelet Node ip-10-219-212-23.ap-northeast-1.compute.internal status is now: NodeHasSufficientPID
Normal Synced 28m cloud-node-controller Node synced successfully
Normal RegisteredNode 28m node-controller Node ip-10-219-212-23.ap-northeast-1.compute.internal event: Registered Node ip-10-219-212-23.ap-northeast-1.compute.internal in Controller
Normal NodeReady 28m kubelet Node ip-10-219-212-23.ap-northeast-1.compute.internal status is now: NodeReady
Normal DisruptionBlocked 27m karpenter Cannot disrupt Node: Nominated for a pending pod
Normal DisruptionBlocked 23m karpenter Cannot disrupt Node: PDB "cattle-gatekeeper-system/gatekeeper-controller-manager" prevents pod evictions
Normal DisruptionBlocked 21m (x2 over 25m) karpenter Cannot disrupt Node: PDB "tempo/tempo-distributed-distributor" prevents pod evictions
Normal DisruptionBlocked 19m karpenter Cannot disrupt Node: PDB "kube-system/coredns" prevents pod evictions
Normal Unconsolidatable 53s (x2 over 16m) karpenter Can't replace with a cheaper node
Question: What does "Can't replace with a cheaper node" mean? I don't understand the specific reason why this node cannot be consolidated.
Spot2.
kubectl describe node ip-10-219-208-22.ap-northeast-1.compute.internal | grep -A 30 "Events:"
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 25m kube-proxy
Normal Starting 26m kubelet Starting kubelet.
Warning InvalidDiskCapacity 26m kubelet invalid capacity 0 on image filesystem
Normal NodeHasSufficientPID 26m (x2 over 26m) kubelet Node ip-10-219-208-22.ap-northeast-1.compute.internal status is now: NodeHasSufficientPID
Normal NodeHasSufficientMemory 26m (x2 over 26m) kubelet Node ip-10-219-208-22.ap-northeast-1.compute.internal status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 26m (x2 over 26m) kubelet Node ip-10-219-208-22.ap-northeast-1.compute.internal status is now: NodeHasNoDiskPressure
Normal NodeAllocatableEnforced 26m kubelet Updated Node Allocatable limit across pods
Normal RegisteredNode 25m node-controller Node ip-10-219-208-22.ap-northeast-1.compute.internal event: Registered Node ip-10-219-208-22.ap-northeast-1.compute.internal in Controller
Normal Synced 25m cloud-node-controller Node synced successfully
Normal NodeReady 25m kubelet Node ip-10-219-208-22.ap-northeast-1.compute.internal status is now: NodeReady
Normal SpotRebalanceRecommendation 22m karpenter Spot rebalance recommendation was triggered
Normal DisruptionBlocked 19m (x4 over 25m) karpenter Cannot disrupt Node: Nominated for a pending pod
Normal Unconsolidatable 2m20s (x2 over 18m) karpenter Can't remove without creating 2 candidates
Question: What does "Can't remove without creating 2 candidates" mean? I don't understand the specific reason why this node cannot be consolidated.
Spot3.
kubectl describe node ip-10-219-210-8.ap-northeast-1.compute.internal | grep -A 30 "Events:"
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 24m kube-proxy
Normal NodeHasSufficientPID 24m (x2 over 24m) kubelet Node ip-10-219-210-8.ap-northeast-1.compute.internal status is now: NodeHasSufficientPID
Normal Starting 24m kubelet Starting kubelet.
Warning InvalidDiskCapacity 24m kubelet invalid capacity 0 on image filesystem
Normal NodeHasSufficientMemory 24m (x2 over 24m) kubelet Node ip-10-219-210-8.ap-northeast-1.compute.internal status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 24m (x2 over 24m) kubelet Node ip-10-219-210-8.ap-northeast-1.compute.internal status is now: NodeHasNoDiskPressure
Normal NodeAllocatableEnforced 24m kubelet Updated Node Allocatable limit across pods
Normal Synced 24m cloud-node-controller Node synced successfully
Normal RegisteredNode 24m node-controller Node ip-10-219-210-8.ap-northeast-1.compute.internal event: Registered Node ip-10-219-210-8.ap-northeast-1.compute.internal in Controller
Normal NodeReady 23m kubelet Node ip-10-219-210-8.ap-northeast-1.compute.internal status is now: NodeReady
Normal DisruptionBlocked 23m karpenter Cannot disrupt Node: Nominated for a pending pod
Normal DisruptionBlocked 21m karpenter Cannot disrupt Node: PDB "istio-system/istiod" prevents pod evictions
Normal Unconsolidatable 2m57s (x2 over 22m) karpenter SpotToSpotConsolidation requires 15 cheaper instance type options than the current candidate to consolidate, got 5
Question: I understand that only five cheaper Spot instance types were available, but since we allow sizes from large up through 8xlarge, I would expect more than five candidates. Is there an issue with how we are specifying the NodePool?
Versions:
- Chart Version: 0.37
- Kubernetes Version (kubectl version): 1.28
values.yaml
replicas: 2 # default
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::xxxxxxxxxx:role/aws-stg-karpenter
settings:
  clusterName: "aws-stg"
  interruptionQueue: "aws-stg"
  resources:
    requests:
      cpu: 1
      memory: 1Gi
    limits:
      cpu: 1
      memory: 1Gi
  featureGates:
    spotToSpotConsolidation: true
logLevel: debug
tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
install commands
helm upgrade --install karpenter -f values.yaml oci://public.ecr.aws/karpenter/karpenter --version "0.37.0"
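As a quick sanity check after installing, something like the following confirms that the controller is running and that the spot-to-spot consolidation gate was picked up. This is a sketch: it assumes the release was installed into the current namespace and that the chart wires settings.featureGates into a FEATURE_GATES environment variable on the controller Deployment, so adjust names and namespace as needed.
# Controller Pods should be Running
kubectl get pods -l app.kubernetes.io/name=karpenter
# Look for SpotToSpotConsolidation=true among the controller's environment variables
kubectl describe deployment karpenter | grep -i feature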
nodepool.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        karpenter-nodepool: default
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: "karpenter.k8s.aws/instance-family"
          operator: In
          values: ["m6i", "m6a", "m6id", "m6in", "m6idn", "m7a", "m7i", "c6a", "c6i", "c6id", "c6in", "c7a", "c7i", "r6i", "r6a", "r6id", "r6in", "r6idn", "r7a", "r7i"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]
        - key: karpenter.k8s.aws/instance-cpu
          operator: Gt
          values: ["4"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
      kubelet:
        maxPods: 110
  limits:
    cpu: "160"
    memory: 640Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-aws-stg"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: aws-stg
  securityGroupSelectorTerms:
    - tags:
        aws:eks:cluster-name: aws-stg
  tags:
    eks:nodegroup-name: "karpenter-spot-instances"
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: optional
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        deleteOnTermination: true
  detailedMonitoring: true
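As a side note, the vCPU counts behind the instance-size values above can be checked with the AWS CLI. This is only a sketch: the m6i sizes below are one illustrative family from the list, and keep in mind that instance-cpu Gt "4" matches strictly more than 4 vCPUs.
# vCPU counts for a few of the allowed sizes (m6i used as an example family)
aws ec2 describe-instance-types \
  --instance-types m6i.large m6i.xlarge m6i.2xlarge m6i.4xlarge \
  --query 'InstanceTypes[].[InstanceType,VCpuInfo.DefaultVCpus]' \
  --output table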
What does "Can't remove without creating 2 candidates" mean?
Karpenter will not consolidate one node into two nodes. This is meant as a safeguard against node launch failures when replacing nodes, and it ensures that we only consolidate when we know it's safe to do so. It also generally saves us from performing too many consolidation actions and prioritizes larger nodes.
Normal Unconsolidatable 2m57s (x2 over 22m) karpenter SpotToSpotConsolidation requires 15 cheaper instance type options than the current candidate to consolidate, got 5
This is a requirement of spot-to-spot consolidation. Since Spot instances trade availability for cost, if Karpenter always accepted a consolidation from a Spot instance to a cheaper one, you would see continual consolidations until you ended up with the cheapest and smallest node per pod; we call this the race to the bottom. If you have more questions about how this works, feel free to read the design: https://github.com/kubernetes-sigs/karpenter/blob/main/designs/spot-consolidation.md
@njtran
What does "Can't remove without creating 2 candidates" mean?
Karpenter will not consolidate one node into two nodes. This is meant as a safeguard against node launch failures when replacing nodes, and it ensures that we only consolidate when we know it's safe to do so. It also generally saves us from performing too many consolidation actions and prioritizes larger nodes.
Normal Unconsolidatable 2m57s (x2 over 22m) karpenter SpotToSpotConsolidation requires 15 cheaper instance type options than the current candidate to consolidate, got 5
This is a requirement of spot-to-spot consolidation. Since Spot instances trade availability for cost, if Karpenter always accepted a consolidation from a Spot instance to a cheaper one, you would see continual consolidations until you ended up with the cheapest and smallest node per pod; we call this the race to the bottom. If you have more questions about how this works, feel free to read the design: https://github.com/kubernetes-sigs/karpenter/blob/main/designs/spot-consolidation.md
Thank you for your response!
I'm sorry, but I think I might not fully understand. What does "Karpenter will not consolidate one node into two nodes" mean?
Normally, if a node has a lot of free capacity, I would expect it to be replaced with a cheaper node. I understand there are cases where that replacement might fail, but I don't understand why it is prevented as a safety measure only in this case.
Does this message mean that, instead of shutting the node down, Karpenter would have to start two new nodes and spread the workload's Pods across them? In other words, if the instance type were made smaller, the Pods on the current node would no longer fit on a single node, so Karpenter would have to consolidate into two nodes, and that action is suppressed because it is risky? (In short, does it mean the node simply cannot be scaled in any further?)
Normal Unconsolidatable 2m57s (x2 over 22m) karpenter SpotToSpotConsolidation requires 15 cheaper instance type options than the current candidate to consolidate, got 5
It turned out that the issue was due to a misconfiguration of my NodePool. Since all requirements are combined with AND logic, they unnecessarily narrowed down the target instance types.
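For anyone who runs into the same thing, here is a sketch of the adjustment (illustrative only, not my exact final NodePool): because every requirement in a NodePool must hold at the same time, combining the instance-size list with instance-cpu Gt "4" meant that large (2 vCPUs) and xlarge (4 vCPUs) could never match, which left too few cheaper candidates. Dropping the instance-cpu requirement and letting the size list control sizing is one way to widen the pool:
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot"]
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["c", "m", "r"]
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values: ["2"]
  - key: karpenter.k8s.aws/instance-size
    operator: In
    values: ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]
  # instance-cpu Gt "4" removed: it excluded the large and xlarge sizes outright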
What does "Can't replace with a cheaper node" mean?
Do you have any comments on this?
This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.