bug(kubelet): maxPodsExpression not applied everywhere
What happened:
I tried to set `maxPodsExpression` to `default_enis * (ips_per_eni - 1)` and it is reflected correctly in /etc/kubernetes/kubelet/config.json (the instance is an m7g.xlarge, so 4 ENIs with 15 IPs per ENI -> 60 IPs max):

```
[ec2-user@ip-10-100-155-89 ~]$ cat /etc/kubernetes/kubelet/config.json | jq .maxPods
56
```

However, it's not reflected in /etc/kubernetes/kubelet/config.json.d/40-nodeadm.conf:

```
[ec2-user@ip-10-100-155-89 ~]$ cat /etc/kubernetes/kubelet/config.json.d/40-nodeadm.conf | jq .maxPods
58
```
What you expected to happen:
```
[ec2-user@ip-10-100-155-89 ~]$ cat /etc/kubernetes/kubelet/config.json.d/40-nodeadm.conf | jq .maxPods
56
```
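For reference, both numbers line up with the instance limits. A quick sketch (assuming the advertised m7g.xlarge limits of 4 ENIs and 15 IPv4 addresses per ENI): 56 is what the expression yields, while 58 matches the stock `default_enis * (ips_per_eni - 1) + 2` formula:

```shell
# The expression from this report:
echo $(( 4 * (15 - 1) ))      # 56 -> the value in config.json

# The stock EKS max-pods formula, which adds 2 for host-networking
# pods such as the VPC CNI and kube-proxy:
echo $(( 4 * (15 - 1) + 2 ))  # 58 -> the value in 40-nodeadm.conf
```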
How to reproduce it (as minimally and precisely as possible):
Environment:
- AWS Region: eu-central-1
- Instance Type(s): m7g.xlarge
- Cluster Kubernetes version: 1.34
- Node Kubernetes version:
- AMI Version: al2023@latest
@sylr are you using Karpenter or a similar project for launching and configuring your nodes?
This behavior is outlined in the docs (https://github.com/awslabs/amazon-eks-ami/blob/main/nodeadm/doc/examples.md#defining-a-max-pods-expression), with the rationale being that users would only set explicit `maxPods` values if they want them applied over the `maxPodsExpression`. IIRC Karpenter always sets `maxPods`, which could explain the behavior you are seeing if you are not setting it yourself.
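To illustrate the documented precedence, here is a hypothetical NodeConfig fragment (the values are made up for this example) showing an explicit `maxPods` winning over the expression:

```yaml
# Hypothetical merged NodeConfig: per the linked docs, an explicit
# maxPods (e.g. one injected by a provisioner such as Karpenter)
# takes precedence over the value computed from maxPodsExpression.
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  kubelet:
    maxPodsExpression: "default_enis * (ips_per_eni - 1)"  # would yield 56
    config:
      maxPods: 58  # explicit value wins; the kubelet runs with 58
```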
@mselim00 I am using Karpenter; here is the EC2NodeClass I'm using:
```yaml
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: al2023-default
spec:
  # Required, resolves a default ami and userdata
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@latest
  # Required, discovers subnets to attach to instances
  subnetSelectorTerms:
    - tags:
        "kubernetes.io/cluster/eks-acme-staging-euc1-01": "owned"
        "kubernetes.io/role/internal-elb": "1"
  # Required, discovers security groups to attach to instances
  securityGroupSelectorTerms:
    - id: sg-07ec9d6eedf144b40 # eks-nodes-acme-staging-euc1-01
    - id: sg-0364e80630e42138e # eks-cluster-sg-eks-acme-staging-euc1-01-387902043
  instanceProfile: ProfileName
  # Optional, overrides autogenerated userdata with a merge semantic
  userData: |
    Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
    MIME-Version: 1.0

    --MIMEBOUNDARY
    Content-Transfer-Encoding: 7bit
    Content-Type: text/cloud-config
    Mime-Version: 1.0

    #cloud-config
    write_files:
      - path: /home/ec2-user/.ssh/authorized_keys
        permissions: '0600'
        owner: ec2-user:ec2-user
        append: true
        content: |
          ssh-ed25519 xxxxxxxxxxx

    --MIMEBOUNDARY
    Content-Transfer-Encoding: 7bit
    Content-Type: application/node.eks.aws
    Mime-Version: 1.0

    ---
    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      cluster:
        cidr: 172.20.0.0/16
        name: eks-acme-staging-euc1-01
      kubelet:
        maxPodsExpression: "default_enis * (ips_per_eni - 1)"
        config:
          clusterDNS:
            - 172.20.0.10

    --MIMEBOUNDARY--
  # Optional, propagates tags to underlying EC2 resources
  tags:
    company.com/scope: kubernetes
    company.com/owner: [email protected]
  # Optional, configures IMDS for the instance
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1
    httpTokens: required
  # Optional, configures storage devices for the instance
  blockDeviceMappings:
    # Root device
    - deviceName: /dev/xvda
      rootVolume: true
      ebs:
        volumeSize: 25Gi
        volumeType: gp3
        encrypted: true
  # Optional, configures detailed monitoring for the instance
  detailedMonitoring: true
```
Here is the instance userdata, where we can see the hardcoded `maxPods`:
```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/cloud-config

#cloud-config
write_files:
  - path: /home/ec2-user/.ssh/authorized_keys
    permissions: '0600'
    owner: ec2-user:ec2-user
    append: true
    content: |
      ssh-ed25519 xxxxxxxxxx

--//
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    cidr: 172.20.0.0/16
    name: eks-acme-staging-euc1-01
  kubelet:
    maxPodsExpression: "default_enis * (ips_per_eni - 1)"
    config:
      clusterDNS:
        - 172.20.0.10

--//
Content-Type: application/node.eks.aws

# Karpenter Generated NodeConfig
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
metadata: {}
spec:
  cluster:
    ....
    cidr: 172.20.0.0/16
    name: eks-acme-staging-euc1-01
  containerd: {}
  instance:
    localStorage: {}
  kubelet:
    config:
      clusterDNS:
        - 172.20.0.10
      maxPods: 29
      registerWithTaints:
        - effect: NoExecute
          key: karpenter.sh/unregistered
        - effect: NoExecute
          key: node.cilium.io/agent-not-ready
          value: "true"
        - effect: NoExecute
          key: ebs.csi.aws.com/agent-not-ready
    flags:
      - --node-labels="karpenter.k8s.aws/ec2nodeclass=al2023-default,karpenter.sh/capacity-type=spot,karpenter.sh/do-not-sync-taints=true,karpenter.sh/nodepool=default-arm64"
--//--
```
I see, I think we will need to work with Karpenter to see how we can support this use case. Can you share some more details on why you would like to set the expression to `default_enis * (ips_per_eni - 1)`? I'm specifically trying to understand the omission of the `+ 2`, which is intended to account for expected host-networking pods like the VPC CNI or kube-proxy.
I'm using Cilium in IPAM mode with kube-proxy replacement, and we encountered scheduled pods failing to start because Cilium couldn't give them IPs.
Cross-linking https://github.com/aws/karpenter-provider-aws/issues/8739