amazon-eks-ami

bug(kubelet): maxPodsExpression not applied everywhere

Open sylr opened this issue 3 months ago • 6 comments

What happened:

I set maxPodsExpression to default_enis * (ips_per_eni - 1), and it is reflected correctly in /etc/kubernetes/kubelet/config.json (the instance is an m7g.xlarge, so 4 ENIs with 15 IPs per ENI -> 60 IPs max, and the expression yields 4 * 14 = 56):

[ec2-user@ip-10-100-155-89 ~]$ cat /etc/kubernetes/kubelet/config.json | jq .maxPods
56

However, it's not reflected in /etc/kubernetes/kubelet/config.json.d/40-nodeadm.conf:

[ec2-user@ip-10-100-155-89 ~]$ cat /etc/kubernetes/kubelet/config.json.d/40-nodeadm.conf | jq .maxPods
58

What you expected to happen:

[ec2-user@ip-10-100-155-89 ~]$ cat /etc/kubernetes/kubelet/config.json.d/40-nodeadm.conf | jq .maxPods
56
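As a sanity check on the numbers above, here is a minimal Python sketch of the arithmetic. The function name `max_pods` and its parameters are illustrative only and are not part of nodeadm. It evaluates the requested expression and the same expression with `+ 2` for the instance figures quoted in this report. That the `+ 2` variant is what produced the 58 in 40-nodeadm.conf is an assumption based on the values matching.

```python
def max_pods(default_enis: int, ips_per_eni: int, host_network_allowance: int = 0) -> int:
    """Hypothetical helper: subtract one address per ENI, then add an optional allowance."""
    return default_enis * (ips_per_eni - 1) + host_network_allowance


# m7g.xlarge figures quoted in this report: 4 ENIs, 15 IPs per ENI.
requested = max_pods(4, 15)         # default_enis * (ips_per_eni - 1)
with_plus_two = max_pods(4, 15, 2)  # the same expression with "+ 2"

print(requested, with_plus_two)  # prints "56 58"
```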

How to reproduce it (as minimally and precisely as possible):

Environment:

  • AWS Region: eu-central-1
  • Instance Type(s): m7g.xlarge
  • Cluster Kubernetes version: 1.34
  • Node Kubernetes version:
  • AMI Version: al2023@latest

sylr avatar Nov 28 '25 10:11 sylr

@sylr are you using Karpenter or a similar project for launching and configuring your nodes?

This behavior is outlined in the docs (https://github.com/awslabs/amazon-eks-ami/blob/main/nodeadm/doc/examples.md#defining-a-max-pods-expression). The rationale is that users would only set an explicit maxPods value if they want it applied over the maxPodsExpression. IIRC Karpenter always sets maxPods, which could explain the behavior you are seeing if you are not setting it yourself.
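The precedence described here can be sketched roughly as follows. This is not nodeadm's actual code: `resolve_max_pods` and its arguments are hypothetical names, and the only rule it models is the documented one, that an explicit maxPods wins over the result of maxPodsExpression.

```python
from typing import Optional


def resolve_max_pods(
    explicit_max_pods: Optional[int],
    expression_result: Optional[int],
    default_value: int,
) -> int:
    """Hypothetical precedence: explicit maxPods, then the expression, then a default."""
    if explicit_max_pods is not None:
        return explicit_max_pods
    if expression_result is not None:
        return expression_result
    return default_value


# Only the expression is set: its result is used.
print(resolve_max_pods(None, 56, 58))  # prints 56

# An explicit maxPods is also present (for example, injected by another tool):
# the explicit value is used and the expression result is ignored.
print(resolve_max_pods(29, 56, 58))  # prints 29
```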

mselim00 avatar Dec 01 '25 18:12 mselim00

@mselim00 I am using Karpenter. Here is the EC2NodeClass I'm using:

---
    apiVersion: karpenter.k8s.aws/v1
    kind: EC2NodeClass
    metadata:
      name: al2023-default
    spec:
      # Required, resolves a default ami and userdata
      amiFamily: AL2023
      amiSelectorTerms:
      - alias: al2023@latest
    
      # Required, discovers subnets to attach to instances
      subnetSelectorTerms:
      - tags:
          "kubernetes.io/cluster/eks-acme-staging-euc1-01": "owned"
          "kubernetes.io/role/internal-elb": "1"
    
      # Required, discovers security groups to attach to instances
      securityGroupSelectorTerms:
      - id: sg-07ec9d6eedf144b40 # eks-nodes-acme-staging-euc1-01
      - id: sg-0364e80630e42138e # eks-cluster-sg-eks-acme-staging-euc1-01-387902043
    
      instanceProfile: ProfileName
    
      # Optional, overrides autogenerated userdata with a merge semantic
      userData: |
        Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
        MIME-Version: 1.0
    
        --MIMEBOUNDARY
        Content-Transfer-Encoding: 7bit
        Content-Type: text/cloud-config
        Mime-Version: 1.0
    
        #cloud-init
        write_files:
        - path: /home/ec2-user/.ssh/authorized_keys
          permissions: '0600'
          owner: ec2-user:ec2-user
          append: true
          content: |
            ssh-ed25519 xxxxxxxxxxx
    
        --MIMEBOUNDARY
        Content-Transfer-Encoding: 7bit
        Content-Type: application/node.eks.aws
        Mime-Version: 1.0
    
        ---
        apiVersion: node.eks.aws/v1alpha1
        kind: NodeConfig
        spec:
          cluster:
            cidr: 172.20.0.0/16
            name: eks-acme-staging-euc1-01
          kubelet:
            maxPodsExpression: "default_enis * (ips_per_eni - 1)"
            config:
              clusterDNS:
              - 172.20.0.10
    
        --MIMEBOUNDARY--
    
      # Optional, propagates tags to underlying EC2 resources
      tags:
        company.com/scope: kubernetes
        company.com/owner: [email protected]
    
      # Optional, configures IMDS for the instance
      metadataOptions:
        httpEndpoint: enabled
        httpProtocolIPv6: disabled
        httpPutResponseHopLimit: 1
        httpTokens: required
    
      # Optional, configures storage devices for the instance
      blockDeviceMappings:
      # Root device
      - deviceName: /dev/xvda
        rootVolume: true
        ebs:
          volumeSize: 25Gi
          volumeType: gp3
          encrypted: true
    
      # Optional, configures detailed monitoring for the instance
      detailedMonitoring: true

sylr avatar Dec 02 '25 13:12 sylr

Here is the instance userdata and we can see the hardcoded maxPods:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/cloud-config

#cloud-init
write_files:
- path: /home/ec2-user/.ssh/authorized_keys
  permissions: '0600'
  owner: ec2-user:ec2-user
  append: true
  content: |
    ssh-ed25519 xxxxxxxxxx

--//
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    cidr: 172.20.0.0/16
    name: eks-acme-staging-euc1-01
  kubelet:
    maxPodsExpression: "default_enis * (ips_per_eni - 1)"
    config:
      clusterDNS:
      - 172.20.0.10

--//
Content-Type: application/node.eks.aws

# Karpenter Generated NodeConfig
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
metadata: {}
spec:
  cluster:
    ....
    cidr: 172.20.0.0/16
    name: eks-acme-staging-euc1-01
  containerd: {}
  instance:
    localStorage: {}
  kubelet:
    config:
      clusterDNS:
      - 172.20.0.10
      maxPods: 29
      registerWithTaints:
      - effect: NoExecute
        key: karpenter.sh/unregistered
      - effect: NoExecute
        key: node.cilium.io/agent-not-ready
        value: "true"
      - effect: NoExecute
        key: ebs.csi.aws.com/agent-not-ready
    flags:
    - --node-labels="karpenter.k8s.aws/ec2nodeclass=al2023-default,karpenter.sh/capacity-type=spot,karpenter.sh/do-not-sync-taints=true,karpenter.sh/nodepool=default-arm64"

--//--
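To illustrate why both settings end up in play, here is a rough Python sketch that deep-merges the two NodeConfig documents above, with the later document overriding the earlier one. The merge order and the `deep_merge` helper are assumptions made for illustration and are not taken from nodeadm's source. The dicts contain only the kubelet fields shown in the userdata.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Hypothetical merge: nested dicts are merged key by key, other values are replaced."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# The kubelet section of the user-supplied NodeConfig.
user_doc = {
    "spec": {
        "kubelet": {
            "maxPodsExpression": "default_enis * (ips_per_eni - 1)",
            "config": {"clusterDNS": ["172.20.0.10"]},
        }
    }
}

# The kubelet section of the Karpenter-generated NodeConfig (taints and flags omitted).
karpenter_doc = {
    "spec": {
        "kubelet": {
            "config": {"clusterDNS": ["172.20.0.10"], "maxPods": 29},
        }
    }
}

kubelet = deep_merge(user_doc, karpenter_doc)["spec"]["kubelet"]

# Both the expression and an explicit maxPods are present after the merge.
print(kubelet["maxPodsExpression"])
print(kubelet["config"]["maxPods"])
```

Under these assumptions the merged configuration carries an explicit maxPods alongside the expression, which is the case the linked docs describe as the explicit value taking precedence.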

sylr avatar Dec 02 '25 14:12 sylr

I see, I think we will need to work with Karpenter to see how we can support this use case. Can you share some more details on why you would like to set the expression to default_enis * (ips_per_eni - 1)? I'm specifically trying to understand the omission of the + 2 - this is intended to account for expected host networking pods like the VPC CNI or kube-proxy

mselim00 avatar Dec 02 '25 15:12 mselim00

> I see, I think we will need to work with Karpenter to see how we can support this use case. Can you share some more details on why you would like to set the expression to default_enis * (ips_per_eni - 1)? I'm specifically trying to understand the omission of the + 2 - this is intended to account for expected host networking pods like the VPC CNI or kube-proxy

I'm using Cilium in IPAM mode with kube-proxy replacement, and we encountered scheduled pods failing to start because Cilium couldn't give them IPs.

sylr avatar Dec 02 '25 18:12 sylr

Cross-linking https://github.com/aws/karpenter-provider-aws/issues/8739

sylr avatar Dec 02 '25 18:12 sylr