
[Bug] Wrong DNS in containers running on managed nodes in a cluster with a custom service CIDR

Open mkl262 opened this issue 3 years ago • 5 comments

Hi,

I'm trying to create a cluster with two node groups: one managed group with spot instances and one unmanaged group. The cluster is also configured with serviceIPv4CIDR: 10.255.0.0/16 (the full config is attached below). Pods that start on the unmanaged nodes run without problems and have the correct CoreDNS service IP in resolv.conf. But pods that run on the managed nodes get the wrong IP in resolv.conf, 172.20.0.10, and are unable to resolve anything.

I assume it happens because of this issue with the EKS AMI: https://github.com/awslabs/amazon-eks-ami/issues/636
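
For context, the fallback described in that issue works roughly like this (a paraphrased sketch of the EKS AMI's bootstrap.sh, not the exact upstream code; the Ubuntu images appear to use similar logic). When no --dns-cluster-ip is passed, the script guesses the service CIDR from the VPC CIDR instead of asking the EKS API, which would explain the 172.20.0.10 seen here given the 10.254.0.0/16 VPC:

# Paraphrased sketch only, not the exact bootstrap.sh code.
if [[ -z "${DNS_CLUSTER_IP}" ]]; then
  if echo "${VPC_CIDR}" | grep -q '^10\.'; then
    # VPC is in 10.0.0.0/8, so assume the default service CIDR 172.20.0.0/16
    DNS_CLUSTER_IP=172.20.0.10
  else
    # otherwise assume 10.100.0.0/16
    DNS_CLUSTER_IP=10.100.0.10
  fi
fi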

Is there some way to specify the CoreDNS address for the managed nodes? managedNodeGroups doesn't support the clusterDNS or kubeletExtraConfig options in the config.
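
For reference, the self-managed nodeGroups schema does accept these fields; a sketch of what that looks like (10.255.0.10 is simply the CoreDNS IP implied by the custom service CIDR above):

nodeGroups:
  - name: unmanaged-group
    # Points kubelet at the CoreDNS service IP inside the custom service CIDR.
    clusterDNS: 10.255.0.10
    # kubeletExtraConfig is likewise only available on self-managed node groups.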

resolv.conf from container in unmanaged node:

$ cat /etc/resolv.conf 
nameserver 10.255.0.10
search somenamespace.svc.cluster.local svc.cluster.local cluster.local eu-west-2.compute.internal
options ndots:5

resolv.conf from container in managed node:

$ cat /etc/resolv.conf 
nameserver 172.20.0.10
search somenamespace.svc.cluster.local svc.cluster.local cluster.local eu-west-2.compute.internal
options ndots:5
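
One way to confirm where kubelet got the value is to check its clusterDNS setting on the node itself (the path below is the one used by the Amazon EKS AMI; the Ubuntu images may store the kubelet config elsewhere):

$ sudo grep -A 2 clusterDNS /etc/kubernetes/kubelet/kubelet-config.json

On the affected managed nodes this should show 172.20.0.10 rather than 10.255.0.10.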

eksctl version:

$ eksctl info
eksctl version: 0.108.0
kubectl version: v1.24.3
OS: darwin

eks.yaml:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cluster
  region: eu-west-2
  version: "1.21"

kubernetesNetworkConfig:
  serviceIPv4CIDR: 10.255.0.0/16

vpc:
  cidr: 10.254.0.0/16
  autoAllocateIPv6: false
  clusterEndpoints:
    publicAccess: true
    privateAccess: true
  # nat:
    # gateway: HighlyAvailable

iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: aws-load-balancer-controller
        namespace: kube-system
      wellKnownPolicies:
        awsLoadBalancerController: true
    - metadata:
        name: ebs-csi-controller-sa
        namespace: kube-system
      wellKnownPolicies:
        ebsCSIController: true
    - metadata:
        name: efs-csi-controller-sa
        namespace: kube-system
      wellKnownPolicies:
        efsCSIController: true

addons:
  - name: vpc-cni
    attachPolicyARNs:
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
  - name: coredns
    version: latest
  - name: kube-proxy
    version: latest


managedNodeGroups:
  - name: spot-group
    amiFamily: Ubuntu2004
    desiredCapacity: 3
    spot: true
    instanceTypes:
      - "t2.2xlarge"
      - "t3.2xlarge"
      - "t3a.2xlarge"
      - "c5.2xlarge"
    labels:
      nodegroup-type: spot-group
    iam:
      withAddonPolicies:
        ebs: true
        fsx: true
        efs: true
    privateNetworking: true
    ssh:
      allow: true
      publicKeyName: cluster-ssh

nodeGroups:
  - name: unmanaged-group
    amiFamily: Ubuntu2004
    instanceType: t3.small
    desiredCapacity: 2
    labels:
      nodegroup-type: unmanaged-group
    iam:
      withAddonPolicies:
        ebs: true
        fsx: true
        efs: true
    privateNetworking: true
    taints:
      - key: app
        value: "unmanaged-group"
        effect: NoSchedule
    ssh:
      allow: true
      publicKeyName: cluster-ssh

cloudWatch:
    clusterLogging:
        enableTypes: ["api", "audit", "authenticator", "controllerManager", "scheduler", "all"]
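
The cluster is created from this config with the usual command:

$ eksctl create cluster -f eks.yaml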

mkl262 avatar Aug 17 '22 15:08 mkl262

Hello mkl262 :wave: Thank you for opening an issue in eksctl project. The team will review the issue and aim to respond within 1-3 business days. Meanwhile, please read about the Contribution and Code of Conduct guidelines here. You can find out more information about eksctl on our website

github-actions[bot] avatar Aug 17 '22 15:08 github-actions[bot]

Thanks for reporting this issue. I have managed to reproduce this and I am working on understanding whether eksctl is at fault here. I wasn't able to find a workaround for this atm. I will post more updates, thanks for your patience.

TiberiuGC avatar Aug 23 '22 13:08 TiberiuGC

@TiberiuGC Hi, Do you have an update by any chance?

mkl262 avatar Sep 01 '22 12:09 mkl262

Hello @TiberiuGC, I'm facing the same issue. Are there any updates on solving this issue or at least a workaround for this?

joeygo avatar Sep 11 '22 09:09 joeygo

Hi @mkl262, @joeygo! Unfortunately I don't have an update right now, nor a workaround. Our team's capacity is currently taken up by some priority features; we're trying our best to deliver on our roadmap as well as to solve these painful issues. We do appreciate your patience.

TiberiuGC avatar Sep 12 '22 08:09 TiberiuGC

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Oct 13 '22 02:10 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Oct 19 '22 02:10 github-actions[bot]

@TiberiuGC Can you please reopen the issue?

mkl262 avatar Oct 19 '22 13:10 mkl262

@mkl262, I have opened a PR with a fix. It will be out in the next release.

cPu1 avatar Nov 09 '22 13:11 cPu1

@cPu1 Thanks!

mkl262 avatar Jan 23 '23 15:01 mkl262