
[Bug] Wrong DNS in containers running on managed nodes in a cluster with a custom service CIDR

Open mkl262 opened this issue 3 years ago • 5 comments

Hi,

I'm trying to create a cluster with two node groups: one managed group with spot instances and one unmanaged group. The cluster is also configured with serviceIPv4CIDR: 10.255.0.0/16 (the full config is attached below). Pods that start on the unmanaged nodes run without problems and have the correct CoreDNS service IP in resolv.conf. But pods that run on the managed nodes get the wrong IP in resolv.conf, 172.20.0.10, and are unable to resolve anything.

I assume it happens because of this issue with the EKS AMI: https://github.com/awslabs/amazon-eks-ami/issues/636
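
For context, the fallback described in that issue works roughly like this (a paraphrased sketch of the EKS AMI's bootstrap.sh, not the exact upstream code; the Ubuntu images appear to use similar logic). When no --dns-cluster-ip is passed, the script guesses the service CIDR from the VPC CIDR instead of asking the EKS API, which would explain the 172.20.0.10 seen here given the 10.254.0.0/16 VPC:

# Paraphrased sketch only, not the exact bootstrap.sh code.
if [[ -z "${DNS_CLUSTER_IP}" ]]; then
  if echo "${VPC_CIDR}" | grep -q '^10\.'; then
    # VPC is in 10.0.0.0/8, so assume the default service CIDR 172.20.0.0/16
    DNS_CLUSTER_IP=172.20.0.10
  else
    # otherwise assume 10.100.0.0/16
    DNS_CLUSTER_IP=10.100.0.10
  fi
fi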

Is there some way to specify the CoreDNS address for the managed nodes? managedNodeGroups doesn't support the clusterDNS or kubeletExtraConfig options in the config.
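
For reference, the self-managed nodeGroups schema does accept these fields; a sketch of what that looks like (10.255.0.10 is simply the CoreDNS IP implied by the custom service CIDR above):

nodeGroups:
  - name: unmanaged-group
    # Points kubelet at the CoreDNS service IP inside the custom service CIDR.
    clusterDNS: 10.255.0.10
    # kubeletExtraConfig is likewise only available on self-managed node groups.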

resolv.conf from container in unmanaged node:

$ cat /etc/resolv.conf 
nameserver 10.255.0.10
search somenamespace.svc.cluster.local svc.cluster.local cluster.local eu-west-2.compute.internal
options ndots:5

resolv.conf from container in managed node:

$ cat /etc/resolv.conf 
nameserver 172.20.0.10
search somenamespace.svc.cluster.local svc.cluster.local cluster.local eu-west-2.compute.internal
options ndots:5
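
One way to confirm where kubelet got the value is to check its clusterDNS setting on the node itself (the path below is the one used by the Amazon EKS AMI; the Ubuntu images may store the kubelet config elsewhere):

$ sudo grep -A 2 clusterDNS /etc/kubernetes/kubelet/kubelet-config.json

On the affected managed nodes this should show 172.20.0.10 rather than 10.255.0.10.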

eksctl version:

$ eksctl info
eksctl version: 0.108.0
kubectl version: v1.24.3
OS: darwin

eks.yaml:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cluster
  region: eu-west-2
  version: "1.21"

kubernetesNetworkConfig:
  serviceIPv4CIDR: 10.255.0.0/16

vpc:
  cidr: 10.254.0.0/16
  autoAllocateIPv6: false
  clusterEndpoints:
    publicAccess: true
    privateAccess: true
  # nat:
    # gateway: HighlyAvailable

iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: aws-load-balancer-controller
        namespace: kube-system
      wellKnownPolicies:
        awsLoadBalancerController: true
    - metadata:
        name: ebs-csi-controller-sa
        namespace: kube-system
      wellKnownPolicies:
        ebsCSIController: true
    - metadata:
        name: efs-csi-controller-sa
        namespace: kube-system
      wellKnownPolicies:
        efsCSIController: true

addons:
  - name: vpc-cni
    attachPolicyARNs:
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
  - name: coredns
    version: latest
  - name: kube-proxy
    version: latest


managedNodeGroups:
  - name: spot-group
    amiFamily: Ubuntu2004
    desiredCapacity: 3
    spot: true
    instanceTypes:
      - "t2.2xlarge"
      - "t3.2xlarge"
      - "t3a.2xlarge"
      - "c5.2xlarge"
    labels:
      nodegroup-type: spot-group
    iam:
      withAddonPolicies:
        ebs: true
        fsx: true
        efs: true
    privateNetworking: true
    ssh:
      allow: true
      publicKeyName: cluster-ssh

nodeGroups:
  - name: unmanaged-group
    amiFamily: Ubuntu2004
    instanceType: t3.small
    desiredCapacity: 2
    labels:
      nodegroup-type: unmanaged-group
    iam:
      withAddonPolicies:
        ebs: true
        fsx: true
        efs: true
    privateNetworking: true
    taints:
      - key: app
        value: "unmanaged-group"
        effect: NoSchedule
    ssh:
      allow: true
      publicKeyName: cluster-ssh

cloudWatch:
    clusterLogging:
        enableTypes: ["api", "audit", "authenticator", "controllerManager", "scheduler", "all"]
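
The cluster is created from this config with the usual command:

$ eksctl create cluster -f eks.yaml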

mkl262 avatar Aug 17 '22 15:08 mkl262

Hello mkl262 :wave: Thank you for opening an issue in eksctl project. The team will review the issue and aim to respond within 1-3 business days. Meanwhile, please read about the Contribution and Code of Conduct guidelines here. You can find out more information about eksctl on our website

github-actions[bot] avatar Aug 17 '22 15:08 github-actions[bot]

Thanks for reporting this issue. I have managed to reproduce this and I am working on understanding whether eksctl is at fault here. I wasn't able to find a workaround for this atm. I will post more updates, thanks for your patience.

TiberiuGC avatar Aug 23 '22 13:08 TiberiuGC

@TiberiuGC Hi, Do you have an update by any chance?

mkl262 avatar Sep 01 '22 12:09 mkl262

Hello @TiberiuGC, I'm facing the same issue. Are there any updates on solving this issue or at least a workaround for this?

joeygo avatar Sep 11 '22 09:09 joeygo

Hi @mkl262, @joeygo! Unfortunately I don't have an update right now, nor a workaround. Our team's capacity is currently taken up by some priority features; we're trying our best to deliver on our roadmap as well as to solve these painful issues. We do appreciate your patience.

TiberiuGC avatar Sep 12 '22 08:09 TiberiuGC

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Oct 13 '22 02:10 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Oct 19 '22 02:10 github-actions[bot]

@TiberiuGC Can you please reopen the issue?

mkl262 avatar Oct 19 '22 13:10 mkl262

@mkl262, I have opened a PR with a fix. It will be out in the next release.

cPu1 avatar Nov 09 '22 13:11 cPu1

@cPu1 Thanks!

mkl262 avatar Jan 23 '23 15:01 mkl262