eksctl
[Bug] Wrong DNS in containers running on managed nodes in a cluster with a custom service CIDR
Hi,
I'm trying to create a cluster with two node groups: one managed group using spot instances, and one group of unmanaged nodes.
The cluster is also configured with serviceIPv4CIDR: 10.255.0.0/16 (the rest of the config is attached below).
When pods start on the unmanaged nodes, they run without problems and have the correct CoreDNS service IP in resolv.conf.
But when pods run on the managed nodes, they have the wrong IP in resolv.conf, 172.20.0.10, and are unable to resolve anything.
I assume this happens because of this issue with the EKS AMI, where the bootstrap script derives the cluster DNS IP from the VPC CIDR instead of the configured service CIDR: https://github.com/awslabs/amazon-eks-ami/issues/636
Is there some way to specify the CoreDNS address for the managed nodes? managedNodeGroups doesn't expose the clusterDNS or kubeletExtraConfig options in the config.
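For comparison, unmanaged node groups do accept a clusterDNS field that could be used to set this explicitly; a minimal sketch (hypothetical, I'm not actually setting it in my config below):

nodeGroups:
  - name: some-unmanaged-group
    clusterDNS: 10.255.0.10

As far as I can tell there is no equivalent field under managedNodeGroups.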
resolv.conf from a container on an unmanaged node:
$ cat /etc/resolv.conf
nameserver 10.255.0.10
search somenamespace.svc.cluster.local svc.cluster.local cluster.local eu-west-2.compute.internal
options ndots:5
resolv.conf from a container on a managed node:
$ cat /etc/resolv.conf
nameserver 172.20.0.10
search somenamespace.svc.cluster.local svc.cluster.local cluster.local eu-west-2.compute.internal
options ndots:5
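For what it's worth, the CoreDNS service itself does get an address from the custom CIDR; checking it directly (the jsonpath flag is just one way to read it):

$ kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}'

returns 10.255.0.10 in my cluster, matching what the unmanaged nodes use.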
eksctl version:
$ eksctl info
eksctl version: 0.108.0
kubectl version: v1.24.3
OS: darwin
eks.yaml:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cluster
  region: eu-west-2
  version: "1.21"

kubernetesNetworkConfig:
  serviceIPv4CIDR: 10.255.0.0/16

vpc:
  cidr: 10.254.0.0/16
  autoAllocateIPv6: false
  clusterEndpoints:
    publicAccess: true
    privateAccess: true
  # nat:
  #   gateway: HighlyAvailable

iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: aws-load-balancer-controller
        namespace: kube-system
      wellKnownPolicies:
        awsLoadBalancerController: true
    - metadata:
        name: ebs-csi-controller-sa
        namespace: kube-system
      wellKnownPolicies:
        ebsCSIController: true
    - metadata:
        name: efs-csi-controller-sa
        namespace: kube-system
      wellKnownPolicies:
        efsCSIController: true

addons:
  - name: vpc-cni
    attachPolicyARNs:
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
  - name: coredns
    version: latest
  - name: kube-proxy
    version: latest

managedNodeGroups:
  - name: spot-group
    amiFamily: Ubuntu2004
    desiredCapacity: 3
    spot: true
    instanceTypes:
      - "t2.2xlarge"
      - "t3.2xlarge"
      - "t3a.2xlarge"
      - "c5.2xlarge"
    labels:
      nodegroup-type: spot-group
    iam:
      withAddonPolicies:
        ebs: true
        fsx: true
        efs: true
    privateNetworking: true
    ssh:
      allow: true
      publicKeyName: cluster-ssh

nodeGroups:
  - name: unmanaged-group
    amiFamily: Ubuntu2004
    instanceType: t3.small
    desiredCapacity: 2
    labels:
      nodegroup-type: unmanaged-group
    iam:
      withAddonPolicies:
        ebs: true
        fsx: true
        efs: true
    privateNetworking: true
    taints:
      - key: app
        value: "unmanaged-group"
        effect: NoSchedule
    ssh:
      allow: true
      publicKeyName: cluster-ssh

cloudWatch:
  clusterLogging:
    enableTypes: ["api", "audit", "authenticator", "controllerManager", "scheduler", "all"]
Hello mkl262 :wave: Thank you for opening an issue in the eksctl project. The team will review the issue and aim to respond within 1-3 business days. Meanwhile, please read about the Contribution and Code of Conduct guidelines here. You can find out more information about eksctl on our website.
Thanks for reporting this issue. I have managed to reproduce it and am working on understanding whether eksctl is at fault here. I wasn't able to find a workaround for this at the moment. I will post more updates; thanks for your patience.
@TiberiuGC Hi, do you have an update by any chance?
Hello @TiberiuGC, I'm facing the same issue. Are there any updates on solving this issue or at least a workaround for this?
Hi @mkl262, @joeygo! Unfortunately I don't have an update right now, nor a workaround. Our team's capacity is currently drained by some priority features; we're trying our best to deliver on our roadmap as well as solve these painful issues. We do appreciate your patience.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
@TiberiuGC Can you please reopen the issue?
@mkl262, I have opened a PR with a fix. It will be out in the next release.
@cPu1 Thanks!