amazon-eks-ami
Wrong clusterDNS in kubelet-config when you specify custom Kubernetes service IP address ranges
What happened:
AWS EKS allows you to specify a custom Kubernetes service IP address range. See https://aws.amazon.com/about-aws/whats-new/2020/10/amazon-eks-supports-configurable-kubernetes-service-ip-address-range/.
After we set the service IPv4 range to 192.168.0.0/16, we saw that the resolv.conf in our pods was still referencing 172.20.0.10:
```
bash-5.1# cat /etc/resolv.conf
nameserver 172.20.0.10
```
So we looked at our kubelet-config to figure out what was happening.
```
[root@ip-<REDACTED> ~]# grep clusterDNS /etc/kubernetes/kubelet/kubelet-config.json -A2
    "clusterDNS": [
        "172.20.0.10"
    ],
```
The next step was to look at the bootstrap.sh script to figure out how clusterDNS gets configured. The relevant parts are in https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh#L291-L346. You can see that DNS_CLUSTER_IP is derived from SERVICE_IPV4_CIDR, which is itself the result of calling `aws eks describe-cluster`:
```bash
aws eks describe-cluster \
    --region=${AWS_DEFAULT_REGION} \
    --name=${CLUSTER_NAME} \
    --output=text \
    --query 'cluster.{certificateAuthorityData: certificateAuthority.data, endpoint: endpoint, kubernetesNetworkConfig: kubernetesNetworkConfig.serviceIpv4Cidr}' > $DESCRIBE_CLUSTER_RESULT || rc=$?
# ...
SERVICE_IPV4_CIDR=$(cat $DESCRIBE_CLUSTER_RESULT | awk '{print $3}')
# ...
```
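As an aside, a quick illustration of what that awk line does, using sample data rather than a live API call (the endpoint and CA values below are placeholders): with `--output=text` the three queried fields come back tab-separated on a single line, so `$3` picks out the service CIDR.

```shell
# Sample data standing in for the DescribeCluster text output; no real API call.
SAMPLE_RESULT=$(printf 'CERTDATA\thttps://EXAMPLE.eks.amazonaws.com\t192.168.0.0/16')
# Same extraction as bootstrap.sh: the third whitespace-separated field.
SERVICE_IPV4_CIDR=$(printf '%s\n' "$SAMPLE_RESULT" | awk '{print $3}')
echo "$SERVICE_IPV4_CIDR"
```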
```bash
if [[ -z "${DNS_CLUSTER_IP}" ]]; then
  if [[ ! -z "${SERVICE_IPV4_CIDR}" ]] && [[ "${SERVICE_IPV4_CIDR}" != "None" ]] ; then
    #Sets the DNS Cluster IP address that would be chosen from the serviceIpv4Cidr. (x.y.z.10)
    DNS_CLUSTER_IP=${SERVICE_IPV4_CIDR%.*}.10
  else
    # ...
    if [[ "$TEN_RANGE" != "0" ]]; then
      DNS_CLUSTER_IP=172.20.0.10
    fi
  fi
else
  DNS_CLUSTER_IP="${DNS_CLUSTER_IP}"
fi
```
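For reference, the `${SERVICE_IPV4_CIDR%.*}.10` expansion simply replaces everything from the last dot onward with `.10`; a minimal sketch with our custom range:

```shell
# Minimal sketch of the .10 derivation above, using our custom service range.
SERVICE_IPV4_CIDR="192.168.0.0/16"
# %.* strips the shortest suffix matching ".*", i.e. ".0/16", leaving
# "192.168.0"; appending .10 yields the DNS service IP.
DNS_CLUSTER_IP=${SERVICE_IPV4_CIDR%.*}.10
echo "$DNS_CLUSTER_IP"
```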
But `aws eks describe-cluster` is only called when `[[ -z "${B64_CLUSTER_CA}" ]] || [[ -z "${APISERVER_ENDPOINT}" ]]` holds. In our case both B64_CLUSTER_CA and APISERVER_ENDPOINT are non-empty:

```
/etc/eks/bootstrap.sh --b64-cluster-ca '<REDACTED>' --apiserver-endpoint '<REDACTED>' --kubelet-extra-args "--node-labels=node.kubernetes.io/lifecycle=`curl -s http://169.254.169.254/latest/meta-data/instance-life-cycle`" '<REDACTED>'
```
So `aws eks describe-cluster` never runs, SERVICE_IPV4_CIDR stays empty, and we end up with the default DNS_CLUSTER_IP=172.20.0.10.
I think the best way to handle this is to add a DNS_CLUSTER_IP check to `if [[ -z "${B64_CLUSTER_CA}" ]] || [[ -z "${APISERVER_ENDPOINT}" ]]; then`. What do you think?
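To make the suggestion concrete, here is a runnable sketch. It is not the actual bootstrap.sh: the variable names mirror the script, but `describe_cluster` is a stub standing in for the `aws eks describe-cluster` call, and the values are placeholders.

```shell
# Sketch only; describe_cluster is a stub, not the real AWS CLI call.
B64_CLUSTER_CA="present"
APISERVER_ENDPOINT="present"
DNS_CLUSTER_IP=""

describe_cluster() {
  # In the real script this would come from the DescribeCluster response.
  SERVICE_IPV4_CIDR="192.168.0.0/16"
}

# Suggested change: also call DescribeCluster when --dns-cluster-ip was not
# passed, not only when the CA bundle or API server endpoint is missing.
if [ -z "${B64_CLUSTER_CA}" ] || [ -z "${APISERVER_ENDPOINT}" ] || [ -z "${DNS_CLUSTER_IP}" ]; then
  describe_cluster
fi

if [ -z "${DNS_CLUSTER_IP}" ] && [ -n "${SERVICE_IPV4_CIDR}" ]; then
  DNS_CLUSTER_IP=${SERVICE_IPV4_CIDR%.*}.10
fi
echo "$DNS_CLUSTER_IP"
```

With the extra `-z "${DNS_CLUSTER_IP}"` clause, the stubbed DescribeCluster fires even though the CA and endpoint are set, and the derived IP is 192.168.0.10 rather than the 172.20.0.10 fallback.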
What you expected to happen:
The correct clusterDNS is set by bootstrap.sh.
How to reproduce it (as minimally and precisely as possible):
Specify a custom Kubernetes service IP address range without overriding the --dns-cluster-ip flag of bootstrap.sh.
Anything else we need to know?:
Environment:
- AWS Region: eu-central-1
- Instance Type(s): m5n.xlarge, c5n.xlarge
- EKS Platform version (use `aws eks describe-cluster --name <name> --query cluster.platformVersion`): "eks.1"
- Kubernetes version (use `aws eks describe-cluster --name <name> --query cluster.version`): "1.19"
- AMI Version:
- Kernel (e.g. `uname -a`): Linux ip-<REDACTED>.eu-central-1.compute.internal 5.4.95-42.163.amzn2.x86_64 #1 SMP Thu Feb 4 12:50:05 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Release information (run `cat /etc/eks/release` on a node):

```
BASE_AMI_ID="ami-0a1ccc021b9016ec9"
BUILD_TIME="Wed Mar 10 19:47:23 UTC 2021"
BUILD_KERNEL="5.4.95-42.163.amzn2.x86_64"
ARCH="x86_64"
```
Thanks for raising the issue, we will be working on a fix.
As a workaround, you can avoid specifying --apiserver-endpoint or --b64-cluster-ca in your bootstrap arguments. bootstrap.sh will then populate SERVICE_IPV4_CIDR (https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh#L322) and derive the correct custom service IP.
Your situation makes sense. I see two pieces to this: documentation around handling this situation (and perhaps discoverability) is one issue, and there is also a feature request to simplify the customer experience when using configurable Kubernetes service IP address ranges.
I've created a feature request where I detail the situation and give possible options for resolving it. We actually had to roll back the fix that you suggest (calling DescribeCluster if DNS_CLUSTER_IP is unset), so it's not a viable option at this point in time. Feel free to upvote or comment on the feature request.
I'm going to use this issue to track updating documentation around the situation.
As of today, here are the available options to unblock this issue:
- Do not set --dns-cluster-ip, --b64-cluster-ca or --apiserver-endpoint
- Do set --dns-cluster-ip, --b64-cluster-ca and --apiserver-endpoint
- Use EKS managed node groups
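For self-managed nodes, the first two options boil down to bootstrap invocations along these lines. The cluster name, endpoint, CA, and IP below are placeholders, not real values:

```shell
# Placeholders only, not real cluster data.

# Option 1: pass none of the three flags; bootstrap.sh then calls
# DescribeCluster itself and derives the DNS IP from serviceIpv4Cidr:
#   /etc/eks/bootstrap.sh my-cluster

# Option 2: pass all three flags explicitly:
#   /etc/eks/bootstrap.sh my-cluster \
#     --apiserver-endpoint 'https://EXAMPLE.gr7.eu-central-1.eks.amazonaws.com' \
#     --b64-cluster-ca 'BASE64_CA_PLACEHOLDER' \
#     --dns-cluster-ip '192.168.0.10'
```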
I stumbled upon the same problem after specifying a custom service CIDR for EKS.
As I am using managed nodes with a custom launch template, I needed to add the --dns-cluster-ip argument to /etc/eks/bootstrap.sh in it and override the DNS cluster IP value. After that, everything works for me. In general, though, I didn't suspect such an issue, and it took a while to get to the root cause. I hope this will change: since you enabled setting the service IP CIDR, a user shouldn't have to expect this kind of problem after using that option.
Also got to this issue after setting a custom VPC CIDR; not setting --dns-cluster-ip, --b64-cluster-ca, and --apiserver-endpoint fixed it for me.
eksctl 0.73.0
Folks, same issue here. Are there any updates on this? I set the --dns-cluster-ip attribute manually via the kubelet extra args and it worked. In my scenario I also needed to specify a custom service IPv4 CIDR, and the workaround does not seem to be the best approach.
This bug is affecting our environment as well. Hoping for a bugfix on this eks bootstrap script at some point.
We've run into this as well using Terraform to provision EKS. Disappointing that the issue has been open since 2021 without resolution.
In our specific case, nodes provisioned through a managed node group work, but nodes provisioned as self-managed node groups do not.
EKS 1.25: the bug is still here.
Welp, lost a week to this on EKS 1.28.
@jonathan-creasy-incode how did you solve it in the end? I'm using the Terraform AWS EKS module with Kubernetes version 1.27 and the pods are still using 172.20.0.10 as nameserver.