amazon-eks-ami

EKS bootstrap blindly assumes IPv4 for the cluster if B64_CLUSTER_CA and APISERVER_ENDPOINT are set

Status: Open. archoversight opened this issue 2 years ago • 7 comments

What happened:

Deployed an IPv6-enabled EKS cluster using https://github.com/terraform-aws-modules/terraform-aws-eks

What you expected to happen:

The cluster DNS IP to be set to an IPv6 address. Instead, a 172.x.x.x address is used.

How to reproduce it (as minimally and precisely as possible):

Deploy a cluster with IPv6 enabled using https://github.com/terraform-aws-modules/terraform-aws-eks. The module generates user data that passes B64_CLUSTER_CA and APISERVER_ENDPOINT to the bootstrap script. At that point, the following lines assume that IPv4 is the only option:

https://github.com/awslabs/amazon-eks-ami/blob/60550f3eeed450d5f57b591f6c6ea76d1c494439/files/bootstrap.sh#L377-L381

User data:

```bash
#!/bin/bash
set -e
echo "Fetching AWS CLIv2"
(
  cd /tmp
  curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
  unzip awscliv2.zip
  ./aws/install
  # Point /bin/aws at the freshly installed v2 binary in /usr/local/bin
  rm -rf /bin/aws
  ln -s /usr/local/bin/aws /bin/aws
)
echo "Starting bootstrap stage"
B64_CLUSTER_CA=[...]
API_SERVER_URL=https://hexdata.us-gov-west-1.eks.amazonaws.com
# With both the CA and the endpoint supplied, bootstrap.sh falls back to IPv4 (the behavior reported here)
/etc/eks/bootstrap.sh test-aws-gov-k8s --kubelet-extra-args '--node-labels=group=management' --b64-cluster-ca "$B64_CLUSTER_CA" --apiserver-endpoint "$API_SERVER_URL"
cd /tmp
sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
sudo systemctl enable amazon-ssm-agent
sudo systemctl start amazon-ssm-agent
```

Anything else we need to know?:

Environment: AWS GovCloud

  • AWS Region: us-gov-west-1
  • Instance Type(s): t3a.large
  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): eks.5
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): 1.21
  • AMI Version: ami-03e76509bb9349ef0
  • Kernel (e.g. uname -a): Linux ip-10-58-130-206.gov.aws.test.example.internal 5.4.181-99.354.amzn2.x86_64 #1 SMP Wed Mar 2 18:50:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Release information (run cat /etc/eks/release on a node):
BASE_AMI_ID="ami-04149a3294d95f9d1"
BUILD_TIME="Wed Mar  9 04:02:56 UTC 2022"
BUILD_KERNEL="5.4.181-99.354.amzn2.x86_64"
ARCH="x86_64"

archoversight, Mar 23 '22 23:03

Thanks for the feedback. The issue here is that when adding arguments, we want to make sure the behavior supported with just the pre-existing arguments doesn't change. This guarantees backward compatibility with the AMI once you have correctly set the arguments for any future AMI release.

abeer91, Apr 08 '22 15:04

I believe this is resolved with https://github.com/awslabs/amazon-eks-ami/pull/860 - cc @bwagner5 / @suket22

bryantbiggs, Apr 09 '22 15:04

This issue won't be resolved by that PR. The script will still set IP_FAMILY to ipv4.

When it is given B64_CLUSTER_CA and APISERVER_ENDPOINT, the script should ask the Kubernetes API what the service address range is, rather than blindly proceeding with IPv4.
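A sketch of only the classification step, assuming the node can read the cluster IP of the default/kubernetes Service (the API server assigns that Service the first address of the service range). The function name ip_family_of is hypothetical, and the sketch neither authenticates to nor queries the API server; it only shows that the family can be read off the address instead of being guessed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify an address or CIDR (for example the cluster IP
# of the default/kubernetes Service) as "ipv4" or "ipv6".
ip_family_of() {
  local addr="${1%%/*}"                           # drop any "/prefix-length" suffix
  local ipv4_re='^([0-9]{1,3}\.){3}[0-9]{1,3}$'   # dotted quad; octet ranges not checked
  if [[ "$addr" == *:* ]]; then
    echo "ipv6"                                   # IPv4 text forms never contain ':'
  elif [[ "$addr" =~ $ipv4_re ]]; then
    echo "ipv4"
  else
    echo "unknown"
    return 1
  fi
}
```

For example, ip_family_of 10.100.0.1 prints ipv4, while any address or CIDR taken from an IPv6 service range prints ipv6.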

archoversight, Apr 12 '22 17:04

I am not sure I agree that this will be fixed by that PR. After that change, the bootstrap script still assumes IPv4 if B64_CLUSTER_CA and APISERVER_ENDPOINT are set, because this line:

https://github.com/literalice/amazon-eks-ami/blob/a2d4c4c924e9576e57cf8aaaa6e57a25015b8e5f/files/bootstrap.sh#L331

Falls through, and then:

https://github.com/literalice/amazon-eks-ami/blob/a2d4c4c924e9576e57cf8aaaa6e57a25015b8e5f/files/bootstrap.sh#L370

This sets the family to IPv4.
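The fall-through above can be modeled with a small, hypothetical shell function. It is not the actual bootstrap.sh code: the function name, its parameters, and the DESCRIBED_FAMILY input (standing in for whatever a describe-cluster lookup would return) are invented for illustration. It only reproduces the decision described in this comment: when both the CA and the endpoint are supplied, no lookup happens, so the family defaults to ipv4 unless it is passed explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical model of the IP-family decision discussed in this issue.
# NOT the real bootstrap.sh; names are invented for illustration.
#
# resolve_ip_family EXPLICIT_FAMILY B64_CA ENDPOINT [DESCRIBED_FAMILY]
#   EXPLICIT_FAMILY   value of --ip-family (may be empty)
#   B64_CA, ENDPOINT  values of --b64-cluster-ca / --apiserver-endpoint
#   DESCRIBED_FAMILY  what a describe-cluster lookup would have returned
resolve_ip_family() {
  local explicit="$1" ca="$2" endpoint="$3" described="${4:-}"

  # An explicit family always wins (the opt-in discussed around #860).
  if [[ -n "$explicit" ]]; then
    echo "$explicit"
    return
  fi

  # Modeled assumption: without CA/endpoint, a lookup supplies the family.
  if [[ -z "$ca" || -z "$endpoint" ]] && [[ -n "$described" ]]; then
    echo "$described"
    return
  fi

  # CA and endpoint supplied: no lookup, so fall through to the default.
  echo "ipv4"
}
```

With the reporter's user data the explicit family is empty and both values are set, so resolve_ip_family returns ipv4 even though the cluster is IPv6.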

archoversight, Apr 12 '22 17:04

> Thanks for the feedback. The issue here is that when adding arguments, we want to make sure the behavior supported with just the pre-existing arguments doesn't change. This guarantees backward compatibility with the AMI once you have correctly set the arguments for any future AMI release.

That's perfectly acceptable, and I agree. Given the cluster CA and API server endpoint, the script could query Kubernetes for the service address and check whether it is IPv4 or IPv6, instead of assuming, as it does now, that passing those values means the cluster must be IPv4.

archoversight, Apr 12 '22 17:04

> This issue won't be resolved with this. It will still set the IP_FAMILY to ipv4.

Correct, users have to opt in with "IP_FAMILY=ipv6" to enable IPv6 support, which was just fixed in #860.

bryantbiggs, Apr 12 '22 17:04

The disappointing thing is that it still doesn't bring parity with IPv4: you need to provide either the DNS IP address manually OR the service address range.
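A minimal user-data sketch of the two opt-in forms, assuming an AMI release that includes #860 and whose bootstrap.sh accepts --ip-family, --service-ipv6-cidr, and --dns-cluster-ip (check the usage text of the bootstrap.sh shipped in your AMI). The cluster name and endpoint are taken from the report; the CA stays elided as in the report, and the angle-bracket values are hypothetical placeholders you would have to obtain out of band.

```bash
#!/bin/bash
set -euo pipefail
# Values from the report; the CA is elided there and stays elided here.
CLUSTER_NAME="test-aws-gov-k8s"
B64_CLUSTER_CA="[...]"
API_SERVER_URL="https://hexdata.us-gov-west-1.eks.amazonaws.com"
# Hypothetical placeholder: the cluster's service IPv6 CIDR, fetched separately.
SERVICE_IPV6_CIDR="<service-ipv6-cidr>"

# Option 1: opt in to IPv6 and pass the service range.
/etc/eks/bootstrap.sh "$CLUSTER_NAME" \
  --kubelet-extra-args '--node-labels=group=management' \
  --b64-cluster-ca "$B64_CLUSTER_CA" \
  --apiserver-endpoint "$API_SERVER_URL" \
  --ip-family ipv6 \
  --service-ipv6-cidr "$SERVICE_IPV6_CIDR"

# Option 2: instead of --service-ipv6-cidr, pass the cluster DNS IP directly:
#   --ip-family ipv6 --dns-cluster-ip "<cluster-dns-ipv6-address>"
```

Either way, one extra cluster-specific value has to reach the user data, which is the lack of parity this comment points out.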

Getting that service address range, though, means calling the aws CLI ourselves to fetch it from the EKS cluster by name, rather than having the script do it for us.
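That lookup can be wrapped in a small helper. The function name fetch_service_cidr is hypothetical; it relies on the aws eks describe-cluster command and the serviceIpv4Cidr/serviceIpv6Cidr fields under kubernetesNetworkConfig in its output. The caller needs eks:DescribeCluster permission and a configured region (for example AWS_DEFAULT_REGION=us-gov-west-1), which is exactly the extra plumbing this comment objects to.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a cluster's service CIDR for the given IP family.
# Needs the AWS CLI, eks:DescribeCluster permission, and a configured region.
fetch_service_cidr() {
  local cluster="$1" family="${2:-ipv6}" field
  case "$family" in
    ipv4) field="serviceIpv4Cidr" ;;
    ipv6) field="serviceIpv6Cidr" ;;
    *)
      echo "unknown IP family: $family" >&2
      return 1
      ;;
  esac
  aws eks describe-cluster --name "$cluster" \
    --query "cluster.kubernetesNetworkConfig.${field}" \
    --output text
}
```

For example, SERVICE_IPV6_CIDR=$(fetch_service_cidr test-aws-gov-k8s ipv6). Querying cluster.kubernetesNetworkConfig.ipFamily the same way reports the cluster's family itself.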

With IPv4 there is no need to pass in the service address range at all: it is assumed to be in 10/8, or in 172.16/12 if the former is already in use in the VPC.
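The IPv4 default described here can be sketched as a hypothetical helper (the name ipv4_default_dns_ip is invented). It assumes the EKS default service ranges 10.100.0.0/16 and 172.20.0.0/16, with the cluster DNS at the .10 address, and picks the 172.20 range whenever any VPC CIDR passed in falls in 10.0.0.0/8. No lookup of the service range is needed, which is the parity the comment asks for.

```bash
#!/usr/bin/env bash
# Hypothetical helper: default cluster DNS IP for an IPv4 cluster.
# Arguments: the VPC CIDRs. Assumes the default service ranges
# 10.100.0.0/16 and 172.20.0.0/16 with DNS at the .10 address.
ipv4_default_dns_ip() {
  local cidr
  for cidr in "$@"; do
    if [[ "$cidr" == 10.* ]]; then
      echo "172.20.0.10"   # 10/8 is in use in the VPC, so avoid it
      return
    fi
  done
  echo "10.100.0.10"
}
```

The node hostname in the report (ip-10-58-130-206) suggests a VPC in 10/8, for which this rule gives 172.20.0.10. That is consistent with the 172.x.x.x address the reporter saw.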

If I could pass --ip-family=ipv6 and have it just do the right thing, that would be great. I wouldn't have to smuggle data into the cloud-init script to get the node to join the cluster correctly.
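Once the service range is known, the IPv6 DNS IP is a string operation, so the script could derive it the same way it picks .10 for IPv4. A hypothetical sketch (the function name is invented), assuming the cluster DNS Service sits at offset 0xa (10) from the start of the service IPv6 range, mirroring .10 for IPv4, and that the range is written in compressed form ending in "::". Other spellings are rejected rather than guessed at.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the cluster DNS IP from a service IPv6 CIDR.
# Assumes the DNS Service is at offset 0xa (10) from the start of the range
# and that the range is written in compressed form ending in "::".
ipv6_dns_ip_from_cidr() {
  local base="${1%%/*}"   # strip the prefix length, e.g. "/108"
  if [[ "$base" != *:: ]]; then
    echo "unsupported range format: $1" >&2
    return 1
  fi
  echo "${base}a"         # "::a" is offset 10, like ".10" for IPv4
}
```

For an illustrative range fd36:1a2b:3c4d::/108, the helper prints fd36:1a2b:3c4d::a.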

archoversight, Apr 16 '22 00:04