[EKS] [request]: VPC endpoint support for EKS API

Open tdmalone opened this issue 5 years ago • 29 comments

Tell us about your request

VPC endpoint support for EKS, so that worker nodes can register with an EKS-managed cluster without requiring outbound internet access.

Which service(s) is this request for?

EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Worker nodes based on the EKS AMI run bootstrap.sh to connect themselves to the cluster. As part of this process, aws eks describe-cluster is called, which currently requires outbound internet access.

I'd love to be able to turn off outbound internet access but still easily bootstrap worker nodes without providing additional configuration.

Are you currently working around this issue?

  • Providing outbound internet access to worker nodes; OR
  • Supplying the cluster CA and API endpoint directly to bootstrap.sh.
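The second workaround can be sketched as follows. This is a minimal sketch, assuming the `--apiserver-endpoint` and `--b64-cluster-ca` options of the EKS AMI's bootstrap.sh; the function name `render_bootstrap_user_data` and the sample values are hypothetical, and depending on the AMI version further options may be needed before the script stops calling the EKS API.

```python
import shlex


def render_bootstrap_user_data(cluster_name: str, api_endpoint: str, b64_cluster_ca: str) -> str:
    """Render instance user data that hands the API endpoint and CA to bootstrap.sh.

    With both values supplied, the script is not expected to look them up
    via `aws eks describe-cluster` (assumption, see the note above).
    """
    command = [
        "/etc/eks/bootstrap.sh",
        cluster_name,
        "--apiserver-endpoint", api_endpoint,
        "--b64-cluster-ca", b64_cluster_ca,
    ]
    # Quote every argument so unusual characters cannot break the shell line.
    line = " ".join(shlex.quote(part) for part in command)
    return "#!/bin/bash\nset -o errexit\n" + line + "\n"


if __name__ == "__main__":
    # Hypothetical placeholder values; real ones come from the cluster itself.
    print(render_bootstrap_user_data("demo", "https://EXAMPLE.eks.amazonaws.com", "QUJD"))
```

The endpoint and CA still have to be obtained once from somewhere that can reach the EKS API, for example at provisioning time.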

Additional context

  • Relates somewhat to #22 & #221, but for the AWS EKS API rather than the Kubernetes control plane API

tdmalone avatar May 20 '19 03:05 tdmalone

Is there any news on this?

devonkinghorn avatar Jan 23 '20 15:01 devonkinghorn

Any updates on this issue?

michael-burt avatar Jan 23 '20 22:01 michael-burt

If you use EKS Managed Nodes, the bootstrapping process avoids the aws eks describe-cluster API call, so you can launch workers into a private subnet without outbound internet access as long as you set up the other required PrivateLink endpoints correctly.

mikestef9 avatar Jan 24 '20 00:01 mikestef9

Thanks Mike. Unfortunately managed nodes are not an option because they cannot be scaled to 0. We run some machine learning workloads that require scaling up ASGs with expensive VMs (x1.32xlarge) and we need to be able to scale them back to 0 once the workloads have completed.

michael-burt avatar Jan 24 '20 00:01 michael-burt

Thanks for the feedback. Can you open a separate GH issue with that feature request for Managed Node Groups?

Will keep this issue open as it's something we are researching.

mikestef9 avatar Jan 24 '20 02:01 mikestef9

@mikestef9 I'm interested in the managed nodes solution. What do you mean by "you can launch workers into a private subnet without outbound internet access as long as you setup the other required PrivateLink endpoints correctly"?

Which PrivateLink endpoints are you referring to? Just the other service endpoints such as SQS and SNS that the applications running on the cluster may happen to use? Or do you mean that there are particular PrivateLink endpoints required to run EKS in private subnets with no internet gateway?

dsw88 avatar Jan 28 '20 17:01 dsw88

Hi @dsw88,

In order for the worker node to join the cluster, you will need to configure VPC endpoints for ECR, EC2, and S3.

See this GH repo https://github.com/jpbarto/private-eks-cluster created by an AWS Solutions Architect for a reference implementation. Note that only 1.13 and above EKS clusters have a kubelet version that is compatible with the ECR VPC endpoint.

mikestef9 avatar Jan 28 '20 19:01 mikestef9

@mikestef9 Thanks so much for the info, and thanks for the pointer to the private EKS cluster reference repository!

I have one final question that I'm having a hard time figuring out how to deal with: How can I configure other hosts in this same private VPC to be able to talk to the cluster? Knowing the private DNS name isn't a huge deal, because I can just hard-code it into whatever needs to talk to the cluster. A bigger problem, however, is how a host in the private VPC can authenticate with the cluster.

Currently when I use the AWS API to set up a kubeconfig with EKS, it includes the following snippet in the generated kubeconfig file:

- name: arn:aws:eks:REGION:ACCOUNT_ID:cluster/CLUSTER_NAME
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - --region
      - REGION
      - eks
      - get-token
      - --cluster-name
      - CLUSTER_NAME
      command: aws
      env: null

As you can see, it calls the EKS API to get a token that authenticates it with the cluster. That definitely presents a problem, since my hosts in the private VPC also don't have access to the EKS API. Is there another way that I can authenticate to the cluster without EKS API access?

dsw88 avatar Feb 03 '20 17:02 dsw88

See this GH repo https://github.com/jpbarto/private-eks-cluster created by an AWS Solutions Architect for a reference implementation. Note that only 1.13 and above EKS clusters have a kubelet version that is compatible with the ECR VPC endpoint.

It seems that this repo uses unmanaged nodes though. I tried deploying it and it brought up a cluster without any nodes listed under the EKS web console. Is this correct?

zucler avatar Feb 07 '20 02:02 zucler

@mikestef9 Thank you very much for this clue. Now I have a working setup with managed worker groups and no access to the Internet :tada:

I was not sure if it's feasible as the documentation says:

Amazon EKS managed node groups can be launched in both public and private subnets. The only requirement is for the subnets to have outbound internet access. Amazon EKS automatically associates a public IP to the instances started as part of a managed node group to ensure that these instances can successfully join a cluster.

Well, apparently it is. If someone needs working Terraform recipes, ping me [email protected].

vranystepan avatar Feb 10 '20 12:02 vranystepan

@vranystepan great to hear you have this working. As part of our fix for #607 we will make sure to get our documentation updated.

mikestef9 avatar Feb 10 '20 23:02 mikestef9

This is still a real issue.

I need to actually create and delete new clusters from private subnets with no NAT or egress gateways. I can create private endpoints for apparently every AWS service but EKS. This is a deep pain for some customers, as we have to build complicated workarounds to have traffic routed towards the EKS service, whereas every other AWS service is easily exposed with a private endpoint.

duckie avatar Jun 18 '20 20:06 duckie

I agree with @duckie this issue should not be closed yet. EKS support is laughable.

evanlurvey avatar Oct 26 '20 23:10 evanlurvey

I agree that VPC endpoints are still very important, and this issue should be kept open. It is possible to run EKS clusters in private subnets with no internet egress, but it is not possible to manage those clusters from within that private VPC. We are limited in the tooling we can develop around EKS for lifecycle actions such as creating, updating, and deleting clusters because we can't perform those actions inside our private VPC. Please consider implementing a VPC endpoint for EKS! Thanks!

dsw88 avatar Oct 27 '20 16:10 dsw88

Hi, is there any workaround for this issue? We should be able to create and manage an EKS cluster in a private VPC. In our situation (due to security policies), our bastion server (and VPC) don't have public access. In that case, how can we create an EKS cluster? We are using Terraform to provision EKS.

amitkarpe avatar Feb 26 '21 08:02 amitkarpe

Is there any status on this issue? This is a real problem for vendors that only use bootstrap.sh to perform automated EKS deployments, because our environments are private. I would like to know if anyone is working on this EKS private endpoint. Thanks

taro-cmd avatar Apr 16 '21 14:04 taro-cmd

We have the problem too. We've built a private cluster for a private VPC with CDK (the VPC is connected to a Transit Gateway). CDK uses a custom resource Lambda to do the kubeconfig update. When the cluster endpointAccess is private (or public and private), this Lambda is attached to the VPC (via ENIs). The Lambda function calls "aws eks update-kubeconfig" from "inside" the VPC, but is unable to reach the EKS API endpoint and fails with a timeout. All necessary VPC endpoints (according to the official EKS docs) are set (ecr.api, ecr.dkr, s3, ...).

torengaw avatar May 07 '21 13:05 torengaw

+1 Making fully private clusters that are custom CloudFormation resources is actually not possible without this: a Lambda in a VPC cannot get kubectl tokens.

xor007 avatar Sep 24 '21 04:09 xor007

+1 In my case, I cannot use CodeBuild with an attached VPC (all subnets are private) to reach the private EKS cluster via "aws eks update-kubeconfig"

The result would be Connect timeout on endpoint URL: "https://eks.<region>.amazonaws.com/clusters/xxxxx"

ctrongminh avatar Oct 09 '21 10:10 ctrongminh

When I create a cluster with no internet access, I get the error below. Is there any update on VPC endpoint support for the EKS API?

Command used to create cluster:

aws eks create-cluster \
  --region ap-southeast-1 \
  --name CP-EKS-TEST-NHSK \
  --kubernetes-version 1.21 \
  --role-arn arn:aws:iam::4103:role/nhsk \
  --resources-vpc-config subnetIds=subnet-063b9,subnet-04,securityGroupIds=sg-03

Error Message: connect timeout on endpoint url: "https://eks.ap-southeast-1.amazonaws.com/clusters"

nhsk4u avatar Dec 28 '21 05:12 nhsk4u

I need this as well. Is there a solution or a current workaround yet?

laurecs avatar Jan 28 '22 09:01 laurecs

Commenting as well. An EKS VPC Endpoint would be a huge help. Have there been any updates recently?

djjames72 avatar Mar 10 '22 14:03 djjames72

@mikestef9

If you use EKS Managed Nodes, the bootstrapping process avoids the aws eks describe-cluster API call, so you can launch workers into a private subnet without outbound internet access as long as you set up the other required PrivateLink endpoints correctly.

Mike, what are the "other required endpoints"? Is there a list somewhere that says, "here are all of the endpoints that a managed node requires"?

deitch avatar Jun 24 '22 10:06 deitch

@deitch IMHO the following VPC endpoints are required:

  • ecr.api as an interface endpoint
  • ecr.dkr as an interface endpoint
  • s3 as a gateway endpoint. For this one you also need to configure a new route so that traffic reaches S3 via the gateway.
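The list above can be written down as data. This is a minimal sketch, assuming the usual `com.amazonaws.<region>.<service>` naming for VPC endpoint services in commercial regions; `EndpointSpec`, `required_endpoints`, and the `include_ec2` switch are hypothetical names for illustration (EC2 was also named as required earlier in this thread).

```python
from typing import List, NamedTuple


class EndpointSpec(NamedTuple):
    service_name: str   # full VPC endpoint service name
    endpoint_type: str  # "Interface" or "Gateway"


def required_endpoints(region: str, include_ec2: bool = False) -> List[EndpointSpec]:
    """Return the endpoint services listed above for the given region."""
    prefix = f"com.amazonaws.{region}"  # assumed naming scheme, see lead-in
    specs = [
        EndpointSpec(f"{prefix}.ecr.api", "Interface"),
        EndpointSpec(f"{prefix}.ecr.dkr", "Interface"),
        # The gateway endpoint additionally needs a route table association.
        EndpointSpec(f"{prefix}.s3", "Gateway"),
    ]
    if include_ec2:
        specs.append(EndpointSpec(f"{prefix}.ec2", "Interface"))
    return specs


if __name__ == "__main__":
    for spec in required_endpoints("eu-west-1", include_ec2=True):
        print(spec.endpoint_type, spec.service_name)
```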

Xat59 avatar Jun 24 '22 13:06 Xat59

Cool, thanks. Are the ECR endpoints needed only if you use containers from ECR, or are they a general requirement?

This should be documented formally somewhere in AWS.

deitch avatar Jun 24 '22 13:06 deitch

With EKS, ECR is required to bootstrap nodes. And because ECR stores images in S3 under the hood, you also need access to S3. You can take a look at this documentation for EKS: https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html

Xat59 avatar Jun 24 '22 13:06 Xat59

Much appreciated.

deitch avatar Jun 24 '22 13:06 deitch

Are there any updates on this, team?

malikdraz avatar Aug 29 '22 02:08 malikdraz

Cluster autoscaler, when running in a private EKS cluster, also experiences that problem:

	managed_nodegroup_cache.go:133] Failed to query the managed nodegroup foo for the cluster bar while looking for labels/taints: RequestError: send request failed
	caused by: Get "https://eks.<region>.amazonaws.com/clusters/bar/node-groups/foo": dial tcp <*public_IP*>:443: i/o timeout

After reading https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html I think there could be a workaround for that: "DHCP options set for your VPC must include AmazonProvidedDNS in its domain name servers list". But I'm not sure which domain name to configure in the DHCP options. Should it be eks.<region>.amazonaws.com?
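The log above shows the EKS API hostname resolving to a public IP. A quick way to see what a host inside the VPC resolves it to is sketched below, using only the Python standard library. The helper names are hypothetical, and the premise that an interface endpoint with private DNS would make the hostname resolve to private addresses is an assumption based on how PrivateLink generally behaves.

```python
import ipaddress
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve a hostname to its distinct IPv4 addresses (needs working DNS)."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def all_private(addresses: List[str]) -> bool:
    """True only if there is at least one address and every one is private."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)


if __name__ == "__main__":
    host = "eks.ap-southeast-1.amazonaws.com"  # region chosen for illustration
    addrs = resolve_ipv4(host)
    print(host, addrs, "private" if all_private(addrs) else "public or mixed")
```

A "public or mixed" result from a subnet without internet egress would be consistent with the i/o timeout in the log.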

bogdando avatar Sep 15 '22 08:09 bogdando

Amazon EKS now supports AWS PrivateLink for the EKS management APIs.

A few call outs:

  • VPC endpoint policies are not supported.

  • EKS support for AWS PrivateLink is available in the following AWS Regions: US East (Ohio, N. Virginia), US West (Oregon, N. California), Africa (Cape Town), Asia Pacific (Hong Kong, Mumbai, Singapore, Sydney, Seoul, Tokyo), Canada (Central), Europe (Ireland, Frankfurt, London, Stockholm, Paris, Milan), Middle East (Bahrain), South America (Sao Paulo), AWS GovCloud (US), China (Beijing), and China (Ningxia).

    • EKS API PrivateLink is not yet available in the following regions: Asia Pacific (Osaka), Asia Pacific (Jakarta), Middle East (UAE).
  • This is PrivateLink support for the EKS management APIs (createCluster etc.), not the Kubernetes API endpoint of a cluster. EKS already supports a private endpoint for the Kubernetes API server, although it's implemented in a different manner from PrivateLink (and we are aware of an open feature request for the cluster private endpoint to be implemented as a standard PrivateLink endpoint).
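For anyone scripting this, the request for such an endpoint might be assembled as below. This is a minimal sketch: the keys follow the parameter names of the EC2 CreateVpcEndpoint API, the service name `com.amazonaws.<region>.eks` is assumed from the usual naming scheme and may differ in some partitions, and `eks_endpoint_request` plus all IDs are hypothetical. No policy document is included, since endpoint policies are not supported.

```python
from typing import Any, Dict, Sequence


def eks_endpoint_request(
    region: str,
    vpc_id: str,
    subnet_ids: Sequence[str],
    security_group_ids: Sequence[str],
) -> Dict[str, Any]:
    """Build CreateVpcEndpoint parameters for the EKS management API."""
    return {
        "VpcEndpointType": "Interface",
        "ServiceName": f"com.amazonaws.{region}.eks",  # assumed name, see lead-in
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Private DNS lets the regional EKS hostname resolve inside the VPC.
        "PrivateDnsEnabled": True,
    }


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    print(eks_endpoint_request("ap-southeast-1", "vpc-EXAMPLE", ["subnet-EXAMPLE"], ["sg-EXAMPLE"]))
```

The resulting dictionary could be passed to an SDK call such as boto3's `create_vpc_endpoint(**params)` on an EC2 client; the security group has to allow HTTPS (443) from the hosts that call the EKS API.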

mikestef9 avatar Dec 19 '22 21:12 mikestef9