containers-roadmap

[EKS] [request]: Allow kube-api access with un-chained (single) Cilium CNI

Open PetrMc opened this issue 2 years ago • 3 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request

Provide a secure way to allow kube-api access while the un-chained (single) Cilium CNI is used.

Which service(s) is this request for? EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Quoting the Cilium documentation:

Cilium can alternatively run in EKS using an overlay mode that gives pods non-VPC-routable IPs. This allows running more pods per Kubernetes worker node than the ENI limit, but means that pod connectivity to resources outside the cluster (e.g., VMs in the VPC or AWS managed services) is masqueraded (i.e., SNAT) by Cilium to use the VPC IP address of the Kubernetes worker node.
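A simplified model of the masquerading rule in the quote may help. This is a sketch under stated assumptions, not Cilium's datapath logic: the CIDRs and function name are hypothetical. Traffic between pods keeps the overlay pod IP, while traffic leaving the cluster is rewritten to the worker node's VPC IP.

```python
import ipaddress

# Hypothetical address ranges for illustration only.
OVERLAY_POD_CIDR = ipaddress.ip_network("10.244.0.0/16")  # non-VPC-routable pod IPs
CLUSTER_CIDRS = [OVERLAY_POD_CIDR]


def egress_source_ip(pod_ip: str, dst_ip: str, node_vpc_ip: str) -> str:
    """Return the source IP a packet from pod_ip to dst_ip carries after egress.

    Destinations inside the cluster see the pod IP unchanged; destinations
    outside it (e.g. VMs in the VPC, AWS managed services) see the node's
    VPC IP because the packet is masqueraded (SNAT).
    """
    dst = ipaddress.ip_address(dst_ip)
    if any(dst in net for net in CLUSTER_CIDRS):
        return pod_ip
    return node_vpc_ip
```

Note that the model only covers traffic leaving pods; it says nothing about how something outside the cluster could initiate a connection to a pod IP.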

When running any application that requires admission webhooks, kube-api can only reach the webhook if it runs with hostNetwork, a limitation that is difficult to overcome. The details are explained here.
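A rough sketch of why only a hostNetwork webhook is reachable, assuming a hypothetical VPC CIDR and simplifying "reachable from the control plane" to "inside the VPC CIDR" (the EKS control plane reaches the cluster through ENIs placed in the cluster's subnets). A hostNetwork pod shares the node's network namespace, so its address is the node's VPC IP rather than an overlay pod IP. The function names are illustrative only.

```python
import ipaddress

# Hypothetical VPC range for illustration; overlay pod IPs fall outside it.
VPC_CIDR = ipaddress.ip_network("192.168.0.0/16")


def webhook_endpoint_ip(pod_ip: str, node_ip: str, host_network: bool) -> str:
    """IP address the API server would have to dial to reach the webhook pod."""
    # With hostNetwork the pod listens directly on the node's IP.
    return node_ip if host_network else pod_ip


def reachable_from_control_plane(ip: str) -> bool:
    """Simplified rule: the managed control plane can only route to VPC addresses."""
    return ipaddress.ip_address(ip) in VPC_CIDR
```

Under this model, a webhook on an overlay pod IP such as 10.244.3.9 is unreachable, while the same webhook with hostNetwork becomes reachable at the node IP.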

What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

We are currently facing an issue where istio-proxy is not able to connect to the istiod admission webhook. This prevents the microservice from starting and becoming functional. The complete sequence of steps is attached for reference: cilium_eks_istio_sidecar_injection_issue.md

Are you currently working around this issue? How are you currently solving this problem?

The workarounds are described in the issue quoted above. None of them (hostNetwork: true, chaining with the VPC CNI, or exposing the webhook via a load balancer) is ideal: each requires significant product adjustments, security exceptions, or sacrificing custom IPAM functionality.

Additional context

GCP and Azure overcome this problem by introducing Konnectivity. A similar approach from AWS (with or without Konnectivity) would help the community run Cilium CNI on AWS EKS without any security or technical trade-offs.

We are also reviewing the possibility of extending the IP address range per this blog suggested by @shapirov103, and are analyzing any other use cases that cannot be resolved today with the CNI-chained approach.

Attachments

If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

PetrMc, Dec 04 '23 22:12

I am not 100% sure, but I think GCP does this with --enable-aggregator-routing, not Konnectivity (which is used for other proxying, but not for webhooks). The same approach might not work on other platforms, though.

howardjohn, Jan 12 '24 20:01

Can we make this broader, to request general compatibility with "bring your own CNI"? Calico also offers a private CNI network (as do other CNIs); it would be good if Konnectivity or something similar were deployed to make them all work.

fasaxc, Jul 22 '24 16:07

A load balancer endpoint, created as part of cluster bootstrapping and reachable from the worker nodes via a service endpoint, might also solve the problem for tools that need webhooks (validating/mutating) to communicate with the control plane. This could be a NodePort proxy (or a proxy supporting SNI and HTTP/2 tunnels) running on the control plane and exposed via a load balancer for each control plane. Konnectivity is certainly an option.
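The proxy idea can be illustrated with a generic layer-4 relay. This is a minimal sketch under stated assumptions, not an EKS or Konnectivity component: the function names and addresses are hypothetical, and it forwards raw TCP without inspecting TLS SNI or HTTP/2, so it only shows the "listen on a reachable address, forward to the webhook" part.

```python
import socket
import threading


def _pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def _handle(client: socket.socket, backend_host: str, backend_port: int) -> None:
    """Pair one client connection with a new backend connection and relay both ways."""
    with client, socket.create_connection((backend_host, backend_port)) as backend:
        reverse = threading.Thread(target=_pipe, args=(backend, client), daemon=True)
        reverse.start()
        _pipe(client, backend)
        reverse.join()  # both directions finished; the with-block closes both sockets


def start_tcp_relay(listen_host: str, listen_port: int,
                    backend_host: str, backend_port: int) -> socket.socket:
    """Start a background TCP relay and return its listening socket."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((listen_host, listen_port))
    server.listen()

    def accept_loop() -> None:
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                return  # stop once the listening socket is no longer usable
            threading.Thread(target=_handle, args=(client, backend_host, backend_port),
                             daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server
```

In the scenario from the comment, such a relay would sit on a VPC-routable address behind the load balancer and forward to the webhook's pod IP; a production version would also need TLS handling, health checks, and access control.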

At this point in time, the only way out is to deploy tools that need this functionality with hostNetwork: true, or to use cloud-native CNIs. The hostNetwork approach has other problems, such as hard-coding ports and ensuring they don't clash with other existing and future tools/apps.
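The port-clash concern with hostNetwork: true can be checked mechanically. A small sketch, assuming a hypothetical inventory that maps each node-level process or hostNetwork workload to the ports it binds on the node (the workload names are made up; the kubelet's port 10250 is a real example of a port already taken on every node):

```python
from collections import defaultdict


def find_host_port_clashes(workloads: dict[str, list[int]]) -> dict[int, list[str]]:
    """Return each host port claimed by more than one workload, with its claimants.

    workloads maps a workload name to the ports it binds in the node's
    network namespace.
    """
    owners: dict[int, list[str]] = defaultdict(list)
    for name, ports in workloads.items():
        for port in set(ports):  # ignore duplicates within one workload
            owners[port].append(name)
    return {port: sorted(names) for port, names in owners.items() if len(names) > 1}


# Hypothetical inventory: webhook-a collides with the kubelet.
inventory = {
    "kubelet": [10250],
    "webhook-a": [10250],
    "webhook-b": [9443],
}
clashes = find_host_port_clashes(inventory)
```

Here `clashes` is `{10250: ["kubelet", "webhook-a"]}`. Such a check only catches clashes among known workloads; future tools still have to be coordinated by hand, which is the burden the comment describes.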

calshankar, Sep 04 '24 12:09

I built a POC that combines an AWS Gateway Load Balancer with a "GENEVE gateway pod" to let EKS reach the pod CIDR. In effect, it is a DIY alternative to the Konnectivity agent that operates at layer 3.

My use case: there are many benefits to using an overlay network with EKS hybrid nodes, but the same setup can be used with EKS on EC2 alone. See https://medium.com/@the.jfnadeau/eks-cilium-as-the-only-cni-driver-with-simplified-hybrid-nodes-and-admission-webhooks-routing-1f351d11f9dd

On the other hand, since the Cilium agent is already GENEVE-enabled, Cilium might be able to take over that part of the setup. I'm adding this as a CFP on the Cilium side.
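For readers unfamiliar with GENEVE, a gateway pod in such a setup has to parse the GENEVE encapsulation (RFC 8926, UDP port 6081) that the load balancer wraps around the original packet. The sketch below encodes and decodes only the 8-byte base header; it is illustrative, not the POC's code. Any options (the Gateway Load Balancer may attach its own) are skipped without interpretation, and the names are hypothetical.

```python
import struct
from typing import NamedTuple

GENEVE_UDP_PORT = 6081  # IANA-assigned UDP destination port for GENEVE


class GeneveHeader(NamedTuple):
    version: int        # 2-bit version, currently 0
    opt_len_words: int  # length of the options field in 4-byte words (6 bits)
    oam: bool           # O bit: control/OAM packet
    critical: bool      # C bit: critical options present
    protocol_type: int  # EtherType of the inner payload, e.g. 0x0800 for IPv4
    vni: int            # 24-bit Virtual Network Identifier


def pack_geneve(h: GeneveHeader) -> bytes:
    """Encode the 8-byte GENEVE base header (options are not included)."""
    if not 0 <= h.vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= h.opt_len_words < 1 << 6:
        raise ValueError("option length must fit in 6 bits")
    byte0 = (h.version & 0x3) << 6 | h.opt_len_words
    byte1 = (0x80 if h.oam else 0) | (0x40 if h.critical else 0)
    # The VNI occupies the top 24 bits of the last 32-bit word; the low 8 bits are reserved.
    return struct.pack("!BBHI", byte0, byte1, h.protocol_type, h.vni << 8)


def unpack_geneve(data: bytes) -> tuple[GeneveHeader, bytes]:
    """Decode the base header and return it with the inner packet after the options."""
    if len(data) < 8:
        raise ValueError("truncated GENEVE header")
    byte0, byte1, proto, vni_word = struct.unpack("!BBHI", data[:8])
    h = GeneveHeader(
        version=byte0 >> 6,
        opt_len_words=byte0 & 0x3F,
        oam=bool(byte1 & 0x80),
        critical=bool(byte1 & 0x40),
        protocol_type=proto,
        vni=vni_word >> 8,
    )
    payload_offset = 8 + 4 * h.opt_len_words
    if len(data) < payload_offset:
        raise ValueError("truncated GENEVE options")
    return h, data[payload_offset:]
```

A real gateway would additionally need to route the decapsulated inner packet toward the pod network and re-encapsulate replies, which is the part the comment suggests Cilium might take over.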

somejfn, Feb 20 '25 14:02