cloud-provider-aws
Support multiple route tables
What would you like to be added:
I would like support for multiple route tables to be added.
Why is this needed:
This is needed in order to create an AWS Kubernetes cluster with multiple zones when routing is done without an overlay network. Each zone has its own route table, and when I try to set the cluster up, the cloud controller manager (CCM) returns the following error:
E0719 13:12:31.260618 1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: found multiple matching AWS route tables for AWS cluster: shoot--core--aws-no-ov
After checking the code, I saw that the cloud controller manager supports only a single route table. It would be great if support for multiple route tables could be added. The same setup works on all other cloud providers, but not on AWS, due to the lack of this feature.
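For reference, the limitation appears to come from the route table lookup expecting exactly one tagged table. A minimal sketch of that behaviour (simplified, not the exact upstream code; the `tag-key` filter shown here is an assumption about how the tables are discovered):

```go
package routes

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// findClusterRouteTable sketches the current single-table assumption: list the
// route tables tagged for the cluster and fail unless exactly one is found,
// which is what produces the "found multiple matching AWS route tables" error.
func findClusterRouteTable(svc *ec2.EC2, clusterID string) (*ec2.RouteTable, error) {
	out, err := svc.DescribeRouteTables(&ec2.DescribeRouteTablesInput{
		Filters: []*ec2.Filter{{
			Name:   aws.String("tag-key"),
			Values: []*string{aws.String("kubernetes.io/cluster/" + clusterID)},
		}},
	})
	if err != nil {
		return nil, fmt.Errorf("error listing route tables: %v", err)
	}
	if len(out.RouteTables) == 0 {
		return nil, fmt.Errorf("no route table found for AWS cluster: %s", clusterID)
	}
	if len(out.RouteTables) > 1 {
		return nil, fmt.Errorf("found multiple matching AWS route tables for AWS cluster: %s", clusterID)
	}
	return out.RouteTables[0], nil
}
```

In our setup there is more than one tagged table per cluster (one per zone), so this check always fails.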
/kind feature
@DockToFuture: This issue is currently awaiting triage.
If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Sounds reasonable at first glance, but could I get you to provide more information about your desired set-up so I can play around and understand the limitation? What CNI are you using? What are you using to set up the cluster?
We (@DockToFuture and I) are using Gardener to create the Kubernetes clusters. It uses the Gardener extension provider for AWS to set up the infrastructure; you can find the Terraform template here. Essentially, there is one cluster-global route table per cluster, which contains the Kubernetes node CIDR as a local route and the internet gateway as the default route, and per availability zone there is another route table, which contains the Kubernetes node CIDR as a local route and a NAT gateway as the default route. We use Calico or Cilium as CNIs, with Calico being used in the majority of scenarios. As of now, we use overlay networks (IP-in-IP for Calico and VXLAN for Cilium), but would like to get rid of them. Our goal is therefore to use the cloud controller manager to create the corresponding routes for the per-node pod CIDRs in the infrastructure. This works on all other infrastructures we support, e.g. Azure, GCP and OpenStack.
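To make the desired outcome concrete, here is a rough sketch of what we would like the route reconciliation to do in this layout: program each node's pod CIDR into every per-zone nodes route table, pointing at that node's instance. The route table IDs and the instance lookup are placeholders for illustration, not the project's actual API:

```go
package routes

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// createPodRoute writes the route for one node's pod CIDR into each of the
// given per-zone route tables. routeTableIDs and instanceID are placeholders;
// in a real implementation they would come from the cluster tag lookup and
// from the node-to-instance mapping the cloud provider already maintains.
func createPodRoute(svc *ec2.EC2, routeTableIDs []string, podCIDR, instanceID string) error {
	for _, rt := range routeTableIDs {
		_, err := svc.CreateRoute(&ec2.CreateRouteInput{
			RouteTableId:         aws.String(rt),
			DestinationCidrBlock: aws.String(podCIDR),
			InstanceId:           aws.String(instanceID),
		})
		if err != nil {
			return fmt.Errorf("creating route %s in %s: %v", podCIDR, rt, err)
		}
	}
	return nil
}
```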
@DockToFuture pinging on this. Is there any way we can get a networking diagram showing how Gardener requires the underlying connectivity to look?
Hello @jaypipes, you can see an example diagram for one of the multi-zonal clusters we create in the following picture.
Compared to the standard EKS cluster, a Gardener cluster has the following setup:
- 3 subnets per zone. These are the equivalent of the public and private subnets for an EKS cluster (the private ones are called `nodes` in Gardener), with the addition of one more for internal load-balancing purposes (called `private` in Gardener).
- One route table (named `main`) for all the public subnets. Public subnets have a NAT GW and a route to the IGW.
- The `nodes` subnets get their own route table with a route to their respective zone's NAT GW.

The setup so far is very similar to what we see EKS using.
The main differences are:
- We are not confined to using the AWS CNI only; we also support Calico and Cilium, as @ScheererJ mentioned.
- The pod CIDR ranges we use are not necessarily part of the VPC range.
Because of these differences, we cannot rely on native VPC routing (e.g. the secondary IPs for instances used by EKS), and we need each of the route tables used by the `nodes` subnets to be extended with routes that forward traffic targeting specific pod IP ranges to their assigned node.
From what we can see, any setup that aims to provide multi-zonal support in AWS needs to use multiple subnets, and by extension these subnets will need their own route tables. This is why we think it is a good idea to extend the existing capability of the AWS CCM to properly support these multi-zone setups.
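As one possible direction (a sketch only, with function names of our own rather than the project's API): if the route table lookup were relaxed to return every table tagged for the cluster, the listing path could aggregate routes across all of them, e.g. by de-duplicating on the destination CIDR, so the route controller still sees one logical route per pod CIDR:

```go
package routes

import "github.com/aws/aws-sdk-go/service/ec2"

// listPodRoutes collects instance-targeted routes from several route tables
// and de-duplicates them by destination CIDR, so a route that exists in every
// per-zone table shows up only once.
func listPodRoutes(tables []*ec2.RouteTable) map[string]string {
	routes := map[string]string{} // destination pod CIDR -> target instance ID
	for _, rt := range tables {
		for _, r := range rt.Routes {
			if r.DestinationCidrBlock != nil && r.InstanceId != nil {
				routes[*r.DestinationCidrBlock] = *r.InstanceId
			}
		}
	}
	return routes
}
```

Route creation and deletion would then simply loop over the same set of tables.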
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.