amazon-vpc-cni-k8s
Subnet discovery enhancements to skip node primary ENI and support secondary ENI security groups
What would you like to be added:
Enhanced subnet discovery (launch blog) provides an improved UX compared to "custom networking", but doesn't yet support all of custom networking's capabilities, notably the ability to run pods in subnets separate from the nodes, and to attach separate security group(s) to the pods running on secondary ENIs in those alternate subnets.
To run pods only in secondary subnets and not primary node subnet:
Support tag key/value of kubernetes.io/role/cni=0 which would instruct VPC CNI to skip using that subnet for pods. Users could tag node subnets with this key/value pair. In this case, the value of the tag key (1 vs 0) now matters with enhanced subnet discovery.
To apply alternate security groups to pods running in secondary subnets:
Support tag key/value of kubernetes.io/role/cni=1 on security groups in the VPC. The VPC CNI would discover these security groups at startup, and these would only be applied to ENIs launched in secondary subnets discovered using the subnet discovery feature.
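The proposed tag semantics could be sketched roughly like this; the types and function names below are illustrative, not the actual VPC CNI code:

```go
package main

import "fmt"

// Subnet models the fields of an EC2 subnet relevant to discovery.
type Subnet struct {
	ID   string
	Tags map[string]string
}

const cniRoleTag = "kubernetes.io/role/cni"

// eligibleForPods returns true if a discovered subnet should be used
// for pod ENIs under the proposed semantics: a tag value of "0"
// explicitly opts the subnet out (node subnets), while "1" opts it in.
// Untagged subnets are not discovered at all.
func eligibleForPods(s Subnet) bool {
	v, ok := s.Tags[cniRoleTag]
	if !ok {
		return false
	}
	return v != "0"
}

func main() {
	subnets := []Subnet{
		{ID: "subnet-nodes", Tags: map[string]string{cniRoleTag: "0"}},
		{ID: "subnet-pods", Tags: map[string]string{cniRoleTag: "1"}},
	}
	for _, s := range subnets {
		fmt.Printf("%s eligible=%v\n", s.ID, eligibleForPods(s))
	}
}
```

Under this scheme, tagging existing node subnets with `kubernetes.io/role/cni=0` is what makes the 0-vs-1 distinction meaningful for enhanced subnet discovery.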
Why is this needed:
To support the use case of pods and nodes running in separate subnets that is possible with custom networking, but with the improved UX of subnet discovery.
It's really important to have this feature.
- Currently we have a limited number of IPs in our VPC (around 1500-2000), and adding a new CIDR every time we approach IP exhaustion is not a good approach; it also requires a lot of process and approvals, because it involves significant network changes.
- The custom networking approach, if implemented as stated above, will be the better option going forward.
- kubernetes.io/role/cni=0 tag on the routable subnets used by the nodes in a cluster.
- kubernetes.io/role/cni=1 tag on the subnets used by the pods in a cluster; these subnets can be non-routable and present within every VPC.
- kubernetes.io/role/cni=1 plus an additional tag to determine which cluster's pods, or a kubernetes.io/role/cni={cluster-name} tag, on security groups to be assigned to pods from specific clusters.
- How will the scenario work when custom networking is implemented and we introduce a new security group: will the security group for the pods be updated in place, or will Karpenter perform an instance refresh for such changes?
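The cluster-scoped security group tag suggested above could be resolved roughly like this; the types, names, and the fallback behavior are all assumptions for illustration, not existing VPC CNI behavior:

```go
package main

import "fmt"

// SecurityGroup models the fields relevant to the proposed discovery.
type SecurityGroup struct {
	ID   string
	Tags map[string]string
}

// groupsForCluster returns the security groups whose
// kubernetes.io/role/cni tag value names the given cluster, falling
// back to groups tagged with the generic value "1" when no
// cluster-specific groups exist. (The fallback is a hypothetical
// design choice, not something the issue specifies.)
func groupsForCluster(groups []SecurityGroup, cluster string) []SecurityGroup {
	var specific, generic []SecurityGroup
	for _, g := range groups {
		switch g.Tags["kubernetes.io/role/cni"] {
		case cluster:
			specific = append(specific, g)
		case "1":
			generic = append(generic, g)
		}
	}
	if len(specific) > 0 {
		return specific
	}
	return generic
}

func main() {
	groups := []SecurityGroup{
		{ID: "sg-generic", Tags: map[string]string{"kubernetes.io/role/cni": "1"}},
		{ID: "sg-blue", Tags: map[string]string{"kubernetes.io/role/cni": "blue"}},
	}
	// The cluster-specific group shadows the generic one for cluster "blue".
	fmt.Println(groupsForCluster(groups, "blue")[0].ID)
}
```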
Thanks for the feedback
kubernetes.io/role/cni=1 plus an additional tag to determine which cluster's pods, or a kubernetes.io/role/cni={cluster-name} tag, on security groups to be assigned to pods from specific clusters.
Do you need this because you run multiple clusters per VPC? How important is this feature? Would a single cluster name tag be enough, meaning you would have specific pod subnets designated for a specific cluster, versus needing to mix and match within a VPC?
How will the scenario work when custom networking is implemented and we introduce a new security group: will the security group for the pods be updated in place, or will Karpenter perform an instance refresh for such changes?
The discovery of the subnets happens only when we need to create and attach an ENI, which is when IP addresses are exhausted or more IPs are requested for the warm pool. If a new subnet is tagged while a node is running, it will be discovered the next time the VPC CNI needs to allocate an ENI.
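That lazy-discovery behavior can be sketched as follows; the `ipPool` type and field names are hypothetical stand-ins for ipamd's state, used only to illustrate when discovery runs:

```go
package main

import "fmt"

// ipPool is an illustrative stand-in for the IPAM daemon's view of
// its free-IP pool and warm-pool target.
type ipPool struct {
	free       int
	warmTarget int
}

// needENI reports whether a new ENI must be allocated: either the
// pool is exhausted or the warm-pool target is not met.
func (p ipPool) needENI() bool { return p.free < p.warmTarget }

// maybeDiscover runs subnet discovery only when an ENI allocation is
// needed, so a subnet tagged after node launch is picked up the next
// time an ENI is allocated, not immediately.
func maybeDiscover(p ipPool, discover func() []string) []string {
	if !p.needENI() {
		return nil
	}
	return discover()
}

func main() {
	discover := func() []string { return []string{"subnet-newly-tagged"} }
	// Healthy pool: no discovery call is made.
	fmt.Println(maybeDiscover(ipPool{free: 10, warmTarget: 5}, discover))
	// Pool below warm target: discovery runs and sees the new subnet.
	fmt.Println(maybeDiscover(ipPool{free: 0, warmTarget: 5}, discover))
}
```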
Do you need this because you run multiple clusters per VPC? How important is this feature? Would a single cluster name tag be enough, meaning you would have specific pod subnets designated for a specific cluster, versus needing to mix and match within a VPC?
I suppose we should be fine at the moment even without a cluster tag; all clusters in the VPC can use the same non-routable subnets. We can go with mix and match within the VPC.
The discovery of the subnets happens only when we need to create and attach an ENI, which is when IP addresses are exhausted or more IPs are requested for the warm pool. If a new subnet is tagged while a node is running, it will be discovered the next time the VPC CNI needs to allocate an ENI.
We need this functionality on security group updates, either in place or through an automatic instance refresh; otherwise, new rules added through the additional security group will not be picked up by the pods.
@mikestef9 This proposed solution will work for us. This is a crucial issue that needs to be addressed urgently. We need the ability to provision smaller VPCs and expand them over time. Creating large CIDR VPCs can lead to wasted IP addresses.
Happy to hear your comment on the implementation in #3121 over tagging the subnet for skipping the primary.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
Any movement here?
Any update?
I'd be willing to work on this instead of #3229 -- it looks like someone already tried with #3121 but there was no feedback on the PR and it was closed as stale.
Would you be willing to accept upstream contributions on this feature?
Any other updates on this? :)
Another question: how would this implementation support running pods in a subnet different from other pods and different from the nodes?
With custom networking, it is possible to assign pods to a completely different subnet. This is useful to allow for overlapping IP ranges in situations where you have many clusters with limited amounts of east-west traffic between clusters. Pods requiring cross-cluster communication can be put into a cross-cluster subnet, which can be connected to other clusters by peering / VPN / etc.
Similarly, if you wanted to have pods with unique static egress IPs, you could put those pods into a static egress subnet and configure the route table in that subnet to forward traffic from a pod to a specific NAT gateway.
Custom networking allows many things that I'm not sure are covered by this suggested implementation. Unless you're suggesting we make it so the features are not mutually exclusive.
For example, cases like this would need to be refactored: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/awsutils/awsutils.go#L939-L948
... where the code explicitly chooses not to use subnet discovery at all if custom networking is enabled.
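The mutually exclusive behavior described here can be paraphrased as the following branch; this is a simplified sketch of the linked awsutils.go logic, with invented names, not the actual code:

```go
package main

import "fmt"

// subnetForNewENI paraphrases the precedence discussed above: when
// custom networking is enabled, the ENIConfig subnet wins and subnet
// discovery is bypassed entirely.
func subnetForNewENI(customNetworkingEnabled bool, eniConfigSubnet string, discovered []string) string {
	if customNetworkingEnabled {
		// Discovery results are ignored in this branch.
		return eniConfigSubnet
	}
	if len(discovered) > 0 {
		return discovered[0]
	}
	return ""
}

func main() {
	fmt.Println(subnetForNewENI(true, "subnet-eniconfig", []string{"subnet-discovered"}))
	fmt.Println(subnetForNewENI(false, "subnet-eniconfig", []string{"subnet-discovered"}))
}
```

Making the two features composable would mean relaxing this branch so discovered subnets can still be considered under custom networking, which is essentially what #3229 proposes.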
I was thinking that #3229 (supporting subnet discovery inside custom networking) would be a nice alternative to the implementation you're proposing here, as it would solve the challenges above while also making custom networking nicer to use. However, if custom networking can be made to work with subnet discovery, and subnets can be tagged so they are excluded from discovery, then combined with existing custom networking that should be sufficient to address the challenges mentioned above.
@terraboops Apologies for missing the message. We are prioritizing this issue and have some design discussions underway on the suggested implementation.
Similarly, if you wanted to have pods with unique static egress IPs, you could put those pods into a static egress subnet and configure the route table in that subnet to forward traffic from a pod to a specific NAT gateway.
This is currently not being considered as part of this feature.