
[EKS] [IPv6 on instance TGs]: Add IPv6 support to instance-type target groups for EKS support

Open eshicks4 opened this issue 2 years ago • 41 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request Please add IPv6 support to instance-type target groups so that we can use EKS cluster autoscaling groups with ALBs/NLBs.

Which service(s) is this request for? EKS, ELBs

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? EKS creates an autoscaling group onto which we can attach target groups; however, the new IPv6-based clusters don't bind NodePorts to the EC2 nodes' IPv4 IPs. We have dual-stack ELBs and IPv6-enabled EKS clusters but seem to be missing that connecting piece in-between.

Are you currently working around this issue? We aren't. We're currently stuck with IPv4 clusters.

Additional context An alternative could be to make EKS clusters dual-stack so we can use the ipFamilies & ipFamilyPolicy features. IPv6-only would be the default to avoid IP exhaustion but we could selectively bind IPv4 IPs as-needed.

Attachments N/A

eshicks4 avatar Feb 15 '22 15:02 eshicks4

See also case #9628572941

eshicks4 avatar Feb 15 '22 15:02 eshicks4

Hey @eshicks4, this first needs to be implemented by ALB and NLB. As called out here, ALB and NLB only support IP targeting mode for IPv6. Once they support instance mode, we can add support in the AWS Load Balancer Controller.

Any reason you can't use IP targeting mode?

mikestef9 avatar Feb 15 '22 16:02 mikestef9

My understanding was that IP targeting mode required that you knew the IPs you would be targeting. Since the cluster is autoscaled and my service's target is running on all nodes as a daemonset, couldn't that list change?

eshicks4 avatar Feb 15 '22 16:02 eshicks4

That's the job of the AWS LB Controller. It watches service endpoints in the cluster and auto updates ALB/NLB target groups with the latest list of pod IP addresses.
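
For illustration, a minimal sketch of a Service that hands the NLB over to the controller in IP target mode (names and ports are placeholders, not from this thread):

apiVersion: v1
kind: Service
metadata:
  name: my-app                # placeholder name
  annotations:
    # Let the AWS Load Balancer Controller provision and manage the NLB
    service.beta.kubernetes.io/aws-load-balancer-type: external
    # Register pod IPs directly instead of instance IDs / NodePorts
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080        # placeholder container port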

mikestef9 avatar Feb 15 '22 19:02 mikestef9

Ok I think I have this design working - at least on my existing IPv4 setup. I'll need to rebuild to try it on an IPv6-only cluster.

One question, though, as this process involved quite a bit more than my original design. What kinds of benefits does it provide over a more simple one that just uses an instance-based TG (with IPv6 support) to connect an NLB to the cluster's autoscaling group?

Thanks

eshicks4 avatar Feb 15 '22 21:02 eshicks4

An instance mode load balancer can potentially go through an additional instance hop before the traffic gets to the pod, which adds latency compared to the case where the load balancer sends traffic directly to the pods. Direct routing is possible because the VPC CNI uses VPC IP addresses for pods, so the ALB/NLB can send traffic straight to them and skip NodePorts and kube-proxy.

mikestef9 avatar Feb 16 '22 16:02 mikestef9

That makes sense. This is sounding more like an ELB feature request. Should I switch it over to their queue instead?

eshicks4 avatar Feb 16 '22 17:02 eshicks4

An instance mode load balancer can potentially go through an additional instance hop before the traffic gets to the pod, which adds latency compared to the case where the load balancer sends traffic directly to the pods. Direct routing is possible because the VPC CNI uses VPC IP addresses for pods, so the ALB/NLB can send traffic straight to them and skip NodePorts and kube-proxy.

@mikestef9 don't forget that with instance mode you need the extra complexity of setting the external traffic policy to Local if you have compliance policies for public traffic.

@eshicks4 assuming you're using the AWS Load Balancer Controller (which you should be, as the in-tree controller is deprecated), either an ALB IP-backed ingress or an NLB IP-backed ingress controller service is the simplest solution.
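
For context, a minimal sketch of an instance-mode Service with that policy set (placeholder names; only externalTrafficPolicy and the target-type annotation are the point):

apiVersion: v1
kind: Service
metadata:
  name: my-ingress            # placeholder name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    # Instance mode: targets are registered by instance ID and reached via NodePorts
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
spec:
  type: LoadBalancer
  # Preserve the client source IP; only nodes running a backing pod pass health checks
  externalTrafficPolicy: Local
  selector:
    app: my-ingress
  ports:
    - port: 443
      targetPort: 8443        # placeholder container port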

stevehipwell avatar Feb 18 '22 15:02 stevehipwell

@stevehipwell while that may be the simplest solution that currently works for IPv6, I'm not sure I'd call it the simplest overall. Envoy runs as a daemonset in Project Contour's design so it's going to route to all available nodes anyway. An NLB configured to route to a static NodePort and auto-updated by the autoscaler doesn't require an ELB controller deployment or any of the IAM role setup that goes along with it. It works perfectly with IPv4 so, once IPv6 support is added to instance-based target groups, the only real benefit the controller has for us is the direct IP routing that bypasses kube-proxy.

eshicks4 avatar Feb 24 '22 15:02 eshicks4

@eshicks4 I'd suggest that you could switch Contour to use a Deployment for Envoy and the nlb-ip service annotations; this will allow you to use IPv6 and have an HA ingress (see pod readiness gates). I'm sure there are some limited cases where the daemonset and instance mode are better, but I can't think of many where the pros outweigh the cons. Obviously you might have some of these, so this is just a friendly suggestion.

stevehipwell avatar Feb 24 '22 16:02 stevehipwell

@stevehipwell Just the reduction in complexity, really (fewer moving parts to break, etc.). There may be other reasons but, in our case, Kubernetes is still pretty new and we just have more people familiar with AWS. That said, I have it all working and documented with the in-cluster ELB controller and no real reason to switch back, since there are benefits to using it. That's why I'm thinking it's best to move this over to the ELB team's feature request bucket instead.

eshicks4 avatar Feb 25 '22 18:02 eshicks4

Hello,

We’re more or less in the same situation as @eshicks4.

We’d like to attach our ingress nodes' autoscaling group to our load balancer target group, for the same reason: reduced complexity (no need to deploy the load balancer controller, one less piece that could break, etc.).

Another reason is that, currently, the load balancer controller doesn’t handle the case where two clusters are behind the same target group, which is how we do some blue/green upgrades.

Also, to avoid the additional hop when using instance type target groups and node ports, we deploy our ingress controllers using host ports.
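
For reference, a rough sketch of what that host-port setup can look like on the ingress controller pods (a hypothetical DaemonSet; names, ports, and image tag are placeholders):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-controller          # placeholder name
spec:
  selector:
    matchLabels:
      app: ingress-controller
  template:
    metadata:
      labels:
        app: ingress-controller
    spec:
      containers:
        - name: envoy
          image: envoyproxy/envoy:v1.27.0    # placeholder image/tag
          ports:
            - containerPort: 8443
              hostPort: 443          # listen directly on the node, so the LB targets the instance on 443 with no NodePort hop
              protocol: TCP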

yann-soubeyrand avatar Aug 22 '22 08:08 yann-soubeyrand

I've been struggling to get the suggested alternative (IP-based) solution working with IPv6. Are there guidelines anywhere for troubleshooting this? The logs from aws-load-balancer-controller look fine with no errors, and the load balancer and target group are created, but the targets always stay unhealthy. The routing table has IPv6 routes, the NACLs are fully open for IPv4/IPv6, and the security groups are wide open for testing (in addition to the rules created by the controller). I have the latest EKS / CNI / controller versions. I've tried both ALB and NLB with the same result, and hitting the service's IPv6 endpoints directly from the EKS nodes gets connection refused. The service works perfectly with a kubectl port-forward.

service description

Name:                     <name>
Namespace:                <namespace>
Labels:                   app.kubernetes.io/instance=<namespace>
Annotations:              service.beta.kubernetes.io/aws-load-balancer-ip-address-type: dualstack
                          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
                          service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-<id>, subnet-<id>
Selector:                 <selector>
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv6
IP:                       <address>:e8fb::5a07
IPs:                      <address>:e8fb::5a07
LoadBalancer Ingress:     <redacted>.elb.us-east-1.amazonaws.com
Port:                     <unset>  80/TCP
TargetPort:               5000/TCP
NodePort:                 <unset>  30538/TCP
Endpoints:                [<address>:bf0::1a]:5000,[<address>:bf0::1c]:5000
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                  Age                From     Message
  ----    ------                  ----               ----     -------
  Normal  SuccessfullyReconciled  26m (x3 over 72m)  service  Successfully reconciled

edit: Turns out this was an issue with my app and what it was listening on. Traffic routed by kube-proxy or a port-forward works fine when the app is bound to localhost; traffic coming directly to the pod via the IP target method does not, which should probably be obvious. I'll leave this here in case anyone else hits the same issue with a server binding to https://localhost:port instead of https://[::]:port

plaisted avatar Oct 10 '22 23:10 plaisted

I'll leave this here in case anyone else hits the same issue with a server binding to https://localhost:port instead of https://[::]:port

I ran into this a few times too. The pods run IPv6-only so there is no 127.0.0.1 or 0.0.0.0 to bind to. Sometimes localhost works (depends on the container's /etc/hosts file) but it's generally been safer or even necessary to override app defaults and force it to bind listeners to [::1] or [::] instead.
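
As a sketch of the kind of override we mean (the --listen flag here is purely hypothetical and depends on the app/framework; the point is the [::] bind address):

apiVersion: v1
kind: Pod
metadata:
  name: my-app                # placeholder name
spec:
  containers:
    - name: app
      image: registry.example.com/my-app:latest   # placeholder image
      args:
        - "--listen=[::]:5000"    # hypothetical flag: bind to all IPv6 interfaces, not localhost
      ports:
        - containerPort: 5000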

eshicks4 avatar Nov 07 '22 21:11 eshicks4

I really need an IPv6 ALB with IPv6 instance target groups for IPv6-native subnets. The feature being discussed here requires that to be implemented first.

Is there a better place to express interest in this feature outside the containers roadmap?

xanather avatar Dec 29 '22 04:12 xanather

Any update on this feature? :)

xanather avatar Apr 11 '23 01:04 xanather

o/ Waving from the void on this :)

NeilHanlon avatar Jun 19 '23 17:06 NeilHanlon

This is still dependent on ALB/NLB first adding support for IPv6 instance target groups (which is coming later this year). When that happens, we can add support in the controller.

mikestef9 avatar Jun 20 '23 15:06 mikestef9

This feature has been implemented and is now available on AWS.

nakrule avatar Sep 29 '23 06:09 nakrule

This feature has been implemented and is now available on AWS.

My ASG refuses to add worker nodes to the target group. Did you get this working? Please provide a link to the merged PR that backs your claim. Otherwise you are being misleading.

matthenry87 avatar Oct 01 '23 00:10 matthenry87

With the recent launch of support in ELB for registering instances using IPv6 addresses [1], you can use the AWS Load Balancer (LB) Controller to create ALBs/NLBs of instance target type for IPv6. We recommend using AWS LB Controller v2.5.1+ to get started. [1] https://aws.amazon.com/about-aws/whats-new/2023/10/application-load-balancer-network-load-balancer-registering-instances-ipv6-targets/
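
As a rough, unofficial sketch (placeholder names; the annotations are the ones already shown earlier in this thread plus the instance target type), such a Service on an IPv6 cluster could look like:

apiVersion: v1
kind: Service
metadata:
  name: my-service            # placeholder name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    # Instance target type, now usable with IPv6 targets
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
    # Dual-stack NLB, reachable over both IPv4 and IPv6
    service.beta.kubernetes.io/aws-load-balancer-ip-address-type: dualstack
spec:
  type: LoadBalancer
  ipFamilies: [IPv6]          # IPv6-only EKS clusters run single-stack IPv6 Services
  ipFamilyPolicy: SingleStack
  selector:
    app: my-service
  ports:
    - port: 80
      targetPort: 5000        # placeholder container port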

sjastis avatar Oct 03 '23 00:10 sjastis

With the recent launch of support in ELB for registering instances using IPv6 addresses [1], you can use the AWS Load Balancer (LB) Controller to create ALBs/NLBs of instance target type for IPv6. We recommend using AWS LB Controller v2.5.1+ to get started. [1] https://aws.amazon.com/about-aws/whats-new/2023/10/application-load-balancer-network-load-balancer-registering-instances-ipv6-targets/

That doesn't really meet my use case. We use a single ingress controller exposed via a NodePort service, with one internal and one external NLB.

I just wrote a Lambda to enable the primary IPv6 address as each instance comes up. Ideally the node group would have that as a feature that can be turned on, as it isn't something that can be specified in a custom launch template since the CIDR isn't known yet.

matthenry87 avatar Oct 03 '23 02:10 matthenry87

With the recent launch of support in ELB for registering instances using IPv6 addresses [1], you can use the AWS Load Balancer (LB) Controller to create ALBs/NLBs of instance target type for IPv6. We recommend using AWS LB Controller v2.5.1+ to get started. [1] https://aws.amazon.com/about-aws/whats-new/2023/10/application-load-balancer-network-load-balancer-registering-instances-ipv6-targets/

@sjastis Without that workaround, the instances are never added to the target group because they don't have a primary IPv6 address.

matthenry87 avatar Oct 03 '23 02:10 matthenry87

I confirm that there’s a gap here: EKS nodes cannot be registered to these new IPv6 instance type target groups because they are missing a primary IPv6 address. I tried setting PrimaryIpv6 to true in the NetworkInterfaces section of my custom launch template, but it somehow gets lost in translation when EKS creates its own launch template from mine. Is a fix planned? Should I open a new issue to track this, @mikestef9?

yann-soubeyrand avatar Oct 03 '23 13:10 yann-soubeyrand

I confirm that there’s a gap here: EKS nodes cannot be registered to these new IPv6 instance type target groups because they are missing a primary IPv6 address. I tried setting PrimaryIpv6 to true in the NetworkInterfaces section of my custom launch template, but it somehow gets lost in translation when EKS creates its own launch template from mine. Is a fix planned? Should I open a new issue to track this, @mikestef9?

You can't really set it on the launch template because it would require the node's IPv6 CIDR to already be known, as it uses the first address in the block.

matthenry87 avatar Oct 03 '23 13:10 matthenry87

I’m not sure I understand: in the launch template I was able to set Ipv6AddressCount to 1 (which, if I understand correctly, enables automatic IPv6 address allocation) and PrimaryIpv6 to true (which, again, if I understand correctly, should make the first allocated address the primary), but I wasn’t able to test due to the limitation I described. Could you elaborate on why you think it couldn’t work at all?

yann-soubeyrand avatar Oct 03 '23 13:10 yann-soubeyrand

I’m not sure I understand: in the launch template I was able to set Ipv6AddressCount to 1 (which, if I understand correctly, enables automatic IPv6 address allocation) and PrimaryIpv6 to true (which, again, if I understand correctly, should make the first allocated address the primary), but I wasn’t able to test due to the limitation I described. Could you elaborate on why you think it couldn’t work at all?

The CIDR ranges are automatically assigned to your nodes - they are not known in advance. When I tried to go in and manually create a launch template with the primary IPv6 option set to true, it wouldn't let me unless I specified the IPv6 CIDR in advance.

matthenry87 avatar Oct 03 '23 14:10 matthenry87

@matthenry87 I modified the launch template generated by EKS to set PrimaryIpv6 to true, then modified the autoscaling group generated by EKS to use my new launch template version, and I was able to start nodes with a primary IPv6 address; they correctly attached to the target group I set on the autoscaling group.

EDIT: I also had to set Ipv6AddressCount to 1 in the launch template.
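
For anyone wanting to reproduce this declaratively, here's a rough CloudFormation-style sketch of the relevant launch template fields (assuming the CloudFormation properties mirror the EC2 launch template API fields named above; resource and template names are placeholders):

Resources:
  NodeLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: ipv6-nodes      # placeholder name
      LaunchTemplateData:
        NetworkInterfaces:
          - DeviceIndex: 0
            Ipv6AddressCount: 1    # auto-assign one IPv6 address from the subnet
            PrimaryIpv6: true      # mark that address as the instance's primary IPv6 address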

yann-soubeyrand avatar Oct 03 '23 16:10 yann-soubeyrand

@yann-soubeyrand, thanks for confirming, and glad it works for you. @matthenry87, in order for the ALB/NLB instance target type to work with IPv6, the EC2 instance needs to have a primary IPv6 address, since traffic is routed to instances using the primary private IP address specified in the primary network interface for the instance [1][2]. You can assign it either during launch or from the console. Please refer to the docs below [3][4] to assign a primary IPv6 address to your instance. Refs:

  1. https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#target-type
  2. https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#target-type
  3. https://docs.aws.amazon.com/vpc/latest/userguide/vpc-migrate-ipv6.html#vpc-migrate-assign-ipv6-address
  4. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-instance-addressing.html#ipv6-addressing

oliviassss avatar Oct 03 '23 17:10 oliviassss

@oliviassss don’t get me wrong, what I did was a hacky test; I don’t think we should ever touch the launch template generated by EKS. However, the path to a fully working solution doesn’t seem so hard at first sight: either EKS should set PrimaryIpv6 to true in its generated launch template when the cluster is an IPv6 one, or EKS should preserve the PrimaryIpv6 value set by the user in their custom launch template. The latter solution does put some burden on the user, though.

yann-soubeyrand avatar Oct 03 '23 17:10 yann-soubeyrand