
Add support for TCP_UDP to NLB TargetGroups and Listeners

Open amorey opened this issue 4 years ago • 36 comments

Issue

https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/1608#issuecomment-937346660

Description

Previously, aws-load-balancer-controller ignored extra ServicePorts in the Kubernetes Service spec that reused an external port number, even when their protocols differed (e.g. TCP:53 and UDP:53).

This behavior prevented users from exposing services that support TCP and UDP on the same external load balancer port number.

This patch solves the problem by detecting when a user defines two ServicePorts on the same external load balancer port number, one using TCP and one using UDP. In that situation, a TCP_UDP TargetGroup and Listener are created and SecurityGroup rules are updated accordingly. If more than two ServicePorts are defined for a port, only the first two mergeable ServicePorts are used; otherwise, the first ServicePort is used.
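For illustration, here is a minimal sketch of the kind of Service spec this change targets (the name, selector, and annotation are placeholders; the annotation shown is just one way to have the controller provision an NLB, depending on controller version). With this patch, the two ServicePorts below are merged into a single TCP_UDP Listener and TargetGroup on port 53:

apiVersion: v1
kind: Service
metadata:
  name: dns                     # placeholder
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb-ip"   # assumes a controller-managed NLB
spec:
  type: LoadBalancer
  selector:
    app: dns                    # placeholder
  ports:
    - name: dns-tcp
      protocol: TCP
      port: 53                  # same external port number...
      targetPort: 53
    - name: dns-udp
      protocol: UDP
      port: 53                  # ...different protocol; merged into one TCP_UDP listener
      targetPort: 53

Note that the apiserver must accept mixed-protocol LoadBalancer Services for such a spec to be admitted (see the MixedProtocolLBService discussion further down).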

Checklist

  • [x] Added tests that cover your change (if possible)
  • [ ] Added/modified documentation as required (such as the README.md, or the docs directory)
  • [x] Manually tested
  • [x] Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! :exploding_head:

  • [x] Backfilled missing tests for code in same general area :tada:
  • [x] Refactored something and made the world a better place :star2:

amorey avatar Oct 07 '21 09:10 amorey

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot avatar Oct 07 '21 09:10 k8s-ci-robot

Welcome @amorey!

It looks like this is your first PR to kubernetes-sigs/aws-load-balancer-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-load-balancer-controller has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Oct 07 '21 09:10 k8s-ci-robot

Hi @amorey. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 07 '21 09:10 k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: amorey. To complete the pull request process, please assign kishorj after the PR has been reviewed. You can assign the PR to them by writing /assign @kishorj in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Oct 07 '21 09:10 k8s-ci-robot

@amorey Thanks for taking this on! I'm going to try to test it out 🎉

In the meantime any chance you could sign the CLA? 🙏🏼

Yasumoto avatar Oct 07 '21 19:10 Yasumoto

@Yasumoto Just signed the CLA. Let me know if you have any feedback!

amorey avatar Oct 07 '21 20:10 amorey

Hm, it looks like the NLB was created successfully, but I think I'm running into https://github.com/aws/containers-roadmap/issues/512

I'm getting this error (slightly edited for brevity) when listening on port 514 on TCP+UDP (for syslog)

Error: UPGRADE FAILED: failed to create resource: Service "vector01" is invalid: spec.ports: Invalid value: []core.ServicePort{Name:"syslog-tcp", Protocol:"TCP", AppProtocol:(*string)(nil), Port:514, TargetPort:intstr.IntOrString{Type:0, IntVal:514, StrVal:""}, NodePort:0}, core.ServicePort{Name:"syslog-udp", Protocol:"UDP", AppProtocol:(*string)(nil), Port:514, TargetPort:intstr.IntOrString{Type:0, IntVal:514, StrVal:""}, NodePort:0}}: may not contain more than 1 protocol when type is 'LoadBalancer'

Which I think is from a gated validation check. I'm running on EKS, so I'm not entirely sure whether I can tweak the right knobs to turn that on; suggestions appreciated! 🙇🏼

[edit]

And to be clear, this does look like it worked correctly! Curious how the Service object made it far enough to get processed by the Controller despite the validation failing, though, since I don't see a corresponding Service object in the cluster 🤔

❯ aws elbv2 --region=us-west-2 describe-listeners --load-balancer-arn arn:aws:elasticloadbalancing:us-west-2:891116894530:loadbalancer/net/vector01/e5af38562e88392a | jq .Listeners[1]
{
  "ListenerArn": "arn:aws:elasticloadbalancing:us-west-2:891116894530:listener/net/vector01/e5af38562e88392a/2be85133dc56b9c2",
  "LoadBalancerArn": "arn:aws:elasticloadbalancing:us-west-2:891116894530:loadbalancer/net/vector01/e5af38562e88392a",
  "Port": 514,
  "Protocol": "TCP_UDP",
  "DefaultActions": [
    {
      "Type": "forward",
      "TargetGroupArn": "arn:aws:elasticloadbalancing:us-west-2:891116894530:targetgroup/k8s-sw56prod-vector01-3f6a6cb913/7420c8dac5b4f50c",
      "Order": 1,
      "ForwardConfig": {
        "TargetGroups": [
          {
            "TargetGroupArn": "arn:aws:elasticloadbalancing:us-west-2:891116894530:targetgroup/k8s-sw56prod-vector01-3f6a6cb913/7420c8dac5b4f50c"
          }
        ]
      }
    }
  ]
}

Yasumoto avatar Oct 08 '21 04:10 Yasumoto

Yep, it looks like it's very new and disabled by default in upstream 1.20/1.21. Just to confirm, I'm trying to see what flags my EKS apiserver was started with, but it seems reasonable that this wouldn't be enabled until after this lands.

Yasumoto avatar Oct 08 '21 05:10 Yasumoto

Ahh sorry, I forgot to mention that you have to enable the MixedProtocolLBService feature gate for everything to work. I'm using a self-managed cluster set up with kops and have it enabled there, but I haven't tried EKS and don't know how to enable feature gates there.
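For reference, a sketch of how the gate can be enabled on a kops cluster (field placement assumed from kops's ClusterSpec; the equivalent kube-apiserver flag is --feature-gates=MixedProtocolLBService=true):

# excerpt from the cluster spec (kops edit cluster)
spec:
  kubeAPIServer:
    featureGates:
      MixedProtocolLBService: "true"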

What's the next step?

amorey avatar Oct 08 '21 07:10 amorey

The failure log came from an update. I assume you added the port during the update operation?

I'm wondering if the AWS Load Balancer Controller somehow saw the updated structure before the update passed validation, acted on it, and did not see the rollback, so did not remove/revert the NLB.

If the feature gate is disabled, we should never see such TCP_UDP NLBs. Unless something else happened, it looks like an underlying issue (not related to this code specifically) that the AWS LB Controller sees and acts on updates that didn't pass validation and hence were never persisted.

A similar validation path is used, for example, when you remove all the ports from a Service, so it might be useful to confirm whether the AWS LB Controller sees that change when it happens in an update; if so, it's a lower-level issue that should be pursued separately.


Apart from that weirdness, I'd say this PR needs to be merged before EKS enables that feature gate (if they agree to do so while it's Alpha), as the feature gate's job is to reject such Services before the supporting infrastructure is ready. It won't go Beta (i.e. enabled by default) until

  • All of the major clouds support this or indicate non-support properly
  • Kube-proxy does not proxy on ports that are in an error state

and without this PR, I don't see any code here (or in the Cloud Provider AWS version) that would reject a mixed-protocol LB; instead, I think it would end up generating a bad request for an LB with both TCP and UDP listeners on the same port, which would be "improper" non-support.

That suggests the other thing that should be done before that feature gate goes Beta is adding code to the Cloud Provider AWS implementation of LB support to handle and reject such Services, ideally directing people to use this Controller instead for that and other neat features.

TBBle avatar Oct 08 '21 09:10 TBBle

CLA Signed

The committers are authorized under a signed CLA.

  • :white_check_mark: M00nF1sh (6669aa8237e5e1e19f73071257f3dd9d78f10d7b)
  • :white_check_mark: Andres Morey (c906136b1795649389266b6c257fc9d7b9e4049f)

Also, quick note, it looks like your rebase has ended up with an empty-commit duplicate of 9b3880fffc2680460436ff4bd462640a7bdc03b1 (see f78cf1e57e09a1965511a6ae0ab4c90c82f132ad)

TBBle avatar Feb 01 '22 15:02 TBBle

@amorey: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 09 '22 05:02 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 10 '22 05:05 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jun 09 '22 06:06 k8s-triage-robot

/remove-lifecycle rotten

Yasumoto avatar Jun 14 '22 02:06 Yasumoto

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 12 '22 03:09 k8s-triage-robot

/remove-lifecycle stale

sidewinder12s avatar Sep 12 '22 19:09 sidewinder12s

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 11 '22 20:12 k8s-triage-robot

/remove-lifecycle stale

z0rc avatar Dec 12 '22 20:12 z0rc

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Mar 12 '23 20:03 k8s-triage-robot

/remove-lifecycle stale

z0rc avatar Mar 12 '23 21:03 z0rc

Some public attention to TCP_UDP support was directed to #1608 instead of here. This feature is still important, especially for the SIP protocol: it is helpful if SIP over UDP can switch to SIP over TCP at the same load balancer IP address when a message exceeds the MTU.

celliso1 avatar May 03 '23 19:05 celliso1

+1 for this feature. Some info for our use case:

Before this feature lands, has anyone found an easy workaround, please?

joegraviton avatar May 08 '23 10:05 joegraviton

As a workaround, you might be able to use an externally-managed TCP_UDP LB and bind it to a non-LB Service, see https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/use_cases/self_managed_lb/ and https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/targetgroupbinding/targetgroupbinding/.

The same suggestion was made in https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2759#issuecomment-1212366785 by someone who knows more about the AWS LB Controller, so it should work.

However, based on this PR's implementation, you might need to explicitly create two NetworkingPort entries in your TargetGroupBinding, since you need one for TCP and one for UDP.
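A rough sketch of what that could look like, with everything here (names, target group ARN, CIDR, and the port 514 from the syslog example above) as placeholders and separate TCP and UDP entries under networking.ingress[].ports:

apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: vector01-syslog               # placeholder
spec:
  serviceRef:
    name: vector01                    # the non-LB Service to bind
    port: 514
  targetGroupARN: arn:aws:elasticloadbalancing:...   # the externally managed TCP_UDP target group
  targetType: ip                      # or instance, matching the target group
  networking:
    ingress:
      - from:
          - ipBlock:
              cidr: 0.0.0.0/0         # placeholder source range
        ports:
          - protocol: TCP
            port: 514
          - protocol: UDP
            port: 514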

I think in terms of this PR, having NetworkingPort support TCP_UDP (and being the default) would be better since it would make the obvious case work by default, but that's probably a separate discussion (and either a separate PR, or a replacement/rework of this PR).

TBBle avatar May 08 '23 14:05 TBBle

Any progress?

artyom-p avatar Jun 10 '23 20:06 artyom-p

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 19 '24 23:01 k8s-triage-robot

/remove-lifecycle stale

TBBle avatar Jan 20 '24 05:01 TBBle

Any progress?

rakesh-von avatar Feb 08 '24 11:02 rakesh-von

@amorey Any progress on this feature? I believe many people are waiting for it to be released.

tylern91 avatar Mar 25 '24 06:03 tylern91