
Add support for TCP_UDP to NLB TargetGroups and Listeners

Open amorey opened this issue 4 years ago • 36 comments

Issue

https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/1608#issuecomment-937346660

Description

Previously, aws-load-balancer-controller ignored extra ServicePorts in the Kubernetes Service spec that reused an external port number, even when their protocols differed (e.g. TCP:53 and UDP:53).

This behavior prevented users from exposing services that support TCP and UDP on the same external load balancer port number.

This patch solves the problem by detecting when a user defines two ServicePorts on the same external load balancer port number, one using TCP and one using UDP. In that situation, a TCP_UDP TargetGroup and Listener are created and SecurityGroup rules are updated accordingly. If more than two ServicePorts are defined for a port, only the first two mergeable ServicePorts are used; otherwise, the first ServicePort is used.
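For illustration, here is a minimal sketch of the kind of Service spec this change targets (the name, selector, and annotation are placeholders; the annotation shown is just one way to have the controller provision an NLB, depending on controller version). With this patch, the two ServicePorts below are merged into a single TCP_UDP Listener and TargetGroup on port 53:

apiVersion: v1
kind: Service
metadata:
  name: dns                     # placeholder
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb-ip"   # assumes a controller-managed NLB
spec:
  type: LoadBalancer
  selector:
    app: dns                    # placeholder
  ports:
    - name: dns-tcp
      protocol: TCP
      port: 53                  # same external port number...
      targetPort: 53
    - name: dns-udp
      protocol: UDP
      port: 53                  # ...different protocol; merged into one TCP_UDP listener
      targetPort: 53

Note that the apiserver must accept mixed-protocol LoadBalancer Services for such a spec to be admitted (see the MixedProtocolLBService discussion further down).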

Checklist

  • [x] Added tests that cover your change (if possible)
  • [ ] Added/modified documentation as required (such as the README.md, or the docs directory)
  • [x] Manually tested
  • [x] Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! :exploding_head:

  • [x] Backfilled missing tests for code in same general area :tada:
  • [x] Refactored something and made the world a better place :star2:

amorey avatar Oct 07 '21 09:10 amorey

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot avatar Oct 07 '21 09:10 k8s-ci-robot

Welcome @amorey!

It looks like this is your first PR to kubernetes-sigs/aws-load-balancer-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-load-balancer-controller has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Oct 07 '21 09:10 k8s-ci-robot

Hi @amorey. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 07 '21 09:10 k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: amorey. To complete the pull request process, please assign kishorj after the PR has been reviewed. You can assign the PR to them by writing /assign @kishorj in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Oct 07 '21 09:10 k8s-ci-robot

@amorey Thanks for taking this on! I'm going to try to test it out 🎉

In the meantime any chance you could sign the CLA? 🙏🏼

Yasumoto avatar Oct 07 '21 19:10 Yasumoto

@Yasumoto Just signed the CLA. Let me know if you have any feedback!

amorey avatar Oct 07 '21 20:10 amorey

Hm, it looks like the NLB was created successfully, but I think I'm running into https://github.com/aws/containers-roadmap/issues/512

I'm getting this error (slightly edited for brevity) when listening on port 514 on TCP+UDP (for syslog)

Error: UPGRADE FAILED: failed to create resource: Service "vector01" is invalid: spec.ports: Invalid value: []core.ServicePort{Name:"syslog-tcp", Protocol:"TCP", AppProtocol:(*string)(nil), Port:514, TargetPort:intstr.IntOrString{Type:0, IntVal:514, StrVal:""}, NodePort:0}, core.ServicePort{Name:"syslog-udp", Protocol:"UDP", AppProtocol:(*string)(nil), Port:514, TargetPort:intstr.IntOrString{Type:0, IntVal:514, StrVal:""}, NodePort:0}}: may not contain more than 1 protocol when type is 'LoadBalancer'

Which I think is from a gated validation check. I'm running on EKS, so I'm not entirely sure whether I can tweak the right knobs to turn that on; suggestions appreciated! 🙇🏼

[edit]

And to be clear, this does look like it worked correctly! Curious how the Service object made it far enough to get processed by the Controller despite the validation failing, though, since I don't see a corresponding Service object in the cluster 🤔

❯ aws elbv2 --region=us-west-2 describe-listeners --load-balancer-arn arn:aws:elasticloadbalancing:us-west-2:891116894530:loadbalancer/net/vector01/e5af38562e88392a | jq .Listeners[1]
{
  "ListenerArn": "arn:aws:elasticloadbalancing:us-west-2:891116894530:listener/net/vector01/e5af38562e88392a/2be85133dc56b9c2",
  "LoadBalancerArn": "arn:aws:elasticloadbalancing:us-west-2:891116894530:loadbalancer/net/vector01/e5af38562e88392a",
  "Port": 514,
  "Protocol": "TCP_UDP",
  "DefaultActions": [
    {
      "Type": "forward",
      "TargetGroupArn": "arn:aws:elasticloadbalancing:us-west-2:891116894530:targetgroup/k8s-sw56prod-vector01-3f6a6cb913/7420c8dac5b4f50c",
      "Order": 1,
      "ForwardConfig": {
        "TargetGroups": [
          {
            "TargetGroupArn": "arn:aws:elasticloadbalancing:us-west-2:891116894530:targetgroup/k8s-sw56prod-vector01-3f6a6cb913/7420c8dac5b4f50c"
          }
        ]
      }
    }
  ]
}

Yasumoto avatar Oct 08 '21 04:10 Yasumoto

Yep, it looks like it's very new and disabled by default in upstream 1.20/1.21. Just to confirm, I'm trying to see what flags my EKS apiserver was started with, but it seems reasonable that this wouldn't be enabled until after this lands.

Yasumoto avatar Oct 08 '21 05:10 Yasumoto

Ahh sorry, I forgot to mention that you have to enable the MixedProtocolLBService feature gate for everything to work. I'm using a self-managed cluster set up with kops and have it enabled there, but I haven't tried EKS and don't know how to enable feature gates there.
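For reference, a sketch of how the gate can be enabled on a kops cluster (field placement assumed from kops's ClusterSpec; the equivalent kube-apiserver flag is --feature-gates=MixedProtocolLBService=true):

# excerpt from the cluster spec (kops edit cluster)
spec:
  kubeAPIServer:
    featureGates:
      MixedProtocolLBService: "true"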

What's the next step?

amorey avatar Oct 08 '21 07:10 amorey

The failure log came from an update. I assume you added the port during the update operation?

I'm wondering if the AWS Load Balancer Controller somehow saw the updated structure before the update passed validation, acted on it, and did not see the rollback, so did not remove/revert the NLB.

If the feature gate is disabled, we should never see such TCP_UDP NLBs. Unless something else happened, it looks like an underlying issue (not related to this code specifically) that the AWS LB Controller sees and acts on updates that didn't pass validation and hence were never persisted.

A similar validation path is used, for example, when you remove all the ports from a Service, so it might be useful to confirm whether the AWS LB Controller sees that change when it happens in an update; if so, it's a lower-level issue that should be pursued separately.


Apart from that weirdness, I'd say this PR needs to be merged before EKS enables that feature gate (if they agree to do so while it's Alpha), as the feature gate's job is to reject such Services before the supporting infrastructure is ready. It won't go Beta (i.e. enabled by default) until

  • All of the major clouds support this or indicate non-support properly
  • Kube-proxy does not proxy on ports that are in an error state

and without this PR, I don't see any code here (or in the Cloud Provider AWS version) that would reject a mixed-protocol LB; instead, I think it would end up generating a bad request for an LB with both TCP and UDP listeners on the same port, which would be "improper" non-support.

That suggests the other thing that should be done before that feature gate goes Beta is adding code to the Cloud Provider AWS implementation of LB support to handle and reject such Services, ideally directing people to use this Controller instead for that and other neat features.

TBBle avatar Oct 08 '21 09:10 TBBle

CLA Signed

The committers are authorized under a signed CLA.

  • :white_check_mark: M00nF1sh (6669aa8237e5e1e19f73071257f3dd9d78f10d7b)
  • :white_check_mark: Andres Morey (c906136b1795649389266b6c257fc9d7b9e4049f)

Also, quick note, it looks like your rebase has ended up with an empty-commit duplicate of 9b3880fffc2680460436ff4bd462640a7bdc03b1 (see f78cf1e57e09a1965511a6ae0ab4c90c82f132ad)

TBBle avatar Feb 01 '22 15:02 TBBle

@amorey: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 09 '22 05:02 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 10 '22 05:05 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jun 09 '22 06:06 k8s-triage-robot

/remove-lifecycle rotten

Yasumoto avatar Jun 14 '22 02:06 Yasumoto

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 12 '22 03:09 k8s-triage-robot

/remove-lifecycle stale

sidewinder12s avatar Sep 12 '22 19:09 sidewinder12s

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 11 '22 20:12 k8s-triage-robot

/remove-lifecycle stale

z0rc avatar Dec 12 '22 20:12 z0rc

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Mar 12 '23 20:03 k8s-triage-robot

/remove-lifecycle stale

z0rc avatar Mar 12 '23 21:03 z0rc

Some public attention to TCP_UDP support was directed to #1608 instead of here. This feature is still important, especially for the SIP protocol: it is helpful if SIP over UDP can switch to SIP over TCP at the same load balancer IP address when a message exceeds the MTU.

celliso1 avatar May 03 '23 19:05 celliso1

+1 for this feature. Some info for our use case:

Before this feature lands, has anyone found an easy workaround, please?

joegraviton avatar May 08 '23 10:05 joegraviton

As a workaround, you might be able to use an externally-managed TCP_UDP LB and bind it to a non-LB Service, see https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/use_cases/self_managed_lb/ and https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/targetgroupbinding/targetgroupbinding/.

The same suggestion was made in https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2759#issuecomment-1212366785 by someone who knows more about the AWS LB Controller, so it should work.

However, based on this PR's implementation, you might need to explicitly create two NetworkingPort entries in your TargetGroupBinding, since you need one for TCP and one for UDP.
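A rough sketch of what that could look like, with everything here (names, target group ARN, CIDR, and the port 514 from the syslog example above) as placeholders and separate TCP and UDP entries under networking.ingress[].ports:

apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: vector01-syslog               # placeholder
spec:
  serviceRef:
    name: vector01                    # the non-LB Service to bind
    port: 514
  targetGroupARN: arn:aws:elasticloadbalancing:...   # the externally managed TCP_UDP target group
  targetType: ip                      # or instance, matching the target group
  networking:
    ingress:
      - from:
          - ipBlock:
              cidr: 0.0.0.0/0         # placeholder source range
        ports:
          - protocol: TCP
            port: 514
          - protocol: UDP
            port: 514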

I think in terms of this PR, having NetworkingPort support TCP_UDP (and being the default) would be better since it would make the obvious case work by default, but that's probably a separate discussion (and either a separate PR, or a replacement/rework of this PR).

TBBle avatar May 08 '23 14:05 TBBle

Any progress?

artyom-p avatar Jun 10 '23 20:06 artyom-p

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 19 '24 23:01 k8s-triage-robot

/remove-lifecycle stale

TBBle avatar Jan 20 '24 05:01 TBBle

Any progress?

rakesh-von avatar Feb 08 '24 11:02 rakesh-von

@amorey Any progress on this feature? I believe many people are waiting for it to be released.

tylern91 avatar Mar 25 '24 06:03 tylern91