aws-load-balancer-controller
Add support for TCP_UDP to NLB TargetGroups and Listeners
Issue
https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/1608#issuecomment-937346660
Description
Previously, aws-load-balancer-controller ignored extra ServicePorts in the Kubernetes Service spec whenever their external port numbers overlapped with an earlier ServicePort, even if the protocols differed (e.g. TCP:53 and UDP:53).
This behavior prevented users from exposing services that support TCP and UDP on the same external load balancer port number.
This patch solves the problem by detecting when a user defines separate TCP and UDP ServicePorts for the same external load balancer port number. In that case, a TCP_UDP TargetGroup and Listener are created and SecurityGroup rules are updated accordingly. If more than two ServicePorts share a port number, only the first two mergeable ServicePorts are used; otherwise, the first ServicePort is used.
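A simplified sketch of that merge rule (not the PR's actual implementation): it uses the upstream corev1.ServicePort type, and the helper name listenerProtocol is hypothetical, chosen here only for illustration.

```go
// Simplified illustration of the merge rule described above: if a Service defines
// both a TCP and a UDP ServicePort on the same external port, the pair maps to a
// single TCP_UDP listener/target group; otherwise the first ServicePort wins.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// listenerProtocol (hypothetical helper) returns the NLB listener protocol to use
// for the given external port.
func listenerProtocol(ports []corev1.ServicePort, externalPort int32) string {
	var sawTCP, sawUDP bool
	var first corev1.Protocol
	for _, p := range ports {
		if p.Port != externalPort {
			continue
		}
		if first == "" {
			first = p.Protocol
		}
		switch p.Protocol {
		case corev1.ProtocolTCP:
			sawTCP = true
		case corev1.ProtocolUDP:
			sawUDP = true
		}
	}
	if sawTCP && sawUDP {
		return "TCP_UDP" // merged listener + target group
	}
	return string(first) // fall back to the first ServicePort's protocol
}

func main() {
	ports := []corev1.ServicePort{
		{Name: "dns-tcp", Protocol: corev1.ProtocolTCP, Port: 53},
		{Name: "dns-udp", Protocol: corev1.ProtocolUDP, Port: 53},
	}
	fmt.Println(listenerProtocol(ports, 53)) // prints "TCP_UDP"
}
```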
Checklist
- [x] Added tests that cover your change (if possible)
- [ ] Added/modified documentation as required (such as the README.md, or the docs directory)
- [x] Manually tested
- [x] Made sure the title of the PR is a good description that can go into the release notes
BONUS POINTS checklist: complete for good vibes and maybe prizes?! :exploding_head:
- [x] Backfilled missing tests for code in same general area :tada:
- [x] Refactored something and made the world a better place :star2:
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
:memo: Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.
It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
- If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
- If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
- If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
- Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Welcome @amorey!
It looks like this is your first PR to kubernetes-sigs/aws-load-balancer-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes-sigs/aws-load-balancer-controller has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @amorey. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: amorey
To complete the pull request process, please assign kishorj after the PR has been reviewed.
You can assign the PR to them by writing /assign @kishorj in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@amorey Thanks for taking this on! I'm going to try to test it out 🎉
In the meantime any chance you could sign the CLA? 🙏🏼
@Yasumoto Just signed the CLA. Let me know if you have any feedback!
Hm, it looks like the NLB was created successfully, but I think I'm running into https://github.com/aws/containers-roadmap/issues/512
I'm getting this error (slightly edited for brevity) when listening on port 514 over TCP+UDP (for syslog):
Error: UPGRADE FAILED: failed to create resource: Service "vector01" is invalid: spec.ports: Invalid value: []core.ServicePort{Name:"syslog-tcp", Protocol:"TCP", AppProtocol:(*string)(nil), Port:514, TargetPort:intstr.IntOrString{Type:0, IntVal:514, StrVal:""}, NodePort:0}, core.ServicePort{Name:"syslog-udp", Protocol:"UDP", AppProtocol:(*string)(nil), Port:514, TargetPort:intstr.IntOrString{Type:0, IntVal:514, StrVal:""}, NodePort:0}}: may not contain more than 1 protocol when type is 'LoadBalancer'
Which I think is from a gated validation check. I'm running on EKS, so not entirely sure if I can tweak the right knobs to turn that on, so suggestions appreciated! 🙇🏼
[edit]
And to be clear, this does look like it worked correctly! Curious how the Service object made it far enough to get processed by the Controller, though, despite the validation failing, since I don't see a corresponding Service object in the cluster 🤔
❯ aws elbv2 --region=us-west-2 describe-listeners --load-balancer-arn arn:aws:elasticloadbalancing:us-west-2:891116894530:loadbalancer/net/vector01/e5af38562e88392a | jq .Listeners[1]
{
  "ListenerArn": "arn:aws:elasticloadbalancing:us-west-2:891116894530:listener/net/vector01/e5af38562e88392a/2be85133dc56b9c2",
  "LoadBalancerArn": "arn:aws:elasticloadbalancing:us-west-2:891116894530:loadbalancer/net/vector01/e5af38562e88392a",
  "Port": 514,
  "Protocol": "TCP_UDP",
  "DefaultActions": [
    {
      "Type": "forward",
      "TargetGroupArn": "arn:aws:elasticloadbalancing:us-west-2:891116894530:targetgroup/k8s-sw56prod-vector01-3f6a6cb913/7420c8dac5b4f50c",
      "Order": 1,
      "ForwardConfig": {
        "TargetGroups": [
          {
            "TargetGroupArn": "arn:aws:elasticloadbalancing:us-west-2:891116894530:targetgroup/k8s-sw56prod-vector01-3f6a6cb913/7420c8dac5b4f50c"
          }
        ]
      }
    }
  ]
}
Yep, looks like it's very new, and disabled by default in upstream 1.20/1.21. Just to confirm, I'm trying to see what my EKS apiserver was started with, but it seems reasonable that this wouldn't be enabled until after this lands.
Ahh sorry, I forgot to mention that you have to enable the MixedProtocolLBService feature gate in order for everything to work. I'm using a self-managed cluster set up with kops and I have it enabled there but I haven't tried with EKS and don't know how to enable feature flags there.
What's the next step?
The failure log came from an update. I assume you added the port during the update operation?
I'm wondering if the AWS Load Balancer Controller somehow saw the updated structure before the update passed validation, acted on it, and did not see the rollback, so did not remove/revert the NLB.
If the feature gate is disabled, we should never see such TCP_UDP NLBs. Unless something else happened, it looks like an underlying issue (not related to this code specifically) if the AWS LB Controller sees and acts on updates that won't or didn't pass validation and hence were never persisted.
A similar validation path is used, for example, if you remove all the ports from a Service, so it might be useful to confirm whether the AWS LB Controller sees that change when you make it in an update; if it does, this is a lower-level issue that should be pursued separately.
Apart from that weirdness, I'd say this PR needs to be merged before EKS enables that feature gate (if they agree to do so while it's Alpha), as the feature gate's job is to reject such Services before the supporting infrastructure is ready. It won't go Beta (i.e. enabled by default) until
- All of the major clouds support this or indicate non-support properly
- Kube-proxy does not proxy on ports that are in an error state
and without this PR, I don't see any code here (or in the Cloud Provider AWS version) that would reject a mixed-protocol LB; instead, I think it would end up generating a bad request for an LB with both TCP and UDP listeners on the same port, which would be "improper" non-support.
That suggests the other thing that should be done before that feature gate goes Beta is adding code to the Cloud Provider AWS implementation of LB support that handles and rejects such Services, ideally directing people to use this Controller instead for that and other neat features.
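A minimal sketch of the kind of guard suggested above, assuming nothing about the actual Cloud Provider AWS code paths: the helper name and error wording are hypothetical, and only the upstream corev1 types are real.

```go
// Hypothetical sketch (not existing cloud-provider-aws code) of a guard that
// rejects LoadBalancer Services mixing protocols on one port, instead of silently
// generating an invalid ELB API request.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// rejectMixedProtocol returns an error if any external port is declared with more
// than one protocol, so the provider can fail fast with a clear message.
func rejectMixedProtocol(svc *corev1.Service) error {
	if svc.Spec.Type != corev1.ServiceTypeLoadBalancer {
		return nil
	}
	seen := map[int32]corev1.Protocol{}
	for _, p := range svc.Spec.Ports {
		if prev, ok := seen[p.Port]; ok && prev != p.Protocol {
			return fmt.Errorf("port %d uses both %s and %s; mixed-protocol ports are not supported here, consider the AWS Load Balancer Controller instead", p.Port, prev, p.Protocol)
		}
		seen[p.Port] = p.Protocol
	}
	return nil
}

func main() {
	svc := &corev1.Service{Spec: corev1.ServiceSpec{
		Type: corev1.ServiceTypeLoadBalancer,
		Ports: []corev1.ServicePort{
			{Name: "syslog-tcp", Protocol: corev1.ProtocolTCP, Port: 514},
			{Name: "syslog-udp", Protocol: corev1.ProtocolUDP, Port: 514},
		},
	}}
	fmt.Println(rejectMixedProtocol(svc)) // reports the 514 TCP/UDP conflict
}
```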
The committers are authorized under a signed CLA.
- :white_check_mark: M00nF1sh (6669aa8237e5e1e19f73071257f3dd9d78f10d7b)
- :white_check_mark: Andres Morey (c906136b1795649389266b6c257fc9d7b9e4049f)
Also, quick note, it looks like your rebase has ended up with an empty-commit duplicate of 9b3880fffc2680460436ff4bd462640a7bdc03b1 (see f78cf1e57e09a1965511a6ae0ab4c90c82f132ad)
@amorey: PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Mark this PR as fresh with `/remove-lifecycle stale`
- Close this PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Some public attention to TCP_UDP support was directed to #1608 instead of here. This feature is still important, especially for the SIP protocol: it is helpful if SIP/UDP can switch to SIP/TCP when a message exceeds the MTU, while keeping the same load balancer IP address.
+1 for this feature. Some info for our use case:
- Our product uses a Serf cluster and its gossip protocol, which needs to expose both TCP and UDP on the same port (default 7946).
- I spent a lot of effort upgrading all our k8s clusters to get the mixed TCP and UDP feature on LoadBalancer-type Services, then realized it only allows you to mix TCP and UDP in the same Service, and only on different ports.
- AWS Network Load Balancer supports the TCP_UDP protocol type, but the K8s ServicePort spec doesn't allow it yet.
Before this feature lands, has anyone found an easy workaround?
As a workaround, you might be able to use an externally-managed TCP_UDP LB and bind it to a non-LB Service, see https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/use_cases/self_managed_lb/ and https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/targetgroupbinding/targetgroupbinding/.
The same suggestion was made in https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2759#issuecomment-1212366785 by someone who knows more about the AWS LB Controller, so it should work.
However, based on this PR's implementation, you might need to explicitly create two NetworkingPort entries in your TargetGroupBinding, since you need one for TCP and one for UDP.
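For illustration, a sketch of what those two NetworkingPort entries could look like when built with the controller's Go API types; the import path, type names, and constants here are assumptions based on a reading of the project's elbv2 v1beta1 API and may differ between versions.

```go
// Sketch of a TargetGroupBinding networking section with one NetworkingPort per
// protocol. Type and constant names are assumed from the controller's elbv2
// v1beta1 API and may not match your installed version exactly.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/intstr"
	elbv2api "sigs.k8s.io/aws-load-balancer-controller/apis/elbv2/v1beta1"
)

func main() {
	port := intstr.FromInt(7946) // e.g. the Serf gossip port; adjust for your service
	tcp := elbv2api.NetworkingProtocolTCP
	udp := elbv2api.NetworkingProtocolUDP

	networking := &elbv2api.TargetGroupBindingNetworking{
		Ingress: []elbv2api.NetworkingIngressRule{{
			// From: add your NetworkingPeer (security group or CIDR source) here.
			Ports: []elbv2api.NetworkingPort{
				{Protocol: &tcp, Port: &port}, // explicit TCP entry
				{Protocol: &udp, Port: &port}, // explicit UDP entry
			},
		}},
	}
	fmt.Printf("%+v\n", networking)
}
```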
I think in terms of this PR, having NetworkingPort support TCP_UDP (and being the default) would be better since it would make the obvious case work by default, but that's probably a separate discussion (and either a separate PR, or a replacement/rework of this PR).
Any progress?
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Mark this PR as fresh with `/remove-lifecycle stale`
- Close this PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
any progress?
@amorey Any progress on this feature? I believe many people are waiting for this feature to be released.