cluster-api-provider-aws
NLB support
/kind feature
Describe the solution you'd like
As an engineer deploying Kubernetes with CAPA, I would like my control plane to be fronted by a Network Load Balancer instead of a Classic ELB, so that I can continue to use CAPA after AWS deprecates Classic ELBs.
Anything else you would like to add:
Classic ELBs are being deprecated in August 2022, and the AWS SDK for Go supports NLBs. We should use them!
More info here
@carlosonunez: This issue is currently awaiting triage.
If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Thanks for bringing this up @carlosonunez. I think we are tracking this somewhere else; I will find out and link these.
There is ongoing testing of how to deal with the hairpinning issue that comes with NLBs (when the LB directs kubelet traffic to the api-server on the same node).
That's great that this is being tracked elsewhere! I couldn't find any issues pertaining to it and thought "That can't be right!" :)
Any update on whether this truly is being tracked? thanks
I haven't seen issue tracking NLB support, but we considered using NLB back in 2018: https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/115#discussion_r220562864
For reference, CAPZ also has a hairpinning issue, although specifically when using an internal load balancer: https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1033#issuecomment-1007789608. The current workaround is to prevent requests from kubelet on the control plane machines from going through the LB, by resolving NLB DNS record to 127.0.0.1.
I could not find a specific issue for this; I see this was discussed in the CAPA v1alpha4 API changes document. During today's office hours, we triaged this issue and are keeping it open.
Thanks @dlipovetsky, good to know there is an example workaround. If I can't make it work by disabling client IP preservation, this might be the way to go.
As suggested by @sedefsavas, changing the target group type from instance to IP can also work around this issue: https://aws.amazon.com/premiumsupport/knowledge-center/target-connection-fails-load-balancer/ But the downside is that the IP of an instance might change during its lifecycle, so the corresponding target group needs to be updated, and the NLB's listeners need to be updated as well. This could cause downtime.
Update: with an instance-type target, I can still curl the NLB endpoint from within the control plane, given an appropriate SecurityGroup.
Just a correction on retirement of Classic load balancers mentioned in the issue:
EC2-Classic networking is being retired, not Classic Load Balancers. Classic Load Balancers that are being used in EC2-Classic networking will need to be migrated to a load balancer in a VPC. Classic Load Balancers that are already in a VPC won't be affected.
TL;DR August 2022 retirement date is not for Classic ELBs.
UPDATE: Never mind. This deprecation only applies to ELBs in EC2-Classic, as @sedefsavas said. It's a bit confusing since this article recommends migrating to ELBv2 resources.
The language is SUPER confusing, but according to this AWS article, ELBs (aka Classic LBs) are in scope for retirement as well: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/migrate-classic-load-balancer.html#migrate-step-by-step-classiclink
This feature (NLB for the control plane) is very much required in the market, irrespective of CLBs still being available. Please let me know if there is a need for a contribution to this.
Are we planning to convert the existing Classic ELBs to NLBs?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
While implementing this, we can avoid introducing the problem of creating LBs with identical names for clusters that are created with the same name in different namespaces: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/969
NLB names need to be unique in the same AWS account and region.
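As an illustrative sketch (not CAPA's actual naming scheme), a per-namespace-unique name could be derived from the namespace and cluster name, with a short hash suffix so the result stays within the 32-character limit on NLB names. The `lbName` helper below is hypothetical:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// lbName builds a load balancer name that is unique per
// namespace/cluster pair. NLB names are limited to 32 characters,
// so a short hash of the full identity is appended and the
// human-readable prefix is truncated to fit.
// Hypothetical helper, not CAPA's actual implementation.
func lbName(namespace, cluster string) string {
	sum := sha256.Sum256([]byte(namespace + "/" + cluster))
	suffix := fmt.Sprintf("%x", sum[:4]) // 8 hex characters
	prefix := namespace + "-" + cluster
	// 32 total - 8 hash chars - 1 hyphen = 23 chars for the prefix.
	if len(prefix) > 23 {
		prefix = prefix[:23]
	}
	return prefix + "-" + suffix
}

func main() {
	fmt.Println(lbName("default", "my-cluster"))
	fmt.Println(lbName("team-a", "my-cluster"))
}
```

Two clusters named `my-cluster` in different namespaces then get distinct LB names, while the name still starts with a recognizable prefix.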
/assign
Hello all.
So I read through this issue finally. :)
What is the consensus here? Do we just disable IP address preservation, or do we do a similar workaround as Azure did?
@sedefsavas do you have anything on this by any chance? It's okay if not. I'm just gathering as much information as I can. :)
This comment:
But the downside of this is the IP of instance might change during lifecycle, the corresponding target group needs to be updated, also the NLB's listeners needs to be updated as well. This could cause downtime.
I'm not sure how much of a downside this is. Are you implying that when an EC2 instance gets rebooted it might be assigned a new IP address? That seems like an edge case that might be worth further analysis. However, when new machines are rolled out they need to be added to the target groups anyway -- regardless of ELB or NLB -- so I don't consider that scenario a "new problem" with NLB specifically.
If the target type for each target group in the NLB is by instance-id, then this seems like a non-issue?
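To illustrate the reconciliation described above: regardless of ELB or NLB, on each rollout the controller effectively computes a set difference between the desired control plane machines and the targets currently registered with the target group. A minimal conceptual sketch with hypothetical instance IDs (not CAPA's actual code):

```go
package main

import "fmt"

// diffTargets computes which instance IDs must be registered with,
// and deregistered from, a target group so that its membership
// matches the desired control plane machines. Conceptual sketch only.
func diffTargets(desired, registered []string) (toRegister, toDeregister []string) {
	want := map[string]bool{}
	have := map[string]bool{}
	for _, id := range desired {
		want[id] = true
	}
	for _, id := range registered {
		have[id] = true
		if !want[id] {
			toDeregister = append(toDeregister, id)
		}
	}
	for _, id := range desired {
		if !have[id] {
			toRegister = append(toRegister, id)
		}
	}
	return toRegister, toDeregister
}

func main() {
	// A rollout replaced i-old with i-new; i-keep stays registered.
	reg, dereg := diffTargets(
		[]string{"i-keep", "i-new"},
		[]string{"i-keep", "i-old"},
	)
	fmt.Println(reg)   // [i-new]
	fmt.Println(dereg) // [i-old]
}
```

With instance-id targets this diff is stable across reboots; with IP targets the same diff would have to run whenever an instance's IP changes, which is the downtime concern raised earlier in the thread.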
It looks like Amazon has provided better clarity on this: if you turn off Preserve client IP addresses, you can have loopback without using IP addresses explicitly. However, this would mean that the API server always sees the NLB's IP address, even for external users, which does not sound good from an audit standpoint.
Sources: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-troubleshooting.html Also: https://aws.amazon.com/premiumsupport/knowledge-center/target-connection-fails-load-balancer/
Yeah, we probably will have to do something like what CAPZ did.
Okay, so last call: we are going to do something like CAPZ, which edited the hosts file and added references to the LB's IP as localhost. All in favour say yay; all against say nay.
yay
Just a naive question. Does this affect the kubelet certificate in any way? E.g. that it also has to contain 127.0.0.1 instead of only the node IP? Furthermore, would it make sense to make this behavior (mapping the LB IPs to localhost in /etc/hosts) configurable, as there might be cases where the hairpinning issue is not present and it's desirable that the traffic goes to the API LB instead of the single API server running on the local node?
Just a naive question. Does this affect the kubelet certificate in any way? E.g. that it also has to contain 127.0.0.1 instead of only the node IP?
Good question. I don't know. :) I will have to consult some documentation. I would assume so; otherwise, the call to localhost might fail with a certificate issue. But also, the call to localhost does not usually require a certificate.
Furthermore, would it make sense to make this behavior (mapping the LB IPs to localhost in /etc/hosts) configurable, as there might be cases where the hairpinning issue is not present and it's desirable that the traffic goes to the API LB instead of the single API server running on the local node?
Certainly. :) That was my intention.
Yeah, since we will add 127.0.0.1, which is the loopback address, I don't think it needs to be added to the certificate. But let's make sure. :)
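As a rough sketch of the CAPZ-style workaround being discussed (assuming the API server LB's DNS name is known at bootstrap time, and that a bootstrap step on control plane nodes writes /etc/hosts), the edit amounts to appending a line that pins the LB hostname to the loopback address. The helper names and the example hostname are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// hostsEntry returns the /etc/hosts line that short-circuits
// hairpinning: kubelet on a control plane node resolves the NLB's
// DNS name to loopback and reaches the local api-server directly.
func hostsEntry(apiServerHost string) string {
	return fmt.Sprintf("127.0.0.1 %s", apiServerHost)
}

// appendHostsEntry adds the entry to existing /etc/hosts content,
// skipping the write if an identical line is already present so the
// step stays idempotent across reboots.
func appendHostsEntry(hosts, apiServerHost string) string {
	entry := hostsEntry(apiServerHost)
	for _, line := range strings.Split(hosts, "\n") {
		if strings.TrimSpace(line) == entry {
			return hosts
		}
	}
	return strings.TrimRight(hosts, "\n") + "\n" + entry + "\n"
}

func main() {
	hosts := "127.0.0.1 localhost\n"
	// Hypothetical NLB DNS name for illustration.
	fmt.Print(appendHostsEntry(hosts, "my-nlb-1234.elb.us-east-1.amazonaws.com"))
}
```

Making this step opt-in, as suggested above, would just mean guarding the append behind a flag on the cluster spec.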
I started to flesh this out. I'm going to try to keep it super simple at first. Let's see how that goes.
I started with the basics of just creating the NLB. There are still a couple of TODOs left on it.