cloud-provider-kubevirt icon indicating copy to clipboard operation
cloud-provider-kubevirt copied to clipboard

lb: endpointslice controller for externalTrafficPolicyLocal

Open BartVerc opened this issue 11 months ago • 9 comments

What this PR does / why we need it: ExternalTrafficPolicy Local requires traffic to route to specific endpoint nodes instead of routing traffic to all existing nodes. The EndpointSlice Controller looks at the Tenant cluster and updates Endpointslices in the infra cluster accordingly.

Which issue(s) this PR fixes: Fixes #41

Release note:

Added enableEPSController load balancer config flag, to support ExternalTrafficPolicy Local.

BartVerc avatar Mar 12 '24 12:03 BartVerc

Hi @BartVerc. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

kubevirt-bot avatar Mar 12 '24 12:03 kubevirt-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign qinqon for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

kubevirt-bot avatar Mar 12 '24 12:03 kubevirt-bot

@BartVerc can you split the PR into a pair of commits vendoring and the rest ?

qinqon avatar Mar 12 '24 15:03 qinqon

The issue here is that the NodeName is an optional field. Is it possible to encounter an endpointslice without a NodeName? For example, a selectorless service with custom endpointslices?

That is a valid concern, however, a selectorless LoadBalancer service could point to any endpoint, also outside the cluster. If it's inside the cluster it is up to the user to add the NodeName in my opinion. With endpoints created via a Service with selector, the kubernetes endpointsliceController tries to find the NodeName via the pod pod.Spec.NodeName, which is set when it's running. Also, when a tenant wants to create a selectorless LoadBalancer service, it can always decide to set the ExternalTrafficPolicy to Cluster instead of Local.

So I don't think an endpointslice without a NodeName is a concern.

BartVerc avatar Mar 26 '24 13:03 BartVerc

That is a valid concern, however, a selectorless LoadBalancer service could point to any endpoint, also outside the cluster. If it's inside the cluster it is up to the user to add the NodeName in my opinion. With endpoints created via a Service with selector, the kubernetes endpointsliceController tries to find the NodeName via the pod pod.Spec.NodeName, which is set when it's running. Also, when a tenant wants to create a selectorless LoadBalancer service, it can always decide to set the ExternalTrafficPolicy to Cluster instead of Local.

Yeah, I think I get what you're saying.

NodeName is optional, but not setting NodeName on an endpointslice when externalTrafficPolicy is set to Local wouldn't make a lot of sense, so it doesn't seem to impact your logic in practice.

davidvossel avatar Mar 29 '24 16:03 davidvossel

Hey @BartVerc, you can reduce the code a lot if you use controller-runtime lib for this, since we are not really using anything specific from cloud-provider infra, what do you think ?

qinqon avatar Apr 04 '24 08:04 qinqon

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot avatar Jul 09 '24 21:07 kubevirt-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kubevirt-bot avatar Aug 08 '24 22:08 kubevirt-bot

quick update: I found some time to work on this again, but I am having some trouble rebasing it on the main branch. I started a new branch from main which seems to work better. Not sure if this will end up correctly in this PR, so maybe I need to open a new one.

BartVerc avatar Sep 06 '24 10:09 BartVerc

@BartVerc hi, I refactored your pr and prepared https://github.com/kubevirt/cloud-provider-kubevirt/pull/330 and https://github.com/kubevirt/cloud-provider-kubevirt/pull/331

kvaps avatar Sep 26 '24 14:09 kvaps