envoy icon indicating copy to clipboard operation
envoy copied to clipboard

Make `ClientSideWeightedRoundRobin` lb policy optionally respect locality weights specified by EDS.

Open efimki opened this issue 6 months ago • 9 comments

  • Add a locality_weighted_lb_config knob to ClientSideWeightedRoundRobin config proto.
  • If locality_weighted_lb_config is set, then ClientSideWeightedRoundRobin lb policy will respect locality weights.
  • Add an integration test that demonstrates locality weights being used or ignored depending on this knob.

Testing: bazel test //test/extensions/load_balancing_policies/client_side_weighted_round_robin:integration_test

#39362

efimki avatar May 06 '25 20:05 efimki

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/). envoyproxy/api-shepherds assignee is @abeyad CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/39364 was synchronize by efimki.

see: more, trace.

As discussed offline, should there be a config knob that determines the value here?

Yes, I've added a config knob. PTAL when you have a chance.

efimki avatar May 13 '25 15:05 efimki

Does this mean you no longer need this in the test?

https://github.com/envoyproxy/envoy/blob/main/test/extensions/load_balancing_policies/client_side_weighted_round_robin/client_side_weighted_round_robin_lb_test.cc#L97-L99

Yes, this code was never needed. Removed.

efimki avatar May 13 '25 15:05 efimki

I don't think this is the right approach. The ClientSideWeightedRoundRobin LB policy was designed to do endpoint picking only, not locality picking. gRPC's implementation of that policy does and will not support locality picking, so adding this functionality to this policy means that we'll have a significant divergence between gRPC and Envoy, which we don't want.

If Envoy needs to support locality picking with ClientSideWeightedRoundRobin, then it should do it by implementing the WrrLocality parent policy, as we've discussed before.

markdroth avatar May 13 '25 15:05 markdroth

I don't think this is the right approach. The ClientSideWeightedRoundRobin LB policy was designed to do endpoint picking only, not locality picking. gRPC's implementation of that policy does and will not support locality picking, so adding this functionality to this policy means that we'll have a significant divergence between gRPC and Envoy, which we don't want.

@adisuissa has pointed out that some Envoy LB policies (at least MAGLEV and RING_HASH) explicitly expose locality_weighted_lb_config and this approach matches them:

api/envoy/extensions/load_balancing_policies/maglev/v3/maglev.proto:  common.v3.LocalityLbConfig.LocalityWeightedLbConfig locality_weighted_lb_config = 3;
api/envoy/extensions/load_balancing_policies/ring_hash/v3/ring_hash.proto:  common.v3.LocalityLbConfig.LocalityWeightedLbConfig locality_weighted_lb_config = 7;

If Envoy needs to support locality picking with ClientSideWeightedRoundRobin, then it should do it by implementing the WrrLocality parent policy, as we've discussed before.

Yes, I think in the longer term we could/should implement WrrLocality parent policy in Envoy. This PR does NOT prevent that as by default (without locality_weighted_lb_config) locality weights will be ignored by ClientSideWeightedRoundRobin.

efimki avatar May 13 '25 15:05 efimki

@adisuissa has pointed out that some Envoy LB policies (at least MAGLEV and RING_HASH) explicitly expose locality_weighted_lb_config and this approach matches them:

It makes sense for policies like ring_hash and maglev to support locality picking, because those are affinity-based policies that do locality picking and endpoint picking as a single step. gRPC supports ring_hash, and our ring_hash policy does support locality picking, because it makes sense there.

WRR is not an affinity policy, and I don't think it makes sense to have it do locality picking and endpoint picking in the same layer.

Yes, I think in the longer term we could/should implement WrrLocality parent policy in Envoy. This PR does NOT prevent that as by default (without locality_weighted_lb_config) locality weights will be ignored by ClientSideWeightedRoundRobin.

The fact that you say "could" as one of the options there implies that this is only a possible future change, not something that we are committing to. This is one of my main objections to this PR: once we do this short-term hack, no one is going to be motivated to pursue the thing that I think we've already agreed is the right long-term solution. And even if we do eventually do it, we will be stuck supporting the knob added in this PR indefinitely. I don't think it makes sense to add this tech debt that we can never get rid of when we agree that it's not the solution we actually want.

Also, this interim solution means that any control plane that needs to support gRPC and Envoy will need to send different configuration to the two data planes. I think this is a fairly fundamental problem, and we need to fix it instead of making it worse.

markdroth avatar May 13 '25 16:05 markdroth

Thanks for your inputs!

Also, this interim solution means that any control plane that needs to support gRPC and Envoy will need to send different configuration to the two data planes. I think this is a fairly fundamental problem, and we need to fix it instead of making it worse.

I definitely didn't mean to make it worse. I'm fine with scrapping the interim solution and doing the long term solution instead.

efimki avatar May 14 '25 14:05 efimki

Sweet, thanks.

/wait

tonya11en avatar May 14 '25 16:05 tonya11en

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions[bot] avatar Jun 13 '25 20:06 github-actions[bot]

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions[bot] avatar Jul 14 '25 08:07 github-actions[bot]

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions[bot] avatar Jul 21 '25 08:07 github-actions[bot]