envoy
envoy copied to clipboard
Make `ClientSideWeightedRoundRobin` lb policy optionally respect locality weights specified by EDS.
- Add a
locality_weighted_lb_configknob toClientSideWeightedRoundRobinconfig proto. - If
locality_weighted_lb_configis set, thenClientSideWeightedRoundRobinlb policy will respect locality weights. - Add an integration test that demonstrates locality weights being used or ignored depending on this knob.
Testing: bazel test //test/extensions/load_balancing_policies/client_side_weighted_round_robin:integration_test
#39362
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @abeyad
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).
As discussed offline, should there be a config knob that determines the value here?
Yes, I've added a config knob. PTAL when you have a chance.
Does this mean you no longer need this in the test?
https://github.com/envoyproxy/envoy/blob/main/test/extensions/load_balancing_policies/client_side_weighted_round_robin/client_side_weighted_round_robin_lb_test.cc#L97-L99
Yes, this code was never needed. Removed.
I don't think this is the right approach. The ClientSideWeightedRoundRobin LB policy was designed to do endpoint picking only, not locality picking. gRPC's implementation of that policy does and will not support locality picking, so adding this functionality to this policy means that we'll have a significant divergence between gRPC and Envoy, which we don't want.
If Envoy needs to support locality picking with ClientSideWeightedRoundRobin, then it should do it by implementing the WrrLocality parent policy, as we've discussed before.
I don't think this is the right approach. The ClientSideWeightedRoundRobin LB policy was designed to do endpoint picking only, not locality picking. gRPC's implementation of that policy does and will not support locality picking, so adding this functionality to this policy means that we'll have a significant divergence between gRPC and Envoy, which we don't want.
@adisuissa has pointed out that some Envoy LB policies (at least MAGLEV and RING_HASH) explicitly expose locality_weighted_lb_config and this approach matches them:
api/envoy/extensions/load_balancing_policies/maglev/v3/maglev.proto: common.v3.LocalityLbConfig.LocalityWeightedLbConfig locality_weighted_lb_config = 3;
api/envoy/extensions/load_balancing_policies/ring_hash/v3/ring_hash.proto: common.v3.LocalityLbConfig.LocalityWeightedLbConfig locality_weighted_lb_config = 7;
If Envoy needs to support locality picking with ClientSideWeightedRoundRobin, then it should do it by implementing the WrrLocality parent policy, as we've discussed before.
Yes, I think in the longer term we could/should implement WrrLocality parent policy in Envoy.
This PR does NOT prevent that as by default (without locality_weighted_lb_config) locality weights will be ignored by ClientSideWeightedRoundRobin.
@adisuissa has pointed out that some Envoy LB policies (at least
MAGLEVandRING_HASH) explicitly exposelocality_weighted_lb_configand this approach matches them:
It makes sense for policies like ring_hash and maglev to support locality picking, because those are affinity-based policies that do locality picking and endpoint picking as a single step. gRPC supports ring_hash, and our ring_hash policy does support locality picking, because it makes sense there.
WRR is not an affinity policy, and I don't think it makes sense to have it do locality picking and endpoint picking in the same layer.
Yes, I think in the longer term we could/should implement
WrrLocalityparent policy in Envoy. This PR does NOT prevent that as by default (withoutlocality_weighted_lb_config) locality weights will be ignored byClientSideWeightedRoundRobin.
The fact that you say "could" as one of the options there implies that this is only a possible future change, not something that we are committing to. This is one of my main objections to this PR: once we do this short-term hack, no one is going to be motivated to pursue the thing that I think we've already agreed is the right long-term solution. And even if we do eventually do it, we will be stuck supporting the knob added in this PR indefinitely. I don't think it makes sense to add this tech debt that we can never get rid of when we agree that it's not the solution we actually want.
Also, this interim solution means that any control plane that needs to support gRPC and Envoy will need to send different configuration to the two data planes. I think this is a fairly fundamental problem, and we need to fix it instead of making it worse.
Thanks for your inputs!
Also, this interim solution means that any control plane that needs to support gRPC and Envoy will need to send different configuration to the two data planes. I think this is a fairly fundamental problem, and we need to fix it instead of making it worse.
I definitely didn't mean to make it worse. I'm fine with scrapping the interim solution and doing the long term solution instead.
Sweet, thanks.
/wait
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!