ringhash: fix normalizeWeights
RELEASE NOTES:
- ringhash: fix normalizeWeights in the event of just a weight update
Thanks for the review @atollena and @aranjans.
I still think this diff is not ready to be merged yet. There is still the open question why getWeightAttribute(a) doesnt work in this case, even though (*subConn).weight is set using getWeightAttribute(a). I wanted to check with @zasweq if he knows what's going on.
There's two other maintainers actively looking at this so I'll remove myself.
Thanks for the review @atollena and @aranjans.
I still think this diff is not ready to be merged yet. There is still the open question why getWeightAttribute(a) doesnt work in this case, even though (*subConn).weight is set using getWeightAttribute(a). I wanted to check with @zasweq if he knows what's going on.
Ha. I thought that you had that figured out. I took a look at the code and, with my limited understanding, it looks like the "bug" is in resolver.AddressMap.Set: https://github.com/grpc/grpc-go/blob/master/resolver/map.go#L81-L90
At L86, when we found that we already had the address in the map, the code only updates the value, not the addr. So if the BalancerAttribute changed, the addr isn't updated. As a result, if the weight changes, which is a BalancerAttribute, then the resolver map does not update the entry with the new address.
From the documentation of resolver.AddressMap, it sounds like resolver.BalancerAttribute should always be ignored, and two addresses with all fields equal except for resolver.BalancerAttribute and Metadata have the same key. But then, it's not super clear whether calling Set twice with two addresses having varying resolver.BalancerAttribute should update the key.
I see two options to fix this:
- Make
resolver.AddressMapupdateMetadataandBalancerAttributewhen setting the value associated with an address. - Make
resolver.AddressMapalways clearMetadataandBalancerAttributebefore storing the address entry. I think it would be a bit less confusing, and make it clear thatMetadataandBalancerAttributeshould be stored separately when using an address map, e.g. in the value. This is a breaking change though, sinceresolver.AddressMapis public (it might not hurt to mark it experimental -- FWIW we have 1 usage of it internally at datadog, for a custom balancer which is a variation of weighted round robin).
I'd sollicit Doug and Zach's input on this because I don't have the context of when address map is useful (looks like it's mostly to find existing subconn associated with an address in O(1)?), but option 1 seems more realistic.
If we are open to making a breaking change, another option to consider, in addition to those listed in https://github.com/grpc/grpc-go/pull/7156#issuecomment-2071761929 :
- Introduce a new struct type to serve as the key for
AddressMap.m, call itMapKey.MapKeywill contain only theAddrandServerNamestrings as members. Change theAddressMap.Keys()method to return objects ofMapKey. This way the compiler ensures that callers access only those fields of the key that are set.
We could also achieve the same by first deprecating the existing Keys method and introducing a new method with the behaviour described above.
From the documentation of resolver.AddressMap, it sounds like resolver.BalancerAttribute should always be ignored, and two addresses with all fields equal except for resolver.BalancerAttribute and Metadata have the same key. But then, it's not super clear whether calling Set twice with two addresses having varying resolver.BalancerAttribute should update the key.
what the docstring on resolver.AddressMap really talks about is that we should not be relying on the balancerAttributes in the key, but use value instead. And in ringHash, the value is of type ringhash.subConn which actually contains the weight information.
The actual bug is with using the rignhash.getWeightAttribute helper. This is helpful when getting the balancerattribute weight when updating addresses. But is incorrect when used to get weights from the AddressMap. Instead, we should type assert to ringhash.subConn and read weights from that instead.
cc'ing: @easwars; since he has most context on ringhash.
Testing / tests (tests, 1.22, 386) (pull_request)
Something is wrong with this presubmit check. Will need to investigate.
Merging since this change is not expected to break 386 variant of go1.22 test