grpc-go
grpc-go copied to clipboard
clusterimpl: update picker synchronously upon receipt of configuration update
https://github.com/grpc/grpc-go/issues/5469 recommends an audit of existing LB policies to ensure that they update their pickers synchronously upon receipt of a configuration update.
xds_cluster_impl_experimental
does not update its picker synchronously.
- Configuration updates are handled here: https://github.com/grpc/grpc-go/blob/0561c78c9dcb5876a140ff7be97f0f37abc36ec8/xds/internal/balancer/clusterimpl/clusterimpl.go#L208
- In the above function, the LB policy
- initially checks for some error conditions
- applies new load reporting configuration, if any
- switches to a new child policy, if the child policy name has changed
- pushes the newly received load balancer configuration onto a channel here: https://github.com/grpc/grpc-go/blob/0561c78c9dcb5876a140ff7be97f0f37abc36ec8/xds/internal/balancer/clusterimpl/clusterimpl.go#L253
- and returns from
UpdateClientConnState
- The newly received configuration update is processed here asynchronously: https://github.com/grpc/grpc-go/blob/0561c78c9dcb5876a140ff7be97f0f37abc36ec8/xds/internal/balancer/clusterimpl/clusterimpl.go#L475
- And a new picker is pushed upwards here: https://github.com/grpc/grpc-go/blob/0561c78c9dcb5876a140ff7be97f0f37abc36ec8/xds/internal/balancer/clusterimpl/clusterimpl.go#L479
This needs to be changed to return a picker synchronously upon receipt of a configuration update.
@easwars I'd be working on this issue.
After some thought, I've come to the understanding that I don't necessarily like the approach taken in the PR linked above because it still causes multiple picker updates and testing is very tied to the implementation rather than behavior. Instead I suggest we take the following approach. We can do this step-by-step:
- Stop using deprecated APIs in the
gracefulswitch.Balancer
- Currently
clusterimpl
has a single child and uses agracefulswitch.Balancer
to support the case where the child policy name changes. And this is currently handled by calling theSwitchTo
method on thegracefulswitch.Balancer
. Instead we should haveclusterimpl
build JSON config in the format expected by thegracefulswitch.Balancer
and callUpdateClientConnState
instead. - Add an e2e style test for this as part of this change
- As part of looking into these changes, I realized that we have no test for this scenario.
Currently
clusterimpl
- As part of looking into these changes, I realized that we have no test for this scenario.
Currently
- Currently
- Use a
grpcsync.CallbackSerializer
instead of arun
goroutine to synchronize updates from the channel and updates from the child policy.- This is greatly simplify the implementation since we will not longer require a mutex, or a closed event, or a done event etc.
- I feel that testing coverage is quite thin for the
clusterimpl
LB policy as a whole.- We should audit C-core and/or Java and see what tests they have and add them here. Mostly e2e style tests.
- Once we have the above changes, updating the picker synchronously would be simple:
- As part of handling a config update from the channel, the
clusterimpl
would do the following:- inhibit picker updates from the child policy
- process the config update from the channel
- resume picker updates, and as part of this send a new picker to the channel
- At the end of handing a config update, whether or not an update from a child policy was received, the
clusterimpl
LB policy should send a single picker update to the channel.
- As part of handling a config update from the channel, the