rls: Update logic in the control channel connectivity state monitoring goroutine

Open easwars opened this issue 2 months ago • 1 comments

This is what the RLS LB policy currently does:

  • Create an RLS control channel: https://github.com/grpc/grpc-go/blob/7472d578b15f718cbe8ca0f5f5a3713093c47b03/balancer/rls/balancer.go#L362
  • Start a goroutine to monitor the connectivity state of the channel: https://github.com/grpc/grpc-go/blob/7472d578b15f718cbe8ca0f5f5a3713093c47b03/balancer/rls/control_channel.go#L95
  • The goroutine waits for the channel to become READY: https://github.com/grpc/grpc-go/blob/7472d578b15f718cbe8ca0f5f5a3713093c47b03/balancer/rls/control_channel.go#L188
  • The next time it becomes READY again, it resets backoffs: https://github.com/grpc/grpc-go/blob/7472d578b15f718cbe8ca0f5f5a3713093c47b03/balancer/rls/control_channel.go#L202

This relies on a wrong assumption: that if the channel is READY and later becomes READY again, it must have gone through TRANSIENT_FAILURE in between.

What it should actually do:

  • When the state transitions to TRANSIENT_FAILURE, record that transition
  • The next time it transitions to READY, reset the backoff timeouts in all cache entries. Specifically, this means that it will reset the backoff state and cancel the pending backoff timer.

We should also update this test: https://github.com/grpc/grpc-go/blob/7472d578b15f718cbe8ca0f5f5a3713093c47b03/balancer/rls/balancer_test.go#L916 to ensure that a transition from READY to IDLE to READY does not result in backoff timeouts being reset.

easwars, Nov 04 '25 21:11

Hello @easwars, I can work on this. Could you assign it to me?

ulascansenturk, Nov 19 '25 21:11