grpc-go
pickfirst: improve behavior when the first address is a blackhole
With the current implementation, if the `pick_first` LB policy is given a set of addresses where the first address is a blackhole, i.e. the connection attempt just hangs and does not fail before the connection timeout expires, the LB policy employs an exponential backoff mechanism and retries the connection attempt. But when it retries, it starts again from the first address. This means that when the first address is a blackhole, the other addresses never get a chance.
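For concreteness, here is a minimal sketch of how this surfaces to a caller, using grpc-go's manual resolver to supply two hypothetical addresses: the first silently drops packets and the second points at a healthy local server. With `pick_first` (the default policy) behaving as described above, the blocking dial keeps spending its deadline on the blackholed first address and never reaches the second one. The addresses and timeout here are illustrative, and exact API availability may vary by grpc-go version.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/resolver"
	"google.golang.org/grpc/resolver/manual"
)

func main() {
	// Hypothetical address list: the first one blackholes (connects just
	// hang), the second one is a healthy backend.
	r := manual.NewBuilderWithScheme("example")
	r.InitialState(resolver.State{Addresses: []resolver.Address{
		{Addr: "10.255.255.1:50051"}, // blackhole: SYNs are silently dropped
		{Addr: "localhost:50051"},    // healthy server
	}})

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// With the retry-from-the-top behavior described in this issue, each
	// attempt stalls on the first address until the connection timeout,
	// backs off, and starts over, so the second address is never tried.
	conn, err := grpc.DialContext(ctx, r.Scheme()+":///test",
		grpc.WithResolvers(r),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithBlock(),
	)
	if err != nil {
		log.Fatalf("dial failed: %v", err) // typically context deadline exceeded
	}
	defer conn.Close()
	log.Println("connected")
}
```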
To improve the behavior in this scenario, we could try doing one of the following:
- when retrying, the `pick_first` LB policy could start from where it left off in the address list, instead of starting from the top
- instead of creating one `SubConn` with all addresses in it and letting the channel implement the logic of using the first good address, change the `pick_first` LB policy to create one `SubConn` for each address and apply the connection timeout and backoff for each `SubConn` separately.
Option #2 is more work, but might be the better way to go.
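To make option #2 concrete, here is a standalone sketch of the intended semantics: each address gets its own connection timeout, so a blackholed first address costs at most one timeout before the next address is tried. This is written against the standard library, not the grpc-go balancer API; `tryAddresses`, the address list, and the per-address timeout are illustrative, not existing grpc-go names.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// tryAddresses attempts each address with its own timeout and returns the
// first connection that succeeds. A hanging (blackholed) address only delays
// the walk by perAddrTimeout instead of consuming the entire attempt.
func tryAddresses(ctx context.Context, addrs []string, perAddrTimeout time.Duration) (net.Conn, error) {
	var lastErr error
	for _, addr := range addrs {
		dialCtx, cancel := context.WithTimeout(ctx, perAddrTimeout)
		conn, err := (&net.Dialer{}).DialContext(dialCtx, "tcp", addr)
		cancel()
		if err == nil {
			return conn, nil // first address that connects wins
		}
		lastErr = err // for a blackhole this is a deadline-exceeded error
	}
	return nil, fmt.Errorf("all addresses failed, last error: %w", lastErr)
}

func main() {
	// Hypothetical list: blackholed address first, healthy server second.
	addrs := []string{"10.255.255.1:50051", "localhost:50051"}
	conn, err := tryAddresses(context.Background(), addrs, 5*time.Second)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer conn.Close()
	fmt.Println("connected to", conn.RemoteAddr())
}
```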
Dumping some state here based on a discussion with @dfawley:
- We could move the pickfirst behavior out of the channel into the `pick_first` LB policy. This might be better in the long run. The only other user of multiple addresses per `SubConn` is `grpclb`, and the last time Doug looked into making `pick_first` a child of `grpclb`, it turned out to be not so trivial.
- Health checking is currently tightly coupled with the channel. For dual-stack support, we are going to need health checking support in the `pick_first` LB policy. It might be a good idea to see if we can decouple health checking from the channel as part of this.
- We need to ensure that backoff is done at the address level and not at the "set of addresses" level (where it is currently). A rough sketch of per-address backoff bookkeeping follows this list.
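On the last point, here is a rough sketch of what per-address backoff bookkeeping could look like; the names are illustrative and a real implementation would hang this state off each `SubConn` and add jitter, but the point is that each address backs off on its own schedule instead of one backoff timer gating the whole list.

```go
package main

import (
	"fmt"
	"time"
)

// addrBackoff tracks a failure count per address so that backoff delays are
// computed per address rather than for the address list as a whole.
type addrBackoff struct {
	baseDelay time.Duration
	maxDelay  time.Duration
	failures  map[string]int
}

func newAddrBackoff(base, max time.Duration) *addrBackoff {
	return &addrBackoff{baseDelay: base, maxDelay: max, failures: make(map[string]int)}
}

// nextDelay doubles the delay for addr with each recorded failure, capped at maxDelay.
func (b *addrBackoff) nextDelay(addr string) time.Duration {
	n := b.failures[addr]
	if n > 20 {
		n = 20 // avoid shift overflow; the cap below dominates anyway
	}
	d := b.baseDelay << uint(n)
	if d <= 0 || d > b.maxDelay {
		d = b.maxDelay
	}
	return d
}

func (b *addrBackoff) recordFailure(addr string) { b.failures[addr]++ }
func (b *addrBackoff) recordSuccess(addr string) { delete(b.failures, addr) }

func main() {
	b := newAddrBackoff(100*time.Millisecond, 10*time.Second)
	b.recordFailure("10.255.255.1:50051")
	b.recordFailure("10.255.255.1:50051")
	fmt.Println(b.nextDelay("10.255.255.1:50051")) // 400ms: this address backs off on its own
	fmt.Println(b.nextDelay("localhost:50051"))    // 100ms: unaffected by the other address's failures
}
```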
Also, given that we are going to be adding some configuration knob to the `pick_first` LB policy as part of affinity support for DP, it might be worth taking care of some of these items as part of that.
Just chiming in here to say we're seeing this exact issue in Vault: if a node in the cluster is deleted and we continue to reference that server in our pool of servers, the dial blocks until it hits the deadline and errors out.
We are in the process of making some significant changes to the `pick_first` LB policy implementation. We have a couple of proposals out that describe the upcoming changes; see A61 and A62. They aren't approved yet, so there could be some changes.
cc: @easwars Can this be closed since https://github.com/grpc/proposal/pull/356 and https://github.com/grpc/proposal/pull/357 are getting merged, and we are reimplementing pickfirst altogether?
Since our team is prioritizing DualStack feature development in the coming quarters, let's close this.