grpc-go
pickfirst: improve behavior when the first address is a blackhole
With the current implementation, if the `pick_first` LB policy is given a set of addresses where the first address is a blackhole, i.e. the connection attempt just hangs and does not fail before the connection timeout expires, the LB policy employs an exponential backoff mechanism and retries the connection attempt. But when it retries, it starts again from the first address. This means that when the first address is a blackhole, the other addresses never get a chance.
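For concreteness, here is a minimal sketch of how this surfaces to a caller, using grpc-go's manual resolver to supply two hypothetical addresses: the first silently drops packets and the second points at a healthy local server. With `pick_first` (the default policy) behaving as described above, the blocking dial keeps spending its deadline on the blackholed first address and never reaches the second one. The addresses and timeout here are illustrative, and exact API availability may vary by grpc-go version.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/resolver"
	"google.golang.org/grpc/resolver/manual"
)

func main() {
	// Hypothetical address list: the first one blackholes (connects just
	// hang), the second one is a healthy backend.
	r := manual.NewBuilderWithScheme("example")
	r.InitialState(resolver.State{Addresses: []resolver.Address{
		{Addr: "10.255.255.1:50051"}, // blackhole: SYNs are silently dropped
		{Addr: "localhost:50051"},    // healthy server
	}})

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// With the retry-from-the-top behavior described in this issue, each
	// attempt stalls on the first address until the connection timeout,
	// backs off, and starts over, so the second address is never tried.
	conn, err := grpc.DialContext(ctx, r.Scheme()+":///test",
		grpc.WithResolvers(r),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithBlock(),
	)
	if err != nil {
		log.Fatalf("dial failed: %v", err) // typically context deadline exceeded
	}
	defer conn.Close()
	log.Println("connected")
}
```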
To improve the behavior in this scenario, we could try doing one of the following:
- when retrying, the `pick_first` LB policy could start from where it left off in the address list, instead of starting from the top
- instead of creating one `SubConn` with all addresses in it and letting the channel implement the logic of using the first good address, change the `pick_first` LB policy to create one `SubConn` for each address and apply the connection timeout and backoff for each `SubConn` separately.
Option #2 is more work, but might be the better way to go.
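To make option #2 concrete, here is a standalone sketch of the intended semantics: each address gets its own connection timeout, so a blackholed first address costs at most one timeout before the next address is tried. This is written against the standard library, not the grpc-go balancer API; `tryAddresses`, the address list, and the per-address timeout are illustrative, not existing grpc-go names.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// tryAddresses attempts each address with its own timeout and returns the
// first connection that succeeds. A hanging (blackholed) address only delays
// the walk by perAddrTimeout instead of consuming the entire attempt.
func tryAddresses(ctx context.Context, addrs []string, perAddrTimeout time.Duration) (net.Conn, error) {
	var lastErr error
	for _, addr := range addrs {
		dialCtx, cancel := context.WithTimeout(ctx, perAddrTimeout)
		conn, err := (&net.Dialer{}).DialContext(dialCtx, "tcp", addr)
		cancel()
		if err == nil {
			return conn, nil // first address that connects wins
		}
		lastErr = err // for a blackhole this is a deadline-exceeded error
	}
	return nil, fmt.Errorf("all addresses failed, last error: %w", lastErr)
}

func main() {
	// Hypothetical list: blackholed address first, healthy server second.
	addrs := []string{"10.255.255.1:50051", "localhost:50051"}
	conn, err := tryAddresses(context.Background(), addrs, 5*time.Second)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer conn.Close()
	fmt.Println("connected to", conn.RemoteAddr())
}
```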
Dumping some state here based on a discussion with @dfawley:
- We could move the pickfirst behavior out of the channel into the `pick_first` LB policy. This might be better in the long run. The only other user of multiple addresses per `SubConn` is `grpclb`, and the last time Doug looked into making `pick_first` a child of `grpclb`, it turned out to be not so trivial.
- Health checking is currently tightly coupled with the channel. For dual-stack support, we are going to need health checking support in the `pick_first` LB policy. It might be a good idea to see if we can decouple health checking from the channel as part of this.
- We need to ensure that backoff is done at the address level and not at the "set of addresses" level (where it is currently). A rough sketch of per-address backoff bookkeeping follows this list.
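On the last point, here is a rough sketch of what per-address backoff bookkeeping could look like; the names are illustrative and a real implementation would hang this state off each `SubConn` and add jitter, but the point is that each address backs off on its own schedule instead of one backoff timer gating the whole list.

```go
package main

import (
	"fmt"
	"time"
)

// addrBackoff tracks a failure count per address so that backoff delays are
// computed per address rather than for the address list as a whole.
type addrBackoff struct {
	baseDelay time.Duration
	maxDelay  time.Duration
	failures  map[string]int
}

func newAddrBackoff(base, max time.Duration) *addrBackoff {
	return &addrBackoff{baseDelay: base, maxDelay: max, failures: make(map[string]int)}
}

// nextDelay doubles the delay for addr with each recorded failure, capped at maxDelay.
func (b *addrBackoff) nextDelay(addr string) time.Duration {
	n := b.failures[addr]
	if n > 20 {
		n = 20 // avoid shift overflow; the cap below dominates anyway
	}
	d := b.baseDelay << uint(n)
	if d <= 0 || d > b.maxDelay {
		d = b.maxDelay
	}
	return d
}

func (b *addrBackoff) recordFailure(addr string) { b.failures[addr]++ }
func (b *addrBackoff) recordSuccess(addr string) { delete(b.failures, addr) }

func main() {
	b := newAddrBackoff(100*time.Millisecond, 10*time.Second)
	b.recordFailure("10.255.255.1:50051")
	b.recordFailure("10.255.255.1:50051")
	fmt.Println(b.nextDelay("10.255.255.1:50051")) // 400ms: this address backs off on its own
	fmt.Println(b.nextDelay("localhost:50051"))    // 100ms: unaffected by the other address's failures
}
```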
Also, given that we are going to be adding some configuration knob to the `pick_first` LB policy as part of affinity support for DP, it might be worth taking care of some of these items as part of that.
Just chiming in here to say we're seeing this exact issue in Vault: if a node in the cluster is deleted and we continue to reference that server in our pool of servers, the dial blocks until it hits the deadline and errors out.
We are in the process of making some significant changes to the `pick_first` LB policy implementation. We have a couple of proposals out that describe the upcoming changes; see A61 and A62. They aren't approved yet, so there could be some changes.
cc: @easwars Can this be closed since https://github.com/grpc/proposal/pull/356 and https://github.com/grpc/proposal/pull/357 are getting merged, and we are reimplementing pickfirst altogether?
Since our team is prioritizing DualStack feature development in the coming quarters, let's close this.