cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

kv: ensure secondary tenants route follower reads to the closest replica

Open arulajmani opened this issue 2 years ago • 2 comments

The dist sender uses node locality information to rank replicas of a range by latency. Previously, this node locality information was read off a node descriptor available in Gossip. Unfortunately, secondary tenants do not have access to Gossip, and as such, would end up randomizing this list of replicas. This manifested itself through unpredictable latencies when running follower reads.

We're no longer susceptible to this hazard with this patch. This is done by eschewing the need of a node descriptor from gossip in the DistSender; instead, we now instantiate the DistSender with locality information.

However, we do still use Gossip to get the current node's ID when ranking replicas. This is done to ascertain if there is a local replica, and if there is, to always route to it. Unfortunately, because secondary tenants don't have access to Gossip, they can't conform to these semantics. They're susceptible to a hazard where a request may be routed to another replica in the same locality tier as the client even though the client has a local replica as well. This shouldn't be a concern in practice given the diversity heuristic.

Resolves https://github.com/cockroachdb/cockroach/issues/81000

Release note (bug fix): fix an issue where secondary tenants could route follower reads to a random, far away replica instead of one closer.

arulajmani avatar Aug 09 '22 21:08 arulajmani

This change is Reviewable

cockroach-teamcity avatar Aug 09 '22 21:08 cockroach-teamcity

@nvanbenschoten, I'm planning on tacking on a third commit that removes the use of getNodeDescriptor on here, but other than that this should be good for a look.

arulajmani avatar Aug 09 '22 21:08 arulajmani

TFTRs!

bors r+

arulajmani avatar Aug 14 '22 22:08 arulajmani

Build succeeded:

craig[bot] avatar Aug 14 '22 23:08 craig[bot]