cockroach
cockroach copied to clipboard
kv: ensure secondary tenants route follower reads to the closest replica
The dist sender uses node locality information to rank replicas of a range by latency. Previously, this node locality information was read off a node descriptor available in Gossip. Unfortunately, secondary tenants do not have access to Gossip, and as such, would end up randomizing this list of replicas. This manifested itself through unpredictable latencies when running follower reads.
We're no longer susceptible to this hazard with this patch. This is done by eschewing the need of a node descriptor from gossip in the DistSender; instead, we now instantiate the DistSender with locality information.
However, we do still use Gossip to get the current node's ID when ranking replicas. This is done to ascertain if there is a local replica, and if there is, to always route to it. Unfortunately, because secondary tenants don't have access to Gossip, they can't conform to these semantics. They're susceptible to a hazard where a request may be routed to another replica in the same locality tier as the client even though the client has a local replica as well. This shouldn't be a concern in practice given the diversity heuristic.
Resolves https://github.com/cockroachdb/cockroach/issues/81000
Release note (bug fix): fix an issue where secondary tenants could route follower reads to a random, far away replica instead of one closer.
@nvanbenschoten, I'm planning on tacking on a third commit that removes the use of getNodeDescriptor
on here, but other than that this should be good for a look.
TFTRs!
bors r+