
Make cluster replicas return ASK and TRYAGAIN

Open · zuiderkwast opened this issue 1 year ago · 3 comments

After a client has sent READONLY, make a cluster replica behave like its primary with respect to returning ASK redirects and TRYAGAIN errors.

Without this patch, a client reading from a replica cannot tell whether a key doesn't exist or has already been migrated to another shard as part of an ongoing slot migration. Because the replica returns no ASK redirect in this situation, offloading reads to cluster replicas isn't reliable.
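
A minimal client-side sketch of what these replies mean for a client reading from a replica, assuming the standard Redis Cluster error formats (ASK <slot> <host>:<port>, MOVED <slot> <host>:<port>, and error replies starting with TRYAGAIN). Redirect, parse_redirect and plan_retry are hypothetical names, not part of any client library; the sketch only decides what to send next and does no networking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Redirect:
    kind: str            # "ASK", "MOVED" or "TRYAGAIN"
    slot: Optional[int]  # hash slot (ASK/MOVED only)
    host: Optional[str]  # target primary host (ASK/MOVED only)
    port: Optional[int]  # target primary port (ASK/MOVED only)


def parse_redirect(error: str) -> Optional[Redirect]:
    """Parse an error reply (without the leading '-'); None if it is not a redirect."""
    parts = error.split()
    if not parts:
        return None
    if parts[0] in ("ASK", "MOVED") and len(parts) == 3:
        # rpartition keeps IPv6 hosts such as '::1' intact.
        host, _, port = parts[2].rpartition(":")
        return Redirect(parts[0], int(parts[1]), host, int(port))
    if parts[0] == "TRYAGAIN":
        return Redirect("TRYAGAIN", None, None, None)
    return None


def plan_retry(error: str, command: list[str]) -> Optional[tuple]:
    """Hypothetical helper: decide how to retry `command` after an error from a READONLY replica.

    Returns (action, node, commands_to_send), or None if `error` is not a redirect.
    """
    r = parse_redirect(error)
    if r is None:
        return None
    if r.kind == "ASK":
        # One-shot redirect to the migration target (always a primary): send
        # ASKING first and do NOT update the client's slot map.
        return ("redirect_once", (r.host, r.port), [["ASKING"], command])
    if r.kind == "MOVED":
        # The slot now belongs to another shard: refresh the slot map, then retry.
        return ("refresh_slots", (r.host, r.port), [command])
    # TRYAGAIN: a multi-key read whose keys are split mid-migration;
    # retry the same command on the same node after a short back-off.
    return ("retry_later", None, [command])


# Example: a GET on a key that has already been migrated away.
print(plan_retry("ASK 3999 127.0.0.1:6381", ["GET", "foo"]))
# prints ('redirect_once', ('127.0.0.1', 6381), [['ASKING'], ['GET', 'foo']])
```

Treating ASK as one-shot (no slot-map update) is what lets the client keep reading the rest of the slot from the replica while the migration is still in progress.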

Note: The target of a redirect is always a primary. If a client wants to continue reading from a replica after following a redirect, it needs to figure out the replicas of that new primary using CLUSTER SHARDS or similar.
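
A sketch of the lookup described in the note above: after following a redirect to a primary, find that primary's replicas in a CLUSTER SHARDS reply. It assumes the reply has already been decoded into Python dicts with per-node fields ip, endpoint, hostname, port, role and health; because the exact role spelling for primaries is an assumption here, both "master" and "primary" are accepted. replicas_of and the sample data are hypothetical.

```python
from typing import Any

# Role strings treated as "primary"; the exact spelling in the reply is an assumption.
PRIMARY_ROLES = {"master", "primary"}


def replicas_of(shards: list[dict[str, Any]], host: str, port: int) -> list[tuple[str, int]]:
    """Hypothetical helper: (ip, port) of online replicas sharing a shard with primary host:port.

    `shards` is a CLUSTER SHARDS reply already decoded into dicts (assumed shape).
    """
    for shard in shards:
        nodes = shard["nodes"]
        for node in nodes:
            names = {node.get("ip"), node.get("endpoint"), node.get("hostname")}
            if node["role"] in PRIMARY_ROLES and node["port"] == port and host in names:
                return [
                    (n["ip"], n["port"])
                    for n in nodes
                    if n["role"] == "replica" and n.get("health") == "online"
                ]
    # Unknown primary: the client's topology is stale and should be refreshed.
    return []


# Hypothetical two-shard cluster; slots are flattened start/end pairs.
example_shards = [
    {"slots": [0, 8191], "nodes": [
        {"id": "a", "ip": "127.0.0.1", "endpoint": "127.0.0.1", "port": 6379, "role": "master", "health": "online"},
        {"id": "b", "ip": "127.0.0.1", "endpoint": "127.0.0.1", "port": 6380, "role": "replica", "health": "online"},
    ]},
    {"slots": [8192, 16383], "nodes": [
        {"id": "c", "ip": "127.0.0.1", "endpoint": "127.0.0.1", "port": 6381, "role": "master", "health": "online"},
        {"id": "d", "ip": "127.0.0.1", "endpoint": "127.0.0.1", "port": 6382, "role": "replica", "health": "online"},
        {"id": "e", "ip": "127.0.0.1", "endpoint": "127.0.0.1", "port": 6383, "role": "replica", "health": "failed"},
    ]},
]

# After "ASK 9000 127.0.0.1:6381", look for replicas of that primary.
print(replicas_of(example_shards, "127.0.0.1", 6381))  # prints [('127.0.0.1', 6382)]
```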

This is related to #21 and has been made possible by the introduction of Replication of Slot Migration States in #445.

zuiderkwast, May 13 '24 23:05

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.81%. Comparing base (a0aebb6) to head (689ee06).

❗ Current head 689ee06 differs from pull request most recent head b1461ee

Please upload reports for the commit b1461ee to get more accurate results.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #495      +/-   ##
============================================
- Coverage     70.17%   69.81%   -0.37%     
============================================
  Files           109      109              
  Lines         59904    61802    +1898     
============================================
+ Hits          42039    43147    +1108     
- Misses        17865    18655     +790     
File             Coverage Δ
src/cluster.c    86.45% <100.00%> (+0.02%) ⬆️

... and 85 files with indirect coverage changes

codecov[bot], May 13 '24 23:05

@supercaracal reported this for Redis in September 2022:

Hello,

I'm implementing a client for redis cluster in ruby. https://github.com/redis-rb/redis-cluster-client

I'm trying to test the client under resharding and scale reading conditions. But it seems that replica nodes don't reply with an ASK redirection error. Clients receive nil from replica nodes while resharding. Is there a way to correctly obtain the values of keys from replica nodes in the middle of resharding?

I think this is what you need.

zuiderkwast, May 13 '24 23:05

Yes, it is. Thank you so much!

supercaracal, May 14 '24 00:05

I guess we also need to update https://valkey.io/commands/readonly/

enjoy-binbin, May 27 '24 11:05

@enjoy-binbin What do you want to write for the READONLY docs? It already says that the replica can return redirects.

zuiderkwast, May 27 '24 13:05

Oh, so the docs already say that the replica can return redirects (even before this change)...

enjoy-binbin, May 27 '24 13:05

Yes, but maybe we should add a note like "Before Valkey 8, it was not reliable during slot migrations bla bla bla....." WDYT?

zuiderkwast, May 27 '24 22:05


Yeah, this is something we can mention.

The cluster was reconfigured (for example resharded) and the replica is no longer able to serve commands for a given hash slot.

btw, the docs say this. That sentence says the replica can return a redirect "after resharding", right? And with this PR, the replica will also return redirects during a slot migration, right?

enjoy-binbin, May 28 '24 03:05

@zuiderkwast ping, I need an ACK in case I misunderstood it.

enjoy-binbin, Jun 01 '24 05:06

@enjoy-binbin ACK

You're right. We should improve the docs. I think it's almost correct already, but we can add a point 3.

When the connection is in readonly mode, the cluster will send a redirection to the client only if the operation involves keys not served by the replica’s master node. This may happen because:

  1. The client sent a command about hash slots never served by the master of this replica.
  2. The cluster was reconfigured (for example resharded) and the replica is no longer able to serve commands for a given hash slot.
  3. A slot migration is ongoing. In this case the replica can return an ASK redirect or a TRYAGAIN error reply.
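
The three cases in the proposed docs can be illustrated with a deliberately simplified, hypothetical model of a READONLY replica's routing decision for a read. ReplicaView and route_read are illustrative names only; the model assumes all keys hash to the same slot and ignores cross-slot checks, key expiry and the importing side of a migration, so it is not the actual logic in src/cluster.c.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaView:
    """Hypothetical model of what a READONLY replica knows (not Valkey's data structures)."""
    my_primary: str                                  # "host:port" of this replica's primary
    slot_owner: dict[int, str]                       # slot -> "host:port" of the serving primary
    migrating_to: dict[int, str] = field(default_factory=dict)  # slot -> target, replicated from the primary
    keys: set[str] = field(default_factory=set)      # keys present in the replica's dataset


def route_read(view: ReplicaView, slot: int, keys: list[str]) -> str:
    """Return "OK" or the error a replica would reply with, in this simplified model."""
    owner = view.slot_owner.get(slot)
    if owner is None:
        return "CLUSTERDOWN"
    if owner != view.my_primary:
        # Cases 1 and 2: the slot is not (or no longer) served by this replica's primary.
        return f"MOVED {slot} {owner}"
    target = view.migrating_to.get(slot)
    if target is not None:
        missing = [k for k in keys if k not in view.keys]
        if missing and len(missing) == len(keys):
            # Case 3: every requested key may already live on the target primary.
            return f"ASK {slot} {target}"
        if missing:
            # Case 3: the requested keys are split between source and target.
            return "TRYAGAIN"
    return "OK"


# Hypothetical cluster: slot 1 is migrating from 10.0.0.1 to 10.0.0.2.
view = ReplicaView(
    my_primary="10.0.0.1:6379",
    slot_owner={1: "10.0.0.1:6379", 2: "10.0.0.2:6379"},
    migrating_to={1: "10.0.0.2:6379"},
    keys={"a"},
)
for slot, keys in [(2, ["x"]), (1, ["a"]), (1, ["b"]), (1, ["a", "b"])]:
    print(slot, keys, route_read(view, slot, keys))
```

Returning TRYAGAIN rather than ASK when only some keys are missing avoids sending a multi-key read to a target that holds only part of the keys; the client retries later, by which point the keys have typically all moved and the replica answers with ASK instead.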

OK?

zuiderkwast, Jun 01 '24 08:06