valkey standalone REDIRECT: Fix scripting and further MULTI/EXEC scenarios

In cluster mode, the necessity of a "MOVED" response is mainly determined based on the keys of a command whereas in standalone redirect mode, the necessity of a "REDIRECT" response is determined based on the command's flags.

It turns out that looking at keys is necessary. Moreover, the currently used command flags do not encompass all situations in which a REDIRECT is necessary:

WATCH command (redirect in READWRITE mode is necessary)
transactions with no keys must not be redirected
scripting scenarios: scripts (with keys) must be redirected in some situations (issue #868).

The new logic is a heavily "distilled" version of getNodeByQuery used in cluster mode. As we don't need to look at slots and their consistency, the information necessary is the flags of the command and whether it accesses at least one key.

Fixes #868

Feb 25 '25 21:02 gmbnomis

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 72.29%. Comparing base (6cbc1a3) to head (7b98b27).

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1781      +/-   ##
============================================
- Coverage     72.43%   72.29%   -0.15%     
============================================
  Files           128      128              
  Lines         70221    70248      +27     
============================================
- Hits          50863    50784      -79     
- Misses        19358    19464     +106

Files with missing lines	Coverage Δ
src/server.c	`88.57% <100.00%> (+0.11%)`	:arrow_up:

... and 15 files with indirect coverage changes

:rocket: New features to boost your workflow:

:snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
:package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Feb 25 '25 21:02 codecov[bot]

WATCH command (redirect in READWRITE mode is necessary)

This makes sense.

transactions with no keys must not be redirected

I don't understand, this already aligns with the current behavior.

scripting scenarios: scripts (with keys) must be redirected in some situations

Can you elaborate it? EVAL and EVAL_RO would or not be redirected in different scenarios?

Mar 24 '25 03:03 soloestoy

transactions with no keys must not be redirected

I don't understand, this already aligns with the current behavior.

Mostly yes, I suppose, except when a script with no keys is used, see test case "replica allows MULTI that accesses no key". So, I think we need to look at the keys the commands within a transaction use, not only at the flags of those commands.

scripting scenarios: scripts (with keys) must be redirected in some situations

Can you elaborate it? EVAL and EVAL_RO would or not be redirected in different scenarios?

The proposal is to behave like in cluster mode, which basically means that EVAL and EVAL_RO do not behave differently when it comes to redirects (I think this is because EVAL_RO is rather new. If one does not have EVAL_RO, it makes sense for EVAL to assume that keys are used for reading in READONLY mode.) Thus:

In READWRITE mode on replica:

Script without key is not redirected
Script with keys is redirected (both EVAL and EVAL_RO)

In READONLY mode on replica:

Script without keys is not redirected
Script with keys is not redirected (both EVAL and EVAL_RO)

There is a difference in the reported error, though, if the script actually tries to write. For EVAL_RO this is ERR Write commands are not allowed from read-only scripts. And for EVAL this is READONLY You can't write against a read only replica. See also the corresponding discussion for cluster mode at https://github.com/valkey-io/valkey/issues/869#issuecomment-2563203949.

Mar 24 '25 23:03 gmbnomis

WATCH command (redirect in READWRITE mode is necessary)

This makes sense.

@soloestoy Why do you think it makes sense? I think using watch on the replicas also makes perfect sense as the watch also protects mutations arriving on the replication stream right? I think it does not make sense to redirect watch command, but maybe I misunderstood the intension here? @gmbnomis

Jun 10 '25 06:06 ranshid

WATCH command (redirect in READWRITE mode is necessary)

@soloestoy Why do you think it makes sense? I think using watch on the replicas also makes perfect sense as the watch also protects mutations arriving on the replication stream right? I think it does not make sense to redirect watch command, but maybe I misunderstood the intension here? @gmbnomis

Yes, WATCH on a replica must be possible, but then one needs to enable READONLY for the connection in redirect mode (exactly like in cluster mode).

Jun 10 '25 07:06 gmbnomis

WATCH command (redirect in READWRITE mode is necessary)

@soloestoy Why do you think it makes sense? I think using watch on the replicas also makes perfect sense as the watch also protects mutations arriving on the replication stream right? I think it does not make sense to redirect watch command, but maybe I misunderstood the intension here? @gmbnomis

Yes, WATCH on a replica must be possible, but then one needs to enable READONLY for the connection in redirect mode (exactly like in cluster mode).

Not sure why I need to be READONLY for that. although rare, writable replicas are still supported AFAIK. Also although I agree most known clients will issue READONLY on all replica in cluster mode disabled when rfr is used, AFAIK the READONLY was mainly introduced as a cluster command and less intended for serving cluster mode disabled. At the very least, this should be considered as a breaking change for clients. Also what is the expected result of responding with REDIRECT to WATCH? should the client redirect it to the primary node and continue processing the transaction on the replica? I am not sure what is the proposal in this case.

Jun 10 '25 07:06 ranshid

Yes, WATCH on a replica must be possible, but then one needs to enable READONLY for the connection in redirect mode (exactly like in cluster mode).

Not sure why I need to be READONLY for that. although rare, writable replicas are still supported AFAIK.

There is no change in the behavior of WATCH (or any other command accessing keys) unless the client enables redirect mode on the connection:

127.0.0.1:6380> readwrite
OK
127.0.0.1:6380> watch a
OK
127.0.0.1:6380> readonly
OK
127.0.0.1:6380> watch a
OK
127.0.0.1:6380> client capa redirect
OK
127.0.0.1:6380> watch a
OK
127.0.0.1:6380> readwrite
OK
127.0.0.1:6380> watch a
(error) REDIRECT 127.0.0.1:6379

I think that a client states it intent that it wants to be redirected to the primary when it enables redirect mode.

Also although I agree most known clients will issue READONLY on all replica in cluster mode disabled when rfr is used, AFAIK the READONLY was mainly introduced as a cluster command and less intended for serving cluster mode disabled. At the very least, this should be considered as a breaking change for clients. Also what is the expected result of responding with REDIRECT to WATCH? should the client redirect it to the primary node and continue processing the transaction on the replica? I am not sure what is the proposal in this case.

The use of readonly and readwrite has been part of redirect mode from the beginning, see https://github.com/valkey-io/valkey/pull/325. So, this PR does not propose any new handling of readonly/readwrite in that mode (or any change on the client supporting redirect mode). It just fixes a couple of scenarios in which we should redirect (taking cluster mode as the reference)

As for a redirected WATCH, the entire transaction should take place on the primary, then.

Jun 10 '25 08:06 gmbnomis

Yes, WATCH on a replica must be possible, but then one needs to enable READONLY for the connection in redirect mode (exactly like in cluster mode).

Not sure why I need to be READONLY for that. although rare, writable replicas are still supported AFAIK.

There is no change in the behavior of WATCH (or any other command accessing keys) unless the client enables redirect mode on the connection:
127.0.0.1:6380> readwrite
OK
127.0.0.1:6380> watch a
OK
127.0.0.1:6380> readonly
OK
127.0.0.1:6380> watch a
OK
127.0.0.1:6380> client capa redirect
OK
127.0.0.1:6380> watch a
OK
127.0.0.1:6380> readwrite
OK
127.0.0.1:6380> watch a
(error) REDIRECT 127.0.0.1:6379
I think that a client states it intent that it wants to be redirected to the primary when it enables redirect mode.

Also although I agree most known clients will issue READONLY on all replica in cluster mode disabled when rfr is used, AFAIK the READONLY was mainly introduced as a cluster command and less intended for serving cluster mode disabled. At the very least, this should be considered as a breaking change for clients. Also what is the expected result of responding with REDIRECT to WATCH? should the client redirect it to the primary node and continue processing the transaction on the replica? I am not sure what is the proposal in this case.

The use of readonly and readwrite has been part of redirect mode from the beginning, see #325. So, this PR does not propose any new handling of readonly/readwrite in that mode (or any change on the client supporting redirect mode). It just fixes a couple of scenarios in which we should redirect (taking cluster mode as the reference)

As for a redirected WATCH, the entire transaction should take place on the primary, then.

I understand your point (forgot that -REDIRECT will not work for writable replicas per a decision we took in #325) however maintaining the watch migrations not trivial IMO. For example say we redirect the watch and later there will be a failover switch between the primary and replica and the transaction will then be executed on the other node? I think it would be difficult to control how transactions are being handled by clients and maybe we should discuss it with client maintainers in order to understand what is the preferred way to operate transactions redirections.

Jun 10 '25 10:06 ranshid

@gmbnomis Also I agree that this is not an issue only with your current proposal. it is also an issue for cluster mode redirecting watch. I think there was an issue that point was discussed and we thought of introducing a new way to start a transaction with watches or something

Jun 10 '25 10:06 ranshid

@gmbnomis Also I agree that this is not an issue only with your current proposal. it is also an issue for cluster mode redirecting watch. I think there was an issue that point was discussed and we thought of introducing a new way to start a transaction with watches or something

Anyhow, now that I understand better the suggestion I am fine with us only keeping better alignment with cluster mode MOVED. Thank you for taking the time to explain!

Jun 10 '25 10:06 ranshid

@soloestoy @ranshid I rebased the PR on the current unstable. This bug is now open since 8.0: I really would like to make some progress here (this bug is the reason why we do not even consider looking at using REDIRECT mode for our applications, as we use a lot of Lua scripts).

Oct 29 '25 19:10 gmbnomis