standalone REDIRECT: Fix scripting and further MULTI/EXEC scenarios
In cluster mode, the necessity of a "MOVED" response is mainly determined based on the keys of a command whereas in standalone redirect mode, the necessity of a "REDIRECT" response is determined based on the command's flags.
It turns out that looking at keys is necessary. Moreover, the currently used command flags do not encompass all situations in which a REDIRECT is necessary:
- WATCH command (redirect in READWRITE mode is necessary)
- transactions with no keys must not be redirected
- scripting scenarios: scripts (with keys) must be redirected in some situations (issue #868).
The new logic is a heavily "distilled" version of getNodeByQuery used in cluster mode. As we don't need to look at slots and their consistency, the information necessary is the flags of the command and whether it accesses at least one key.
Fixes #868
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 72.29%. Comparing base (6cbc1a3) to head (7b98b27).
Additional details and impacted files
@@ Coverage Diff @@
## unstable #1781 +/- ##
============================================
- Coverage 72.43% 72.29% -0.15%
============================================
Files 128 128
Lines 70221 70248 +27
============================================
- Hits 50863 50784 -79
- Misses 19358 19464 +106
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/server.c | 88.57% <100.00%> (+0.11%) |
:arrow_up: |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
WATCH command (redirect in READWRITE mode is necessary)
This makes sense.
transactions with no keys must not be redirected
I don't understand, this already aligns with the current behavior.
scripting scenarios: scripts (with keys) must be redirected in some situations
Can you elaborate it? EVAL and EVAL_RO would or not be redirected in different scenarios?
transactions with no keys must not be redirected
I don't understand, this already aligns with the current behavior.
Mostly yes, I suppose, except when a script with no keys is used, see test case "replica allows MULTI that accesses no key". So, I think we need to look at the keys the commands within a transaction use, not only at the flags of those commands.
scripting scenarios: scripts (with keys) must be redirected in some situations
Can you elaborate it?
EVALandEVAL_ROwould or not be redirected in different scenarios?
The proposal is to behave like in cluster mode, which basically means that EVAL and EVAL_RO do not behave differently when it comes to redirects (I think this is because EVAL_RO is rather new. If one does not have EVAL_RO, it makes sense for EVAL to assume that keys are used for reading in READONLY mode.) Thus:
In READWRITE mode on replica:
- Script without key is not redirected
- Script with keys is redirected (both
EVALandEVAL_RO)
In READONLY mode on replica:
-
Script without keys is not redirected
-
Script with keys is not redirected (both
EVALandEVAL_RO)There is a difference in the reported error, though, if the script actually tries to write. For
EVAL_ROthis isERR Write commands are not allowed from read-only scripts. And forEVALthis isREADONLY You can't write against a read only replica. See also the corresponding discussion for cluster mode at https://github.com/valkey-io/valkey/issues/869#issuecomment-2563203949.
WATCH command (redirect in READWRITE mode is necessary)
This makes sense.
@soloestoy Why do you think it makes sense? I think using watch on the replicas also makes perfect sense as the watch also protects mutations arriving on the replication stream right? I think it does not make sense to redirect watch command, but maybe I misunderstood the intension here? @gmbnomis
WATCH command (redirect in READWRITE mode is necessary)
@soloestoy Why do you think it makes sense? I think using watch on the replicas also makes perfect sense as the watch also protects mutations arriving on the replication stream right? I think it does not make sense to redirect watch command, but maybe I misunderstood the intension here? @gmbnomis
Yes, WATCH on a replica must be possible, but then one needs to enable READONLY for the connection in redirect mode (exactly like in cluster mode).
WATCH command (redirect in READWRITE mode is necessary)
@soloestoy Why do you think it makes sense? I think using watch on the replicas also makes perfect sense as the watch also protects mutations arriving on the replication stream right? I think it does not make sense to redirect watch command, but maybe I misunderstood the intension here? @gmbnomis
Yes,
WATCHon a replica must be possible, but then one needs to enableREADONLYfor the connection in redirect mode (exactly like in cluster mode).
Not sure why I need to be READONLY for that. although rare, writable replicas are still supported AFAIK. Also although I agree most known clients will issue READONLY on all replica in cluster mode disabled when rfr is used, AFAIK the READONLY was mainly introduced as a cluster command and less intended for serving cluster mode disabled. At the very least, this should be considered as a breaking change for clients.
Also what is the expected result of responding with REDIRECT to WATCH? should the client redirect it to the primary node and continue processing the transaction on the replica? I am not sure what is the proposal in this case.
Yes,
WATCHon a replica must be possible, but then one needs to enableREADONLYfor the connection in redirect mode (exactly like in cluster mode).Not sure why I need to be READONLY for that. although rare, writable replicas are still supported AFAIK.
There is no change in the behavior of WATCH (or any other command accessing keys) unless the client enables
redirect mode on the connection:
127.0.0.1:6380> readwrite
OK
127.0.0.1:6380> watch a
OK
127.0.0.1:6380> readonly
OK
127.0.0.1:6380> watch a
OK
127.0.0.1:6380> client capa redirect
OK
127.0.0.1:6380> watch a
OK
127.0.0.1:6380> readwrite
OK
127.0.0.1:6380> watch a
(error) REDIRECT 127.0.0.1:6379
I think that a client states it intent that it wants to be redirected to the primary when it enables redirect mode.
Also although I agree most known clients will issue READONLY on all replica in cluster mode disabled when rfr is used, AFAIK the READONLY was mainly introduced as a cluster command and less intended for serving cluster mode disabled. At the very least, this should be considered as a breaking change for clients. Also what is the expected result of responding with REDIRECT to
WATCH? should the client redirect it to the primary node and continue processing the transaction on the replica? I am not sure what is the proposal in this case.
The use of readonly and readwrite has been part of redirect mode from the beginning, see https://github.com/valkey-io/valkey/pull/325. So, this PR does not propose any new handling of readonly/readwrite in that mode (or any change on the client supporting redirect mode). It just fixes a couple of scenarios in which we should redirect (taking cluster mode as the reference)
As for a redirected WATCH, the entire transaction should take place on the primary, then.
Yes,
WATCHon a replica must be possible, but then one needs to enableREADONLYfor the connection in redirect mode (exactly like in cluster mode).Not sure why I need to be READONLY for that. although rare, writable replicas are still supported AFAIK.
There is no change in the behavior of
WATCH(or any other command accessing keys) unless the client enables redirect mode on the connection:127.0.0.1:6380> readwrite OK 127.0.0.1:6380> watch a OK 127.0.0.1:6380> readonly OK 127.0.0.1:6380> watch a OK 127.0.0.1:6380> client capa redirect OK 127.0.0.1:6380> watch a OK 127.0.0.1:6380> readwrite OK 127.0.0.1:6380> watch a (error) REDIRECT 127.0.0.1:6379I think that a client states it intent that it wants to be redirected to the primary when it enables redirect mode.
Also although I agree most known clients will issue READONLY on all replica in cluster mode disabled when rfr is used, AFAIK the READONLY was mainly introduced as a cluster command and less intended for serving cluster mode disabled. At the very least, this should be considered as a breaking change for clients. Also what is the expected result of responding with REDIRECT to
WATCH? should the client redirect it to the primary node and continue processing the transaction on the replica? I am not sure what is the proposal in this case.The use of readonly and readwrite has been part of redirect mode from the beginning, see #325. So, this PR does not propose any new handling of readonly/readwrite in that mode (or any change on the client supporting redirect mode). It just fixes a couple of scenarios in which we should redirect (taking cluster mode as the reference)
As for a redirected
WATCH, the entire transaction should take place on the primary, then.
I understand your point (forgot that -REDIRECT will not work for writable replicas per a decision we took in #325) however maintaining the watch migrations not trivial IMO. For example say we redirect the watch and later there will be a failover switch between the primary and replica and the transaction will then be executed on the other node? I think it would be difficult to control how transactions are being handled by clients and maybe we should discuss it with client maintainers in order to understand what is the preferred way to operate transactions redirections.
@gmbnomis Also I agree that this is not an issue only with your current proposal. it is also an issue for cluster mode redirecting watch. I think there was an issue that point was discussed and we thought of introducing a new way to start a transaction with watches or something
@gmbnomis Also I agree that this is not an issue only with your current proposal. it is also an issue for cluster mode redirecting watch. I think there was an issue that point was discussed and we thought of introducing a new way to start a transaction with watches or something
Anyhow, now that I understand better the suggestion I am fine with us only keeping better alignment with cluster mode MOVED. Thank you for taking the time to explain!
@soloestoy @ranshid I rebased the PR on the current unstable. This bug is now open since 8.0: I really would like to make some progress here (this bug is the reason why we do not even consider looking at using REDIRECT mode for our applications, as we use a lot of Lua scripts).