Decline unsubscribe related command in non-subscribed mode
Now, when clients run the unsubscribe, sunsubscribe and punsubscribe commands in the non-subscribed mode, it returns 0. Indeed this is a bug, we should not allow client run these kind of commands here.
Thus, this PR fixes this bug, but it is a break change for existing clients
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 70.60%. Comparing base (
b8dd4fb) to head (9002627). Report is 6 commits behind head on unstable.
Additional details and impacted files
@@ Coverage Diff @@
## unstable #759 +/- ##
============================================
- Coverage 70.63% 70.60% -0.03%
============================================
Files 112 112
Lines 61509 61512 +3
============================================
- Hits 43444 43429 -15
- Misses 18065 18083 +18
| Files | Coverage Δ | |
|---|---|---|
| src/server.c | 88.55% <100.00%> (-0.03%) |
:arrow_down: |
Great. We can discuss the exact error message.
I'm not sure we can call it a bug, but it is at least unexpected behaviour when called with more than one channel. I prefer we decribe it as changing an unexpected behaviour rather than a bugfix.
Whatever it is a bug or unexpected behavior, this is a design error. Thus we need to fix it and make the right turn.
This change introduced a race condition related to slot migration of sharded channel and at the same time client sending sunsubscribe, which can lead to the client getting out of sync in matching each reply to each command sent.
When a slot is migrated, the shard-channels are automatically usubscribed and clients will receive a spontaneous sunsubscribed message (push in RESP3) for each channel in the migrated slot. If the client also manually unsubscribes at the same time, the client can believe the spontaneous sunsubscribed response is a response to the SUNSUBSCRIBE command, but later it will receive an extra -NOSUB error after this PR. The client will believe this error is the response to the next command sent, such as GET foo, the -NOSUB error is matched with GET foo and returned as the reply to GET. Now the client's mapping of command-to-response is out of sync.
Scenario diagram:
Client Primary A Primary B
| | |
| SUNSUBSCRIBE ch | |
|------------------------->| |
| +-------------------------+
| | Slot migration A to B |
| [sunsubscribed, ch, 0] +-------------------------+
|<-------------------------| |
| | |
| Error... | |
|<-------------------------| |
| | |
An error reply is always in-band, i.e. it is always for a command.
If a client ignores errors for a command like subscribe and unsubscribe, then these commands get no in-band replies and one or more push replies. But there is an error, then subscribe/unsibscribe commands receives an in-band reply. Therefore, a correct client which sends [S|P][UN]SUBSCRIBE it must wait for either one push reply to confirm 'subscribed' or 'unsubscribed' or an error, before it can know that the command has been handled. If there are multiple channels, the remaining messages can just be handled as out-of-band.
But the above scenario with -NOSUB error breaks this logic. Now, there is no way for a client to make sure it stays in sync.
Therefore, I think we should maybe revert this PR. Without this PR, the client will instead in this scenario receive two 'sunsubscribed' push messages. Two push messages are easy to handle without getting out of sync.
As stated before this breaks clients, one example is hiredis when running with the async API.
In async, hiredis waits for a successful reply to know the servers state and how to handle reply callbacks.
The handling in hiredis is a bit flawed from start, but now it would need to determine the result of an unsubscribe to avoid getting out of sync.
I made a test case to verify this. Actually, the corner case exists since before, ever since we added the sponaneous sunsubscribe message (when sharded pubsub was introduced?). After moving a slot and the client has received a sunsubscribe message, the SUNSUBSCRIBE command doesn't get a NOSUB error but a MOVED error, or a CLUSTERDOWN if the slot was deleted.
I simulated it with the following scenario, where Client 1 can be assumed to not have read the first message when it sends the SUNSUBSCRIBE command. Client 1 then believes that the [sunsubscribe, ch, 0] push message is received as a response to SUNSUBSCRIBE. The CLUSTERDOWN error reply remains to be read and it appears to be out of sync, i.e. to the client it doesn't appear to match a command it has sent. (If the client has sent another command in the pipeline, the CLUSTERDOWN appears to be its reply.)
Client 1 Client 2 Primary
| | DELSLOT |
| |---------------->|
| | OK |
| [sunsubscribe, ch, 0] |<----------------|
|<-------------------------------------------|
| | |
| SUNSUBSCRIBE ch | |
|------------------------------------------->|
| -CLUSTERDOWN | |
|<-------------------------------------------|
| | |
I don't see any way for a client to protect itself against this scenario. @bjosv do you?
No, I don't think there is a way to protect against this currently..