CLIENT UNBLOCK should't be able to unblock client that doesn't has timeout callback
When a client is blocked by something like CLIENT PAUSE, we should not
allow CLIENT UNBLOCK timeout to unblock it, since some blocking types
does not has the timeout callback, it will trigger a panic in the core,
people should use CLIENT UNPAUSE to unblock it.
Also using CLIENT UNBLOCK error is not right, it will return a UNBLOCKED
error to the command, people don't expect a SET command to get an error.
So in this commit, in these cases, we will return 0 to CLIENT UNBLOCK
to indicate the unblock is fail. The reason is that we assume that if
a command doesn't expect to be timedout, it also doesn't expect to be
unblocked by CLIENT UNBLOCK.
The old behavior of the following command will trigger panic in timeout and get UNBLOCKED error in error. Under the new behavior, client unblock will get the result of 0.
client 1> client pause 100000 write
client 2> set x x
client 1> client unblock 2 timeout
or
client 1> client unblock 2 error
Potentially breaking change, previously allowed CLIENT UNBLOCK error.
Fixes #2111.
Codecov Report
Attention: Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.
Project coverage is 71.21%. Comparing base (
36b51c9) to head (4db45f8). Report is 38 commits behind head on unstable.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/blocked.c | 85.71% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## unstable #2117 +/- ##
============================================
+ Coverage 71.19% 71.21% +0.02%
============================================
Files 122 122
Lines 66046 66053 +7
============================================
+ Hits 47020 47039 +19
+ Misses 19026 19014 -12
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/networking.c | 87.29% <100.00%> (+0.10%) |
:arrow_up: |
| src/server.h | 100.00% <ø> (ø) |
|
| src/blocked.c | 91.31% <85.71%> (-0.12%) |
:arrow_down: |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
Let's backport it to all versions, ok?
Or only backport the client unblock timeout case where it could crash?
Let's backport it to all versions, ok?
Or only backport the
client unblock timeoutcase where it could crash?
I think cherry picking the exact code would be easier and not a problem. I am not sure why the error is ever used (as I am also not sure when is this API ever used in realtime scenarios and not testing)
I wonder if the change is more severe than it appears at first glance. One could implement the "monitor a changing set of keys" idea from the documentation of CLIENT UNBLOCK as follows:
Loop forever:
BLPOP <current set of observed keys> 0- If not timed out: work on result
And in parallel:
Trigger when set of keys changes:
- Update current set of observed keys
CLIENT UNBLOCK <client id>
Key property today: Client unblocking works even during CLIENT PAUSE, ensuring the key set gets updated on next iteration. (Note that there is no need to even look at the reply of CLIENT UNBLOCK)
The proposed change in this PR creates ambiguity in the CLIENT UNBLOCK response:
"0" reply could now mean either:
- Client wasn't blocked
- Client couldn't be unblocked due to pause (new meaning)
This will break the code above if the trigger occurs when BLPOP is blocking due to CLIENT PAUSE: The BLPOP will continue to wait using the outdated set of keys even if the clients will be unpaused later.
I still think that unblocking the client is wrong (as said by @zuiderkwast CLIENT PAUSE and blocking commands are two different concepts). But maybe CLIENT UNBLOCK should return an error if the command can't be unblocked due to a CLIENT PAUSE? (or should CLIENT UNBLOCK block?)
@gmbnomis Good point. It seems that we should unblock the client if it's blocked and paused at the same time. I.e. it should be possible to unblock a BLPOP even during client pause write. Makes sense?
It seems that we should unblock the client if it's blocked and paused at the same time. I.e. it should be possible to unblock a BLPOP even during client pause write.
That's an interesting perspective. Basically, because pausing clients and blocking commands are different concepts from a user's perspective, we could argue that the user does not need to be concerned with the underlying reason why a blocking command is blocked.
So yes, it makes sense that CLIENT UNBLOCK will just unblock it. And, as this neither reads or writes, this does no harm even if done during a read-write pause. (but I suppose it will be a pain to implement this behavior)
OTOH, my scenario from above turns out to be not fully correct: Today, the server panics when a CLIENT UNBLOCK TIMEOUT is issued for a client that is paused (even if it is a blocking command). Only a CLIENT UNBLOCK ERROR works. So, I wonder if anybody is actually using this pattern...
Today, the server panics when a
CLIENT UNBLOCK TIMEOUTis issued for a client that is paused (even if it is a blocking command).
Yes, I can confirm this. This is an even worse bug IMO. It means the server can crash when you try to CLIENT UNBLOCK a client that is blocking on BLPOP. The user may not know that CLIENT PAUSE is in action, for example during a failover or something like that.
The fix prevents the crash, but unfortunately it prevents unblocking BLPOP while clients are paused, but this depends on which was first: block or pause.
- If a command was blocked before paused, it is possible to unblock it using CLIENT UNBLOCK.
- If clients were paused before a blocking command was called and blocked, it's not possible to unblock it using CLIENT UNBLOCK.
The fix prevents the crash, but unfortunately it prevents unblocking BLPOP while clients are paused, but this depends on which was first: block or pause.
* If a command was blocked before paused, it is possible to unblock it using CLIENT UNBLOCK. * If clients were paused before a blocking command was called and blocked, it's not possible to unblock it using CLIENT UNBLOCK.
@zuiderkwast Yes, this is true. I think that none of the alternatives to this behavior is optimal:
- Unblock clients regardless of pause/block state
- Pros: Consistent unblocking behavior.
- Cons: When clients are paused early in command processing, we lack the necessary context (command type, expected reply) to generate a proper timeout response, especially for module commands. This would likely require a new module API to communicate this information.
- Return an error when paused
- Pros: Simple to implement.
- Cons: Unexpected for clients, as they may not anticipate this error.
- Postpone
CLIENT UNBLOCKof a paused command until the pause expires/is lifted- Pros: From the client’s perspective, this is similar to option 1 but with some delay—in certain cases (like
WAIT), this may be the most logical behavior. - Cons: This introduces a (new?) edge case where one command is deferred by another. We’d need to ensure that commands are unblocked and executed in the correct order, but since blocked commands are already managed in a linked list, this might be feasible.
- Pros: From the client’s perspective, this is similar to option 1 but with some delay—in certain cases (like
Overall, option 3 makes most sense to me.
All options have pros and cons. :thinking: They also require more work, and these cases are very unusual. Maybe there is no use case for these combinations?
IMO we can merge this PR. It is simple and it prevents the crash, which is the most important problem to fix. OK?
@gmbnomis Thank you for your summary comment @valkey-io/core-team Please also take a look at the comments above and vote.
The fix prevents the crash, but unfortunately it prevents unblocking BLPOP while clients are paused, but this depends on which was first: block or pause.
* If a command was blocked before paused, it is possible to unblock it using CLIENT UNBLOCK. * If clients were paused before a blocking command was called and blocked, it's not possible to unblock it using CLIENT UNBLOCK.@zuiderkwast Yes, this is true. I think that none of the alternatives to this behavior is optimal:
Unblock clients regardless of pause/block state
- Pros: Consistent unblocking behavior.
- Cons: When clients are paused early in command processing, we lack the necessary context (command type, expected reply) to generate a proper timeout response, especially for module commands. This would likely require a new module API to communicate this information.
Return an error when paused
- Pros: Simple to implement.
- Cons: Unexpected for clients, as they may not anticipate this error.
Postpone
CLIENT UNBLOCKof a paused command until the pause expires/is lifted
- Pros: From the client’s perspective, this is similar to option 1 but with some delay—in certain cases (like
WAIT), this may be the most logical behavior.- Cons: This introduces a (new?) edge case where one command is deferred by another. We’d need to ensure that commands are unblocked and executed in the correct order, but since blocked commands are already managed in a linked list, this might be feasible.
Overall, option 3 makes most sense to me.
I think that (as I think you also state) the ambiguity was always there. when we pause clients some commands with timeout are not fulfilling the timeout guarantee since it is just too hard to do so. For example we cannot timeout commands which are just sitting in the incoming socket stream as we did not yet parse them.
I think this PR is fine in solving the immediate problem. I would also think that extending CLIENT PAUSE with an extra optional argument (client-id) would be a better choice to extend the API to be more clear and allow unblocking a specific paused client
Let's merge it so it can be included in
- #2173