dragonfly
dragonfly copied to clipboard
Sidekiq Pro's reliable scheduler requires breaking Redis Cluster behavior
Redis Lua requires the user to define all keys touched by a Lua script when invoking that script in order to run correctly in Redis Cluster.
The problem is that Sidekiq Pro's reliable scheduler does not know which queue (list) it will touch next when invoked. And Redis allows us to violate that requirement since Sidekiq does not support Redis Cluster.
The Lua scheduler is a simple loop:
- Take the first job off ZSet "scheduled" where score is < Time.now.to_f.
- Parse the JSON to get the
queue
element. - Push the job onto that named queue.
As you can see, we know the Lua script will touch scheduled
but we don't know the name of the queue because the scheduled set holds future jobs for all queues and the job data itself tells us which queue to mutate.
Unfortunately it looks like DragonflyDB enforces this Lua requirement, making Sidekiq Pro's reliable scheduler feature incompatible. Redis does allow it as long as it is not running in Cluster mode.
@mperham Unfortunately, atomicity requirements do not allow Dragonfly to parallelize arbitrary lua code efficiently. We provide compatibility flags that allow running lua script with undeclared variables but this would kill performance of Dragonfly, since such lua script would run under Global Lock.
Can you please explain semantics of reliable scheduler? Why is there a global "scheduled" set if its items are distributed into different queues anyway? Why not have "scheduled" per queue, for example?
The use of a single, global scheduled
zset is simply one of legacy. That's how it was originally written 10+ years ago before we even had the SCAN commands and it has never changed. There are 4-5 global structures which have proven a pain to scale because they don't have a Cluster-friendly data model.
@mperham Is atomicity required for this script? From your description of the script if the script is not executed from different clients I believe there is no atomicity requirement and if this is the case dragonfly can run a script in non atomic mode which will allow you to not declare the keys in advance.
@adiholden consider the following scenario:
- worker A removes a job from "scheduled" with score 1, then it non-atomically tries to add it to Q1.
- worker B removes a job from "scheduled" with score 2, then it also non-atomically tries to add it to Q1.
- Worker B succeeds first and now the first job in Q1 has score 2, and the next one has score 1.
@romange If this script runs by multiple clients (as in your example different workers), the atomicity is important
@adiholden found this comment https://github.com/sidekiq/sidekiq/blob/80f5f73f8e74a5775866a016fe42446dfc1b861e/docs/internals.md?plain=1#L31
@mperham can you confirm that sidekiq-pro maintains at most one thread that calls this script?
can you confirm that sidekiq-pro maintains at most one thread that calls this script?
One thread per process. You can have many Sidekiq processes, all calling the script concurrently. That's why it's written in Lua.
The workaround that seemed to work for us script flags 351130589c64523cb98978dc32c64173a31244f3 allow-undeclared-keys
We should do it automatically within Dragonfly. No reason not to do it.
@romange Ideally we could add a #!lua flags=allow-undeclared-keys
to SideKiq Pro's scripts directly. This will be future proof for changes in SHAs. Downside is that we'll need to get @mperham's approval, and wait for their next release.
I can do that. I was planning a patch release soon anyways.
I can do that. I was planning a patch release soon anyways.
I synced with @romange offline. If you could do that, for all scripts that (may) access undeclared keys (are there any other than what's discussed here?), that's be great. I'm happy to review if that'd be helpful.
On our side, we'll add the ability to specify SHAs for which we'll use allow-undeclared-keys
, mainly to support users until they upgrade their Sidekiq version
If I add this as the first line of my Lua script:
#!lua flags=allow-undeclared-keys
then I get:
ERR Error compiling script (new function): user_script:1: unexpected symbol near '#' (redis://localhost:6379/14)
when running in Redis. Do you mean:
--#!lua flags=allow-undeclared-keys
seems that Redis prevent using #!lua
notation that it does not recognize. It's rather unfortunate
Are script tags not backwards compatible with Redis then? I can't use it if it doesn't work transparently with Redis.
I believe Redis uses Lua 5.1 and Dragonfly uses Lua 5.4. Was the #
comment style added in the newer versions?
@mperham yeah, it seems to be a miss on our side. Let me get back to you in a few days.
Sorry, I accidentally closed this as done. #3465 only handles the Dragonfly side. We should still apply the preferred solution of annotating on the Sidekiq side. Reopening.
@mperham we've made a decision. We modified our Lua parsing to use a Lua-comment style pragma, which will be fully compatible with Valkey / Redis. See more here: #3517 and #3512
May I kindly ask that you add the following to the beginning of your scripts?
--!df flags=allow-undeclared-keys,disable-atomicity
Support for this will only be released with the next version of Dragonfly, but that should not prevent you from putting these pragmas now :)
Please do let me know if there's any issue. Thanks!
Done and released in Sidekiq Pro 7.3.1.