dragonfly
dragonfly copied to clipboard
Dragonfly does not work with nest/bull
See this https://github.com/BlackYuzia/Dragonfly-Bull to reproduce.
- Dragonfly rejects EVAL under MULTI (which could potentially be fixed by making the multi TX global - dynamically)
- We also do not allow undeclared keys and this library uses undeclared keys extensively. See, for example, https://github.com/OptimalBits/bull/blob/develop/lib/commands/addJob-6.lua
See where I think EVAL is called under multi (but I am not sure): https://github.com/OptimalBits/bull/blob/master/lib/job.js#L88
First reported in #781
@romange on some occasions we use multi and make a bunch of EVAL/EVALSHA calls, for example when adding jobs in a batch. I think Bull and BullMQ are probably the libraries that do more crazy stuff with lua scripts, if all Bull/BullMQ tests pass with Dragonfly I think you can be quite confident that your LUA script support is 100% Redis compatible 😅
Thanks @manast ! If it's not too hard, can you please write here (or point me to) how to run these tests? Also, where to override "Redis" backend in these tests (to inject dragonfly instead).
It is quite straightforward, just clone the package, install dependencies with yarn, and then run yarn test
. The tests will use whatever Redis instance is running locally (it will connect to localhost:6379). I recommend you start with BullMQ as it has more features (and more intricate lua scripts): https://github.com/taskforcesh/bullmq
Thank you, Manuel!
On Thu, Feb 16, 2023 at 3:03 PM Manuel Astudillo @.***> wrote:
It is quite straightforward, just clone the package, install dependencies with yarn, and then run yarn test. The tests will use whatever Redis instance is running locally (it will connect to localhost:6379). I recommend you start with BullMQ as it has more features (and more intricate lua scripts): https://github.com/taskforcesh/bullmq
— Reply to this email directly, view it on GitHub https://github.com/dragonflydb/dragonfly/issues/782#issuecomment-1433056753, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA4BFCFKADT5KL5RQBJUUSDWXYQRVANCNFSM6AAAAAAUZQH7BU . You are receiving this because you were mentioned.Message ID: @.***>
discussion for reference, https://github.com/taskforcesh/bullmq/discussions/1689
I tried to run a simple test with dragonfly (in intel MacOS) and just trying to add a job to a queue (the simplest operation in BullMQ) will give this error:
Error: write EPIPE
at afterWriteDispatched (node:internal/stream_base_commons:160:15)
at writeGeneric (node:internal/stream_base_commons:151:3)
at Socket._writeGeneric (node:net:917:11)
at Socket._write (node:net:929:8)
at writeOrBuffer (node:internal/streams/writable:392:12)
at _write (node:internal/streams/writable:333:10)
at Writable.write (node:internal/streams/writable:337:10)
at EventEmitter.sendCommand (/Users/manuelastudillo/Dev/taskforce/bullmq/node_modules/ioredis/built/Redis.js:394:29)
at /Users/manuelastudillo/Dev/taskforce/bullmq/node_modules/ioredis/built/redis/event_handler.js:266:26
at /Users/manuelastudillo/Dev/taskforce/bullmq/node_modules/ioredis/built/redis/event_handler.js:69:51 {
errno: -32,
code: 'EPIPE',
syscall: 'write'
}
(running v1.3.0 of dragonfly)
I did some more tests and they fail because some Redis commands are not implemented yet: XREAD, XTRIM, etc.
I just encountered this today when trying to use dragonfly instead of redis with immich :(
https://github.com/immich-app/immich/issues/2542
I also really don't mean to be pedantic, but the compatibility claim is very misleading.
Fully compatible with Redis and Memcached APIs, Dragonfly requires no code changes to adopt.
https://www.dragonflydb.io/redis-alternative
Fully compatible with Redis Dragonfly is a drop-in Redis replacement, meaning it uses the same APIs and is compatible with all of the same SDKs and tooling. Teams that switch from Redis to Dragonfly get huge performance gains and simpler system to operate, all without changing code.
Don't get me wrong, Dragonfly is great and I'm keen to use it, but I do feel mislead.
Hello @uhthomas . Sorry about that! we have implemented around 200 commands that are fully compatible. These docs expand on what commands are supported so far: https://www.dragonflydb.io/docs/category/command-reference
Lua scripts and specifically job management frameworks are still not cracked fully yet, but we are working on completing this functionality. You are right that we should do a better job at explaining the compatibility gap.
@uhthomas
Hi. We just newly support this behavior, but don't have yet an explanation for it.
Dragonfly by default disallows accessing undeclared keys in Lua scripts.
You can enable it with the following flag: dragonfly --default_lua_config=allow-undeclared-keys
.
@dranikpg is correct, but I think we should perform more test coverage before we advice bull users to try Dragonfly.
Thanks for following up. I look forward to using dragonfly at some point in the future.
200 commands
A naive look at https://redis.io/commands/ suggests there are 470 commands in total? Is this correct?
document.querySelectorAll('article').length
470
No, it's not correct. That includes "Redis Stack", i.e. including Redis Modules. For Redis OSS (v7):
@dranikpg @romange it's not perfect yet (there may be some performance optimizations that need to be made?), but I can confirm that DragonflyDB with the --default_lua_flags=allow-undeclared-keys
flag is working in production used with bull and bull-board.
Did you mean --default_lua_flags=allow-undeclared-keys
maybe?
Large report of probably non-issue, good for context (click to reveal)
Interestingly, I keep getting stall errors as of https://github.com/dragonflydb/dragonfly/commit/cde6adb34d1dd016a03cdb875bf14ba2311d9244 , where *every* job is being pushed into waiting and nothing is being processed with this error:0|Calckey_social | WARN 6 [queue deliver] failed(Error: job stalled more than allowable limit) id=16664 attempts=0/10 age=1632m to=https://xxx.xxx/inbox
(where xxx.xxx is each instance)
I cleared the queue. Reverting back to e3086f56a106ef4d7d2da4cc3748127a93def69e works fine, but updating again to cde6adb34d1dd016a03cdb875bf14ba2311d9244 produces a few of generic queue errors (i.e. [queue inbox] error Error processing job
), but does seem to work okay overall.
Pure guess: do you use a debug build? Perhaps https://github.com/dragonflydb/dragonfly/commit/753f7fcdbcb013838501f8f4a772a2f4cc4d14a4 fixes this issue?
No, I use the build from source method to make a release build from the latest git commit, and I already have that commit in the build.
Not sure if its related to those Error processing job
issues - but running https://github.com/taskforcesh/bullmq tests with the above fixes theres still quite a few failures - will keep looking
Edit: Lots of tests seem to be just timing out rather than getting errors - though some like when there are more added jobs than max limiter
are getting Error: write EPIPE
and failing even with the timeout removed (running release build of ff1ffa0
)
I wonder if I should migrate from bull to bullmq for additional testing, I'm not sure how many people use bull compared to bullmq :thinking:
I'm noticing some interesting graph patterns. Normally, Redis would have jobs constantly in delayed/processing and have a steady stream of active jobs, and not have repeating patterns like this.
Over the last couple of days, here are the biggest things I've noticed when comparing how Redis operates with Bull vs Dragonfly with the exact same limits/configs:
- With Dragonfly, jobs tend to get stuck in waiting and not get processes, leading to massive backlogs
- With Dragonfly, jobs tend to go from process to active as if they're handing it off to each other, compared to Redis which handles them both simultaneously
- With Dragonfly, jobs tend to get delayed a lot more for seemingly no reason
- Dragonfly does seem to have the potential to handle far more Process and Active jobs simultaneously than Redis
Thank you for your observations. However, Dragonfly still does not support Bull, so I would not jump to conclusions. I hope we will be able to learn in the coming weeks what's missing in Dragonfly to make it compatible with Bull. The effort will take some time :)
Exactly, and I'm aware. Just happy to share my observations to help see what behaviors occur and if they end up being fixed with updates :)
Running the BullMQ tests again with the above fixes, theres still 5/388 tests failing:
- Delayed jobs:
should process delayed jobs with several workers respecting delay
- Not waiting the expected delay (
waited at least delay time
) - Note finding this fails with Redis too when running locally but on a different assertion
- Edit: Actually the
waited at least delay time
can fail when using Redis too, though less often. I think its due to Lua scripts being run with a timestamp set on the client before the request is made, but concurrent scripts from different workers could run out of order so a script with timestamp T2 could run before another script with timestamp T1, which can occur in Redis but seems more likely with Dragonfly. In this case seeing a delayed job being moved to the wait queue at time T2 (where T2 > delay timestamp), but then the jobs 'processed on' is set to T1 (where T1 < delay timestamp). Theres only a few ms difference but thats enough for the assertion to fail, which checks 'processed on' is less than the delay timestamp
- Not waiting the expected delay (
- Job:
removes 4000 jobs in time range of 4000ms
- Times out after 4 seconds
- Increasing the timeout this often passes (in ~6 seconds compared to Redis running in ~400ms), though it also sometimes gets a
write EPIPE
error
- Obliterate:
should obliterate a queue with high number of jobs in different statuses
- Times out after 6 seconds
- Increasing the timeout it gets
ECONNRESET
errors but retries and eventually passes in over 15 seconds
- Pause:
should not process delayed jobs
- Sometimes passes, though often fails with an assertion error
- Edit: I think this is similar to
should process delayed jobs with several workers respecting delay
where the order concurrent scripts run affects whether the test passes or fails. If the workermoveToActive
runs before the job is added, the test passes, but if it runs after the job is added, the test fails (adding smallawait delay
s can cause Redis to fail and DF to pass)
- Rate Limiter:
processes jobs as max limiter from the beginning
- Receives
ECONNRESET
errors and times out - Removing the timeout it still gets multiple
ECONNRESET
errors then hangs
- Receives
Looking briefly at the ECONNRESET
errors, they seem to occur with both epoll and io_uring. Adding logs to the server, it only closes the connection after recv
returns 0
. tcpdump shows there are no FIN
packets before the RST
, just data packets sent from the client to the server then the server responds with the RST
. Not sure if I’ve configured something wrong, will keep looking
@andydunstall I guess this is the logic that leads to ECONNRESET errors: https://github.com/romange/helio/blob/master/util/fibers/epoll_socket.cc#L347
basically we get 0 from recv, and return connection-aborted.
The logic above the call probably just closes the socket without going through the shutdown
sequence.
And it may indeed be wrong, but do you know that the client code indeed shut down the connection before dragonfly received 0? I wonder if the problem is ECONNRESET or Dragonfly finds itself out of sync with bull client where Dragonfly reads from its socket when it should not...