arq
PoC: Redis Streams for immediate task delivery
This pull request introduces a proof of concept for using Redis Streams to improve immediate task delivery in ARQ. The current ARQ implementation faces significant scaling challenges, especially in high-load, distributed environments.
Current Problem:
When handling large volumes of small, quick tasks (e.g., 300 tasks/sec) across multiple workers (10-30+), ARQ’s task distribution model struggles with:
- Locking Mechanism: On each poll iteration, ARQ attempts to acquire locks for all tasks in the queue, leading to a huge number of Redis calls.
- High Redis Load: Excessive polling generates unnecessary load on Redis, impacting both performance and scalability.
- Non-Linear Scaling: Compared to Celery, which can handle similar workloads using only ~10% CPU, ARQ’s CPU usage spikes to ~70% while still hitting scaling limits.
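To make the locking cost concrete, here is a rough back-of-envelope model of how per-poll lock attempts multiply into Redis calls. The numbers and the formula are illustrative assumptions, not measurements from this PR:

```python
def redis_calls_per_second(workers: int, polls_per_sec: int, jobs_visible: int) -> int:
    # Assumed model: each poll iteration does one range read of the queue
    # plus one lock attempt for every job currently visible in the queue.
    calls_per_poll = 1 + jobs_visible
    return workers * polls_per_sec * calls_per_poll

# 30 workers polling twice per second with ~150 jobs backed up:
print(redis_calls_per_second(30, 2, 150))  # 9060
```

Even with modest backlogs, the call volume grows with workers × queue depth, which matches the "huge number of Redis calls" observation above.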
Proposed Solution:
This pull request explores how Redis Streams can be used for immediate task delivery to address these inefficiencies:
- Race Condition Handling: Redis Streams naturally solve race conditions out of the box, eliminating the need for ARQ’s current locking mechanism.
- Reduced Redis Calls: Task acquisition becomes significantly more efficient, reducing the load on Redis and CPU utilization.
- Near-Instant Task Fetching: By avoiding polling, tasks are fetched and delivered almost immediately as they are published.
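As an illustration of this delivery path, here is a minimal sketch of the streams flow using redis-py. The key names, payload format, and helper names are assumptions for illustration, not arq's actual internals:

```python
import json

STREAM_KEY = "arq:stream"   # assumed key name, not necessarily what the PR uses
GROUP = "arq:workers"       # assumed consumer-group name

def serialize_job(function: str, args: tuple) -> dict:
    # Stream entries are flat field/value maps, so the job payload is JSON-encoded.
    return {"job": json.dumps({"function": function, "args": list(args)})}

async def enqueue(redis, function: str, *args):
    # XADD appends the job to the stream; any worker blocked in XREADGROUP
    # wakes up immediately, so there is no polling delay.
    return await redis.xadd(STREAM_KEY, serialize_job(function, args))

async def consume(redis, consumer: str) -> None:
    # Redis delivers each entry to exactly one consumer in the group,
    # which removes the need for per-job locks.
    resp = await redis.xreadgroup(GROUP, consumer, {STREAM_KEY: ">"}, count=25, block=5000)
    for _stream, entries in resp:
        for entry_id, fields in entries:
            job = json.loads(fields[b"job"])
            # ... run job["function"] with job["args"], then acknowledge:
            await redis.xack(STREAM_KEY, GROUP, entry_id)
```

The consumer group tracks unacknowledged entries per consumer, which is what makes the race-condition handling "out of the box."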
Backwards Compatibility (almost):
- This implementation only applies to tasks that are ready for immediate execution.
- Delayed tasks will continue to use the existing sorted sets approach.
- The same Redis keys are reused, ensuring smooth upgrades and compatibility with existing ARQ deployments.
- Warning: This PR breaks compatibility with Redis v5, since it uses the XAUTOCLAIM command, which significantly simplifies the implementation by avoiding additional locks. This command was introduced in Redis 6.2.0: https://redis.io/docs/latest/commands/xautoclaim/ (though it could potentially be replaced with a Lua script that combines XPENDING + XCLAIM)
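For reference, a hedged sketch of how reclaiming stale entries might look with XAUTOCLAIM via redis-py (key and group names are illustrative, not arq's actual ones):

```python
STREAM_KEY = "arq:stream"   # assumed key name
GROUP = "arq:workers"       # assumed consumer-group name

async def reclaim_stale_jobs(redis, consumer: str, idle_ms: int = 60_000):
    # XAUTOCLAIM (Redis >= 6.2) atomically scans the pending-entries list
    # and transfers ownership of entries idle longer than idle_ms to this
    # consumer, replacing a separate XPENDING scan plus per-entry XCLAIM.
    resp = await redis.xautoclaim(STREAM_KEY, GROUP, consumer, min_idle_time=idle_ms)
    # resp[0] is the cursor for the next call; resp[1] is the claimed entries.
    return resp[1]

# On Redis < 6.2, the same effect could be approximated with a Lua script
# that runs XPENDING and XCLAIM atomically on the server, at the cost of
# extra complexity.
```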
Key Benefits:
- Increased Performance and Scalability: Benchmarks demonstrate significant improvements in task delivery performance and overall system scalability.
- Lower CPU Utilization: By minimizing redundant Redis calls, CPU usage under high-load scenarios is greatly reduced.
- Immediate Task Delivery: Tasks are fetched in real-time without relying on polling.
- Aligned with ARQ’s Future Vision: This approach integrates with ARQ’s roadmap for performance optimization. It implements what was mentioned in the "Future plan for Arq" document https://github.com/python-arq/arq/issues/437 (2. Redis Streams 🚀, 3. Avoid sorted set for immediate jobs, 4. Avoid polling)
This PoC PR demonstrates that this approach is indeed possible and works. After syncing with the maintainers, this PR can be finished.
Benchmark results
These numbers are not fully reliable, as the tests ran on a laptop with Redis in a Docker container, but they give a rough picture:
Tasks with no delay
Current implementation (20,000 tasks, 25 max jobs per worker)

| Workers | Publish time (s) | Completion time (s) |
|---|---|---|
| 1 | 2.71 | 80.73 |
| 10 | 2.75 | 80.40 |
| 20 | 2.53 | 91.04 |
| 40 | 3.50 | 133.40 |
Redis streams implementation (20,000 tasks, 25 max jobs per worker)

| Workers | Publish time (s) | Completion time (s) |
|---|---|---|
| 1 | 3.13 | 31.66 |
| 10 | 2.65 | 7.49 |
| 20 | 2.74 | 6.54 |
| 40 | 2.65 | 6.24 |
Tasks with 1 second delay
Current implementation (20,000 tasks, 25 max jobs per worker)

| Workers | Publish time (s) | Completion time (s) |
|---|---|---|
| 1 | 2.51 | 80.25 |
| 10 | 2.64 | 79.59 |
| 20 | 2.56 | 85.00 |
| 40 | 2.48 | 127.51 |
Redis streams implementation (20,000 tasks, 25 max jobs per worker)

| Workers | Publish time (s) | Completion time (s) |
|---|---|---|
| 1 | 2.50 | 80.49 |
| 10 | 2.75 | 26.39 |
| 20 | 2.52 | 20.30 |
| 40 | 2.71 | 15.98 |
Codecov Report
Attention: Patch coverage is 91.13924% with 14 lines in your changes missing coverage. Please review.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| arq/worker.py | 90.47% | 6 Missing and 4 partials :warning: |
| arq/connections.py | 84.61% | 2 Missing and 2 partials :warning: |
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
```diff
@@           Coverage Diff            @@
##             main     #492   +/-   ##
=======================================
- Coverage   96.27%   94.67%   -1.61%
=======================================
  Files          11       12      +1
  Lines        1074     1202    +128
  Branches      209      218      +9
=======================================
+ Hits         1034     1138    +104
- Misses         19       36     +17
- Partials       21       28      +7
```
| Files with missing lines | Coverage Δ | |
|---|---|---|
| arq/constants.py | 100.00% <100.00%> (ø) | |
| arq/jobs.py | 97.19% <100.00%> (-0.97%) | :arrow_down: |
| arq/lua.py | 100.00% <100.00%> (ø) | |
| arq/connections.py | 86.51% <84.61%> (-3.55%) | :arrow_down: |
| arq/worker.py | 94.99% <90.47%> (-2.19%) | :arrow_down: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 7a911f3...ba46d49.
We tried this out in production 😬 😅 and were able to massively reduce the wait time of jobs. We recently scaled to many thousands of jobs per minute and found fairly significant time between when a job was queued and when it started, up to 15 seconds! Here you can see the change after using this PR.
@samuelcolvin @chrisguidry hey hey folks 👋 It seems that performance improvements are highly requested by Arq users. What do you think about separating the pull requests for performance improvements and Arq refactoring, so this branch can be finished and merged sooner, with the refactoring handled separately? This way, we could dedicate more time to Arq refactoring while also addressing the critical performance issues important to some users.
There seems to be something wrong with connection retrying/handling in this PR.
When I restart Redis (docker restart dev-redis-1, for example) I get redis.exceptions.ConnectionError: Connection closed by server.
I also get the same exception if I specify --watch for auto reload during development; arq restarts anyway, but logs this error.
Would be fantastic to get a performance-related release to be able to access these major improvements right away, even if the rewrite takes a while!