arq
Add Redis Streams option for job delivery
This pull request adds a basic implementation of Redis Streams, in order to avoid polling for new jobs in the worker and reduce latency, in accordance with objective 4 of issue #437.
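As a rough conceptual sketch (not this PR's actual implementation), the difference is that a blocking XREAD on a Redis Stream waits until the next entry arrives, whereas a polling worker sleeps and re-queries the queue on a fixed delay; the stream key name below is made up for illustration.

from redis.asyncio import Redis

async def wait_for_next_job(redis: Redis) -> list:
    # 'arq:stream' is a hypothetical key name; XREAD blocks for up to 10s
    # waiting for a new entry instead of sleeping and re-polling the queue
    return await redis.xread({'arq:stream': '$'}, count=1, block=10_000)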
To create a worker that listens to a Redis Stream, we can use the CLI or set it directly in the code.
CLI:
arq worker.WorkerSettings --stream
Code:
class WorkerSettings:
functions = [...]
stream = True
...
On the client side, the caller must specify that the job should be delivered to the worker through a Redis Stream.
from arq import create_pool
from arq.connections import RedisSettings

redis = await create_pool(RedisSettings())
await redis.enqueue_job('hello_world', _use_stream=True)
Here are the results of a very simple benchmark that showcases the latency improvement Redis Streams can provide.
Polling:
Average time: 0.268s
Streaming:
Average time: 0.012s
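For reference, here is a minimal sketch of how such an enqueue-to-completion latency measurement could be done; the iteration count, timeout, and timing approach are illustrative only, and a worker is assumed to already be running.

import asyncio
import time

from arq import create_pool
from arq.connections import RedisSettings

async def average_latency(use_stream: bool, iterations: int = 20) -> float:
    redis = await create_pool(RedisSettings())
    total = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        # _use_stream is the flag added in this PR
        job = await redis.enqueue_job('hello_world', _use_stream=use_stream)
        await job.result(timeout=5)  # wait for a running worker to finish the job
        total += time.perf_counter() - start
    return total / iterations

if __name__ == '__main__':
    print(asyncio.run(average_latency(use_stream=True)))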
Codecov Report
Attention: Patch coverage is 82.92683% with 7 lines in your changes missing coverage. Please review.
Project coverage is 95.93%. Comparing base (94cd878) to head (5747d48). Report is 11 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #451 +/- ##
==========================================
- Coverage 96.27% 95.93% -0.35%
==========================================
Files 11 11
Lines 1074 1107 +33
Branches 209 199 -10
==========================================
+ Hits 1034 1062 +28
- Misses 19 23 +4
- Partials 21 22 +1
| Files | Coverage Δ | |
|---|---|---|
| arq/connections.py | 90.06% <100.00%> (-0.01%) | :arrow_down: |
| arq/constants.py | 100.00% <100.00%> (ø) | |
| arq/cli.py | 96.49% <60.00%> (-3.51%) | :arrow_down: |
| arq/worker.py | 96.50% <83.33%> (-0.67%) | :arrow_down: |
@ajac-zero This may need a unit-test with stream enabled.
Sure thing @gaby. I was wondering how I should go about that...
What I started doing was to add a stream parameter to the worker tests and then wrap them so they run twice, once with streaming and once without.
But I feel this might not be the best approach; maybe I should focus on a few vital tests instead? What do you suggest?
@ajac-zero That's probably a great starting point: run the current tests with "stream" set to False, then run the test suite again with "stream" set to True. The second run will require setting the job delivery source to use Streams.
@gaby I finally got around to writing the unit tests. I added a new stream_worker pytest fixture and used pytest's parametrize to run the whole worker test suite twice, first with polling and then with streaming. All passing 😸.
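Roughly, the parametrization looks something like the sketch below; the fixture and test names are illustrative rather than the exact ones in the PR, and it assumes the suite's existing arq_redis fixture, an async test plugin such as pytest-asyncio, and the stream/_use_stream options added in this PR.

import pytest
from arq.worker import Worker

async def hello_world(ctx):
    return 'hello'

@pytest.fixture(params=[False, True], ids=['poll', 'stream'])
def use_stream(request):
    # every test taking this fixture runs twice: polling first, then streaming
    return request.param

async def test_hello_world(arq_redis, use_stream):
    await arq_redis.enqueue_job('hello_world', _use_stream=use_stream)
    worker = Worker(
        functions=[hello_world],
        redis_pool=arq_redis,
        burst=True,
        poll_delay=0,
        stream=use_stream,  # worker setting introduced in this PR
    )
    await worker.main()
    assert worker.jobs_complete == 1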
This looks like great work, but I wonder if it will add complexity to the migration/rewrite required by #437.
Given the size of the change required by #437, I think I'll work on a clean-room rewrite, then add compatibility shims for the existing methods — this might make that work more complicated.
@samuelcolvin Thanks! Yes, it might be better to wait and build on top of the new version.
I'm really looking forward to the new changes. The DAG and type safety sound awesome. Lmk if I can contribute in some way.