RFC 72: Background workers
Great work, guys, please keep going! One note: make it an optional dependency, because currently many developers don't need all features of wagtail that may need background workers or any automated tasks
make it an optional dependency
Completely agree. Wagtail can't assume anything about how people are set up. Some hosting environments don't have the ability to run background tasks like this, and we don't want to limit where Wagtail can run.
because currently many developers don't need all features of wagtail that may need background workers or any automated tasks
This may be true for the things which are currently run in the background, but as this integration develops, things which currently block web requests will be moved over too, possibly allowing more features if we know they're not time sensitive. It's entirely possible some people won't need the background workers, but I highly doubt that all of those people wouldn't see a benefit from deploying them. Whether that deployment is worthwhile to them is obviously 100% subjective and not something we can assume, again hence making it both optional and opt-in.
I've re-written the implementation chunk of this RFC, specifically to change to more of a backend-based approach rather than having Wagtail implement a lot of it itself or depend fully on Huey. This makes the implementation far more generic and powerful, and reduces the amount of complexity added into Wagtail itself.
Thanks to @tomusher for starting the thoughts around this new direction and initial sketches.
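As a rough illustration of what a backend-based approach could look like, here is a minimal sketch in plain Python. Every name in it (`BaseBackend`, `ImmediateBackend`, `QueueingBackend`, `TaskResult`, `resize_image`) is hypothetical and is not taken from the RFC; the point is only that calling code depends on one small interface while the backend decides where the work runs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TaskResult:
    """Outcome of an enqueue call (hypothetical shape, for illustration)."""
    status: str          # "complete", "failed" or "queued"
    value: Any = None


class BaseBackend(ABC):
    """The only interface calling code needs to know about."""

    @abstractmethod
    def enqueue(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> TaskResult:
        ...


class ImmediateBackend(BaseBackend):
    """Runs the task inline, for hosts that cannot run a worker process."""

    def enqueue(self, func, *args, **kwargs):
        try:
            return TaskResult("complete", func(*args, **kwargs))
        except Exception as exc:
            return TaskResult("failed", exc)


class QueueingBackend(BaseBackend):
    """Records the call so that a separate worker can run it later."""

    def __init__(self):
        self.pending = []

    def enqueue(self, func, *args, **kwargs):
        self.pending.append((func, args, kwargs))
        return TaskResult("queued")


def resize_image(width: int, height: int) -> int:
    # Stand-in for real work; returns the pixel count.
    return width * height


# Swapping the backend changes where the work happens, not the calling code.
backend: BaseBackend = ImmediateBackend()
result = backend.enqueue(resize_image, 4, 3)
```

Under this sketch, making workers optional and opt-in amounts to defaulting to the inline backend and letting a deployment switch to a queueing one.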
Great work writing this up... and I agree this would be very beneficial to have in Wagtail. While the lowest friction option is to use an existing task scheduler system (Huey, Celery, etc.) those are going to have a couple design challenges that will make this feature harder to adopt.
Data storage: Huey supports SQLite and Redis, Celery supports Redis. It would be really nice if the datastore could be the same as the wagtail database to avoid adding another layer of complexity. That means task queuing would ideally be backed by Django models. The database is already the source of truth... there is no reason to require yet another service in the mix.
Task runner process: The hosting provider will most likely have to run a second process (aside from WSGI/server) for the task scheduler. It would be really beneficial if whichever task scheduler backends Wagtail provides had a uniform way of spawning this process from Python, similar to how WSGI works (e.g. Django has a wsgi.py file which can be run according to a standard interface). Huey-django uses a management command to spawn this process.
Management of this process becomes more complicated with distributed setups (e.g. people running Wagtail via Docker images, multi-server setups, etc.). Because no one backend would be in charge of running the process, you'd want a dedicated process server. I'm not saying we need to solve for that scenario, just provide official guidance on how to handle it.
It would be really nice if the datastore could be the same as the wagtail database
Yes, the ORM backend will use the same database that Wagtail is using, to avoid complexity.
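To make the idea of a database-backed queue concrete, here is a small sketch that uses the standard library's `sqlite3` as a stand-in for Django models. The table name `task_queue`, its columns, and the `enqueue` / `claim_next` helpers are assumptions made for illustration; they are not the schema or API of the proposed ORM backend.

```python
import json
import sqlite3


def create_queue(conn: sqlite3.Connection) -> None:
    # One row per task; in Django this would be a model in the project's database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued')"
    )


def enqueue(conn: sqlite3.Connection, name: str, **payload) -> int:
    """Store a task as a row and return its id."""
    cur = conn.execute(
        "INSERT INTO task_queue (name, payload) VALUES (?, ?)",
        (name, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection):
    """Mark the oldest queued task as running and return it, or None if empty."""
    row = conn.execute(
        "SELECT id, name, payload FROM task_queue"
        " WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE task_queue SET status = 'running' WHERE id = ?", (row[0],))
    conn.commit()
    return row[0], row[1], json.loads(row[2])


conn = sqlite3.connect(":memory:")
create_queue(conn)
first_id = enqueue(conn, "update_index", page_id=7)
claimed = claim_next(conn)
```

The select-then-update in `claim_next` is only safe with a single worker. With several workers, a real backend would need row locking or an atomic update so that two workers cannot claim the same task.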
The hosting provider will most likely have to run a second process
The original Huey implementation of this RFC allowed both a long-running process and a cron-based "run everything then terminate" style executor. Whilst how the workers are actually run is down to whichever task runner a user chooses, I think the ORM backend in Wagtail should be able to run under the cron-style runner, so it doesn't depend on a dedicated process. In theory it'd be possible to run tasks in the same process as the web server, but that would depend strictly on ASGI.
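The difference between the two executor styles can be sketched in a few lines. This is an illustrative example only: the in-memory `deque`, `run_pending`, and `run_forever` are hypothetical names, not part of Huey or of the RFC.

```python
import time
from collections import deque


def run_pending(queue: deque) -> int:
    """Cron style: run everything currently queued, then return."""
    ran = 0
    while queue:
        func, args = queue.popleft()
        func(*args)
        ran += 1
    return ran


def run_forever(queue: deque, poll_seconds: float = 1.0, should_stop=lambda: False) -> None:
    """Long-running style: keep draining the queue until asked to stop."""
    while not should_stop():
        if run_pending(queue) == 0:
            # Nothing to do; wait before polling again.
            time.sleep(poll_seconds)


log = []
queue = deque([(log.append, ("purge cache",)), (log.append, ("update index",))])
processed = run_pending(queue)
```

A cron job would invoke something like `run_pending` on a schedule and exit, whereas a dedicated worker would sit in `run_forever`. Both share the same task-running code, which is why supporting the cron style need not cost much extra.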
Management of this process becomes more complicated with distributed setups
It does, but it's not difficult. If you are running the application inside Docker, it's just a case of changing the entrypoint to run, say, the worker rather than gunicorn. Making sure the cron jobs or background process run is out of scope for Wagtail to define, so that we can be as versatile as possible. However, these two methods are very standard, and anyone deploying an application which requires background processes will likely already be familiar with them.
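One way to picture the entrypoint switch is a small Python entrypoint that picks the command from an environment variable, so the same image can serve as web server or worker. The variable name `PROCESS_ROLE`, the `myproject.wsgi` module, and the `run_worker` management command are all placeholders assumed for this sketch; none of them is defined by Wagtail or by the RFC.

```python
import os

# Role -> command line. Both commands are placeholders for illustration.
COMMANDS = {
    "web": ["gunicorn", "myproject.wsgi"],             # hypothetical project module
    "worker": ["python", "manage.py", "run_worker"],   # hypothetical management command
}


def command_for(role: str) -> list[str]:
    """Return the argv list for a role, or exit with a clear message."""
    try:
        return COMMANDS[role]
    except KeyError:
        raise SystemExit(
            f"unknown PROCESS_ROLE {role!r}; expected one of {sorted(COMMANDS)}"
        )


def main() -> None:
    argv = command_for(os.environ.get("PROCESS_ROLE", "web"))
    # Replace the current process so that signals reach the real server or worker.
    os.execvp(argv[0], argv)
```

A container entrypoint script would call `main()`; the web and worker containers would then differ only in the value of `PROCESS_ROLE`.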
Discussion thread -> https://github.com/wagtail/wagtail/discussions/7750
(Specific comments on PR still more than welcome!)
Saw this Python library pop up and thought it would be a good link to go here https://rocketry.readthedocs.io/en/stable/
Not saying we should use it but it has a nice API and may serve as some inspiration for parts of this RFC.
I discovered this one as well that is based on Postgres https://procrastinate.readthedocs.io/en/stable/discussions.html#how-does-this-all-work
No additional dependencies needed, only Postgres.
👋 for anyone interested in this, just wanted to note we’ve provisionally scheduled this on the Wagtail roadmap for version 6.1* (May 2024 release, see RFC 91). I believe @gasman said he was keen to lead on this.
Hello, everyone! Just chiming in with a package suggestion. There is django-q2, which integrates seamlessly with Django and supports multiple backends, including the Django ORM, Redis, etc.
Hi all.
There's a massive appetite for a solution like this in Django itself.
As per a fedi-thread, I'd like to suggest making this RFC into a DEP and pushing on getting it into Django.
From the thread:
Only issue I could see is we're due to work on this very soon.
To which...
Suggestion: Put it in a third party package (maybe with an import shim in Wagtail if you want it easy to install or automatically installed) and let's begin the Merge to Django conversation. Targeting 5.2 or 6.0 gives time for bugs to show, and folks on 4.2+ can use the external package now. That’s the way.
My view is that Django should have had a story here a while back. It's something that should be part of the core framework. That you're putting in the effort here, it's time. We should point that at core, and then make it happen.
I'm happy to help with a DEP and play the shepherd role there if that's of help.
(Also, ref @Tobi-De's comment, Django-Q(2) has a lot of good bits to take inspiration from.)
@carltongibson thanks! :heart: Expect a draft DEP from me in the coming days.
We discussed this in a Wagtail Core Team meeting this morning, and agree it makes sense to create the initial draft package externally to both Wagtail and Django, to allow Wagtail to use it sooner, with the intention to upstream it into Django in parallel.
@RealOrangeOne AWESOME! 🤩
Simply magnificent, the best thing I've read this morning.
Hey, do you know if this is fully implemented? I guess this was one of the past GSoC ideas. If it's not, can it be done in GSoC 2024?
This idea has moved to Django itself, so this framework is going to be developed and shipped as part of Django once its RFC planning is complete and merged.
The core implementation of this is now a DEP: https://github.com/django/deps/pull/86.
It's likely Wagtail will be one of the first deployed uses of it, but for now the proposal process and discussion has moved there.
🥳 "Django Enhancement Proposal 14: Background Workers" (Django Weblog). Amazing work, @RealOrangeOne!
👋 I’m going to close this now, as the RFC has been replaced by a DEP. There’ll be follow-up work needed in Wagtail to leverage those new capabilities but this either won’t require an RFC, or would require one that’s more specific to the use case.
Just to add, a draft PR has been made for a POC of turning Wagtail's mechanisms into tasks for django-tasks: https://github.com/wagtail/wagtail/pull/12040