archivy
Run long tasks in background
Some of the work archivy does should run asynchronously from the web server:
- importing links from external services (currently pocket, more in the future)
- retrieving a web page's content
Some possible solutions:
Celery
This would add a dependency on either RabbitMQ or Redis, which might not match your vision for archivy as a simple app. On the other hand, Redis might be useful for other things (like a cache for a text-search system that is easier to install than ES).
Python-RQ
A (much) more minimalist design, using Redis: official website
Addendum: with Celery, it would be easy to make the RabbitMQ/Redis dependency optional and run tasks in the Flask process when it is not available.
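To make the "optional dependency" idea concrete, here is a rough sketch, not a tested integration. `make_dispatcher`, `dispatch`, the app name `archivy_tasks` and `fetch_page` are all hypothetical names made up for illustration. The only Celery calls assumed are `Celery(name, broker=...)`, `app.task(func)` and `task.delay(...)`. When no broker URL is given, or Celery is not installed, the task just runs inline in the calling process.

```python
def make_dispatcher(broker_url=None):
    """Return a decorator that attaches a .dispatch(*args) callable to a function.

    With a broker URL and Celery installed, .dispatch queues the call.
    Otherwise .dispatch is the function itself, so it runs in-process.
    """
    celery_app = None
    if broker_url:
        try:
            from celery import Celery  # optional dependency
            # "archivy_tasks" is a hypothetical app name.
            celery_app = Celery("archivy_tasks", broker=broker_url)
        except ImportError:
            celery_app = None  # fall back to running inline

    def register(func):
        if celery_app is not None:
            func.dispatch = celery_app.task(func).delay
        else:
            func.dispatch = func
        return func

    return register


# No broker configured here, so the task runs synchronously.
register = make_dispatcher(broker_url=None)


@register
def fetch_page(url):
    # Hypothetical stand-in for the real page download.
    return f"fetched {url}"


result = fetch_page.dispatch("https://example.com")
print(result)
```

The inline fallback still blocks the request while the task runs, so it only keeps the app usable without a broker; it does not make the work asynchronous.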
I'd rather not have to use redis, do you know of any more lightweight alternatives? Maybe we could use threads... :thinking:
Agreed on keeping it simple for now. We could also use Python 3's built-in asyncio, e.g.:
- https://guillotina.readthedocs.io/en/latest/training/asyncio.html#long-running-tasks
- https://faculty.ai/blog/a-guide-to-using-asyncio/
Yes I think using asyncio is a good idea
I agree!
It seems that flask and werkzeug don't play nice with asyncio, because werkzeug is blocking by design.
- https://pgjones.dev/blog/flask-async-quart-sync-2019/
- https://github.com/pallets/werkzeug/issues/1322
That doesn't mean we cannot use custom coroutines/threading/multiprocessing, though.
https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures seems to provide what we need here, any opinions?
https://flask-executor.readthedocs.io/en/latest/ shows how it could be implemented (not necessarily by using that extension directly)
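For reference, a minimal sketch of the `concurrent.futures` approach using only the standard library. `fetch_page_content` and `handle_request` are hypothetical stand-ins (not archivy functions), and the pool size of 2 is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One shared pool for the whole process; 2 workers is an arbitrary choice.
executor = ThreadPoolExecutor(max_workers=2)


def fetch_page_content(url):
    """Hypothetical stand-in for a slow job such as downloading a page."""
    time.sleep(0.1)
    return f"content of {url}"


def handle_request(url):
    """Roughly what a view could do: queue the job and return right away."""
    return executor.submit(fetch_page_content, url)


future = handle_request("https://example.com")
# The caller is free to respond to the user here; the job runs in a worker thread.
content = future.result(timeout=5)  # block only when the result is actually needed
print(content)
```

One caveat with this approach: the queue lives in memory, so pending jobs are lost if the server process restarts.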
We're about to do this refactor on ArchiveBox too, and we're primarily looking at these 2 queue systems:
- https://github.com/coleifer/huey (supports SQLite as the backing store)
- https://github.com/Bogdanp/dramatiq (requires Redis/RabbitMQ)
There are also adapters that link them to Flask, I think (I know there are adapters for Django, which should be easy to adapt if there are no Flask-specific ones).
The reason we didn't end up going with asyncio is that it's still single-threaded, and there's a decent amount of blocking Python that still needs to run while archiving each link. Archivy's architecture / access patterns may be different though, idk.
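A small illustration of the single-threaded point, using only the standard library. `blocking_archive` is a hypothetical stand-in for blocking archiving work, simulated here with `time.sleep`. Calling it directly inside a coroutine stalls the event loop, so the three links run one after another; handing it to a thread pool with `loop.run_in_executor` lets them overlap.

```python
import asyncio
import time


def blocking_archive(link):
    """Hypothetical stand-in for blocking work done while archiving a link."""
    time.sleep(0.2)
    return f"archived {link}"


async def archive_inline(link):
    # Runs the blocking call on the event loop thread: nothing else can run meanwhile.
    return blocking_archive(link)


async def archive_in_thread(link):
    # Hands the blocking call to the default thread pool and awaits the result.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_archive, link)


async def timed(worker, links):
    start = time.perf_counter()
    results = await asyncio.gather(*(worker(link) for link in links))
    return results, time.perf_counter() - start


links = ["a", "b", "c"]
inline_results, inline_secs = asyncio.run(timed(archive_inline, links))
thread_results, thread_secs = asyncio.run(timed(archive_in_thread, links))
print(f"inline: {inline_secs:.2f}s, threads: {thread_secs:.2f}s")
```

Threads only help here because the simulated work sleeps (like waiting on I/O). CPU-bound Python would still be limited by the GIL and would need separate processes instead.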
I'm rooting hard for Archivy, it looks like you've managed to avoid a lot of the early mistakes that plagued the ArchiveBox codebase, the UI is gorgeous, and your plugin system is awesome. I'd love to share notes/lessons we learned from ours so that you can avoid those pitfalls.
Thanks for the suggestions!
> I'm rooting hard for Archivy, it looks like you've managed to avoid a lot of the early mistakes that plagued the ArchiveBox codebase, the UI is gorgeous, and your plugin system is awesome. I'd love to share notes/lessons we learned from ours so that you can avoid those pitfalls.
Yes, I'd definitely love to collaborate, and I remember your comments on the post I made about Archivy on Hacker News back in August.
Your work with ArchiveBox is really cool :)
Ah yeah sorry I forgot to follow up after I initially commented on HN, got swamped with work. I'll join your discord and we can continue the convo there :)
Cool!