SQLite queue is using all CPU on high frequency poller (<1s)
When running spiders that do nothing at all, the SQLite-based poller uses all available CPU just reading scheduled tasks. It would be good to have plug-and-play alternative queues, such as Redis.
Related: https://github.com/scrapy/scrapyd/issues/197
Why are you running spiders that "do nothing at all"?
@jpmckinney Just to rule out that the CPU is being used by a spider. This can be replicated by scheduling a lot of jobs with a polling interval below one second, e.g. 0.1. The SQLite queue will then use a massive amount of CPU.
There are also some unmaintained repos that try to solve this: https://github.com/speakol-ads/scrapyd-redis
Put simply, the SQLite queue is a really bad option for high-frequency queues.
Hmm, yeah, same with https://github.com/Tiago-Lira/scrapyd-mongodb (from which scrapyd-redis is forked) and https://github.com/balena/python-pqueue (mentioned in #197).
https://github.com/peter-wangxu/persist-queue is still active, though maybe a first attempt would be to switch to https://github.com/scrapy/queuelib, as mentioned in #197.
Can you share your setup for reproducing the issue?
I will try to create a demo later, but it can pretty much be an empty scrapyd service running one spider that does nothing. Then create about 50 schedules per second and set the polling interval to 0.1. It will saturate even a powerful CPU.
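A rough sketch of that setup, in case it helps in the meantime. It assumes scrapyd is listening on `localhost:6800` with `poll_interval = 0.1` in the `[scrapyd]` section of `scrapyd.conf`, and that a project `demo` with a no-op spider `noop` is already deployed. The project name, spider name, and helper functions are placeholders, not anything from my real setup.

```python
import time
import urllib.parse
import urllib.request

# Assumption: default scrapyd address; adjust to your own service.
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(base_url, project, spider):
    """Build a POST request for scrapyd's schedule.json endpoint."""
    data = urllib.parse.urlencode({"project": project, "spider": spider}).encode()
    return urllib.request.Request(f"{base_url}/schedule.json", data=data)


def flood(base_url, project, spider, per_second=50, seconds=10):
    """Schedule jobs for `seconds` seconds, at most about `per_second` per second."""
    delay = 1.0 / per_second
    deadline = time.monotonic() + seconds
    sent = 0
    while time.monotonic() < deadline:
        request = build_schedule_request(base_url, project, spider)
        with urllib.request.urlopen(request) as response:
            response.read()  # drain the JSON reply; its content is not needed here
        sent += 1
        time.sleep(delay)
    return sent


# Usage (needs a running scrapyd; "demo" and "noop" are hypothetical names):
# flood(SCRAPYD_URL, "demo", "noop")
```

Watching the scrapyd process in `top` while this runs should show whether the poller alone accounts for the CPU usage.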
Also, in my personal opinion, it would make sense to add an interface for plugging in your own queue backend instead of relying on hacks like the two repos mentioned above.
Later, SQLite can be switched to some other default if needed, but having a simple way to replace the queue yourself would be a quick fix for those who use high-frequency polling.
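To illustrate what I mean by a replaceable backend, here is a minimal sketch of an in-memory queue. The method names (`add`, `pop`, `count`, `list`, `remove`, `clear`) follow my understanding of scrapyd's spider queue interface, which is an assumption on my part. The class name is hypothetical, and the constructor arguments and registration mechanism scrapyd would expect are deliberately left out.

```python
import heapq
import itertools


class InMemorySpiderQueue:
    """Hypothetical in-memory priority queue; pending jobs are lost on restart."""

    def __init__(self):
        self._heap = []                # entries: (-priority, sequence, message)
        self._seq = itertools.count()  # tie-breaker: equal priorities stay FIFO

    def add(self, name, priority=0.0, **spider_args):
        message = dict(spider_args, name=name)
        heapq.heappush(self._heap, (-priority, next(self._seq), message))

    def pop(self):
        """Return the highest-priority message, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def count(self):
        return len(self._heap)

    def list(self):
        """Return all messages in the order pop() would yield them."""
        return [entry[2] for entry in sorted(self._heap, key=lambda e: e[:2])]

    def remove(self, func):
        """Drop every message for which func(message) is true; return how many."""
        kept = [entry for entry in self._heap if not func(entry[2])]
        removed = len(self._heap) - len(kept)
        heapq.heapify(kept)
        self._heap = kept
        return removed

    def clear(self):
        self._heap = []
```

Nothing here touches disk, so a poll on an empty queue is just a length check. The trade-off is that scheduled jobs do not survive a restart, which is why a Redis-backed variant of the same interface would be the more realistic option.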
Do you have your own queue ready to use? You can try it with this PR: https://github.com/scrapy/scrapyd/pull/476
@jpmckinney Thanks, give me a few hours and I will try it out.
@pspsdev Now that #476 is merged, do you have suggestions for how to edit the default spider queue, or should there just be a note in the documentation that it doesn't perform well under high frequency polling, and a custom queue would be better?
@jpmckinney I am still doing some tests on my end. Give me a few days and I will report back with more details.