
Configurable spider queue class

Open bezkos opened this issue 7 years ago • 3 comments

I have started to implement a custom job queue so I can use a shared job queue with Postgres. I tried to use the SPIDER_QUEUE_CLASS setting, but I found out that it's missing from Scrapyd. Is it possible to bring this setting back, so that I can avoid patching Scrapyd's code? I think implementing this feature again is important for anyone who wants to use Scrapyd in a multi-server environment.

bezkos avatar Dec 26 '16 12:12 bezkos

I summarize here my reply from the mailing list (only for the record).

The SPIDER_QUEUE_CLASS setting dates from the time when scrapyd was part of scrapy, not only as a package but as a module: https://github.com/scrapy/scrapy/commit/75e2c3eb338ea03e487907fa8c99bb12317e9435 Unfortunately, the release notes do not cover all the details of scrapyd's separation into its own project (and covering them all would probably be impractical).

Scrapyd never had this setting.

Digenis avatar Jan 04 '17 10:01 Digenis

It seems easy to implement such a setting.

diff --git a/scrapyd/default_scrapyd.conf b/scrapyd/default_scrapyd.conf
index 0da344f..e2b0c35 100644
--- a/scrapyd/default_scrapyd.conf
+++ b/scrapyd/default_scrapyd.conf
@@ -15,4 +15,5 @@ runner      = scrapyd.runner
 application = scrapyd.app.application
 launcher    = scrapyd.launcher.Launcher
+spiderqueue = scrapyd.spiderqueue.SqliteSpiderQueue
 webroot     = scrapyd.website.Root
 
diff --git a/scrapyd/utils.py b/scrapyd/utils.py
index 602a726..c96add0 100644
--- a/scrapyd/utils.py
+++ b/scrapyd/utils.py
@@ -9,8 +9,11 @@ import json
 from twisted.web import resource
 
-from scrapyd.spiderqueue import SqliteSpiderQueue
+from scrapy.utils.misc import load_object
 from scrapyd.config import Config
 
 
+DEFAULT_SPIDERQUEUE = 'scrapyd.spiderqueue.SqliteSpiderQueue'
+
+
 class JsonResource(resource.Resource):
 
@@ -57,8 +60,9 @@ def get_spider_queues(config):
     if not os.path.exists(dbsdir):
         os.makedirs(dbsdir)
+    spiderqueue = load_object(config.get('spiderqueue', DEFAULT_SPIDERQUEUE))
     d = {}
     for project in get_project_list(config):
         dbpath = os.path.join(dbsdir, '%s.db' % project)
-        d[project] = SqliteSpiderQueue(dbpath)
+        d[project] = spiderqueue(dbpath)
     return d
 

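With a patch like the above applied, a deployment could then point the new setting at its own queue class in scrapyd.conf. A hypothetical example (the `myproject.queues.PostgresSpiderQueue` path is made up for illustration):

```ini
[scrapyd]
# Custom class implementing the same interface as
# scrapyd.spiderqueue.SqliteSpiderQueue; resolved via load_object().
spiderqueue = myproject.queues.PostgresSpiderQueue
```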
I notice that the built-in spider queue should be merged with the sqlite priority queue, which may itself eventually be replaced by https://github.com/scrapy/queuelib. Such a plan shouldn't prevent us from introducing the above config option, as long as the interface remains the same.

Digenis avatar Jan 04 '17 10:01 Digenis

An update on my last comment about replacing this queue module with scrapy/queuelib: there are more projects to consider. https://github.com/balena/python-pqueue https://github.com/peter-wangxu/persist-queue

Digenis avatar Jun 16 '19 19:06 Digenis

Discussion of replacing the default spider queue has moved to #475.

Noting that this issue was postponed "in favour of https://github.com/scrapy/scrapyd/issues/187/3rd solution (Unify queues/dbs)" https://github.com/scrapy/scrapyd/pull/201#issuecomment-485489769

However, #187 has gone nowhere since 2016, and if we do decide to make breaking changes, we can just do that in a major version. So, this is no longer postponed.

jpmckinney avatar Mar 08 '23 17:03 jpmckinney