scrapy-splash
Explain how to use scrapy-splash with AutoThrottle
The AutoThrottle extension doesn't play nicely with scrapy-splash because it thinks requests take a very long time, and adjusts the request rate accordingly.
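To illustrate the effect, here is a rough sketch of the delay update AutoThrottle applies after each response, as I understand it from the Scrapy docs. The function name `next_delay` and its default values are made up for the example, and the rule that non-200 responses never lower the delay is left out. The point is only that a latency inflated by rendering time translates directly into a larger download delay.

```python
def next_delay(current_delay, latency, target_concurrency=1.0,
               min_delay=0.0, max_delay=60.0):
    """Approximate one AutoThrottle delay update (hypothetical helper)."""
    # Delay that would keep roughly `target_concurrency` requests in flight.
    target_delay = latency / target_concurrency
    # Move halfway from the current delay towards the target delay...
    new_delay = (current_delay + target_delay) / 2.0
    # ...but never go below the target delay itself.
    new_delay = max(target_delay, new_delay)
    # Clamp to the configured bounds.
    return min(max(min_delay, new_delay), max_delay)


# A plain response measured at 0.2 s versus a rendered one measured at 10 s.
print(next_delay(0.0, 0.2))
print(next_delay(0.0, 10.0))
```

With these assumed numbers the rendered request pushes the delay to the full 10 seconds, even though the target website itself may have answered quickly.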
Should we simply state that it should be disabled as part of the configuration instructions?
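If the docs go that way, the configuration instructions could include something like the fragment below. `AUTOTHROTTLE_ENABLED` is the standard Scrapy setting; placing it in the project's `settings.py` is just one option, and a spider's `custom_settings` would presumably work as well.

```python
# settings.py (sketch): turn off Scrapy's built-in AutoThrottle,
# since latencies measured through Splash include rendering time.
AUTOTHROTTLE_ENABLED = False
```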
There are ways to make it work with AutoThrottle in a more reasonable way, e.g. https://github.com/TeamHG-Memex/undercrawler/blob/master/undercrawler/middleware/throttle.py.
As a first step, yes: it makes sense to at least document this problem. For example, as I recall, AutoThrottle is enabled by default on Scrapy Cloud (is it still on by default?).
What if we add something like https://github.com/TeamHG-Memex/undercrawler/blob/master/undercrawler/middleware/throttle.py to scrapy-splash itself?
In addition to documenting its (optional) usage, we could log a warning if Scrapy’s built-in AutoThrottle is used along with scrapy-splash.
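A minimal sketch of what such a warning could look like is below. The helper name `warn_if_autothrottle_enabled` and the message text are hypothetical and not part of scrapy-splash; the only Scrapy API assumed is `Settings.getbool`. It could be called with `crawler.settings` from a middleware's `from_crawler`, for instance.

```python
import logging

logger = logging.getLogger(__name__)


def warn_if_autothrottle_enabled(settings, log=logger):
    """Log a warning and return True if built-in AutoThrottle is enabled.

    `settings` is expected to behave like scrapy.settings.Settings,
    i.e. to provide a getbool(name) method.
    """
    if settings.getbool("AUTOTHROTTLE_ENABLED"):
        log.warning(
            "AUTOTHROTTLE_ENABLED is on: latencies measured for Splash "
            "requests include rendering time, so the request rate may "
            "drop more than intended."
        )
        return True
    return False
```

Returning a boolean is only there to make the helper easy to check in tests; the warning itself is the useful part.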