Allow disabling AutoThrottle bypassing

Open Gallaecio opened this issue 1 year ago • 4 comments

The downloader middleware of scrapy-zyte-api was created to prevent AutoThrottle from affecting requests driven through Zyte API, and to instead let Zyte API itself control throttling on the server side, sending HTTP 429 responses when a spider is hitting a website too hard.

Relying on Zyte API to handle per-website throttling should usually be the best solution, for two reasons: Zyte API can have a better picture of the traffic that a website can support, and central throttling control allows running multiple spiders against the same domain in parallel without increasing the overall concurrency to the upstream website.

However, some users might want to let AutoThrottle do its thing anyway. We could implement a setting to let them do just that.
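
To make the proposal concrete, here is a minimal sketch of the decision such a setting could drive. The setting name `ZYTE_API_BYPASS_AUTOTHROTTLE` and the helper `should_bypass_autothrottle` are hypothetical, made up for illustration only; they are not part of scrapy-zyte-api, and a real implementation would live inside the downloader middleware.

```python
# Hypothetical setting name, invented for this sketch; the issue does not
# propose a specific name.
BYPASS_SETTING = "ZYTE_API_BYPASS_AUTOTHROTTLE"


def should_bypass_autothrottle(settings: dict, uses_zyte_api: bool) -> bool:
    """Decide whether AutoThrottle should be skipped for a request.

    Requests that do not go through Zyte API are never bypassed. For
    Zyte API requests, bypassing stays the default (the current behavior)
    unless the user turns the hypothetical setting off.
    """
    if not uses_zyte_api:
        return False
    return bool(settings.get(BYPASS_SETTING, True))


# Default: Zyte API requests keep bypassing AutoThrottle.
default_choice = should_bypass_autothrottle({}, uses_zyte_api=True)

# Opt-out: the user lets AutoThrottle handle Zyte API requests too.
opt_out_choice = should_bypass_autothrottle(
    {BYPASS_SETTING: False}, uses_zyte_api=True
)
```

Keeping the bypass as the default would preserve the existing behavior for everyone who does not set the option.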

Gallaecio, Jul 10 '23 08:07