
Implement download delay

Open · roll opened this issue 9 years ago · 1 comment

Overview

@pwalsh wrote:

sleep:

It is a killer if you can't force a sleep between runs. This was a crude way to work around API rate limiting by enforcing a rest between requests. We need some way to support this.

I think the correct treatment would be for the asynchronous workers to have a lookup table to reference, where keys are domains and values are seconds since the last request. Then we could use a crude sleep value to enforce a wait before the next request (I know, it gets really complicated when the processing is async), and later even build in support for reading robots.txt, to be good web citizens and respect the request limits that webmasters ask of us as consumers.

http://stackoverflow.com/questions/8768439/how-to-give-delay-between-each-requests-in-scrapy
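
To make the lookup-table idea concrete, here is a minimal sketch, not a proposed implementation. It assumes asyncio-based workers; the class name `DomainThrottle`, its `wait` method, and the injectable `clock`/`sleep` parameters are all hypothetical and exist only for illustration. It stores the timestamp of the last request per domain and derives the remaining wait from it.

```python
import asyncio
import time
from urllib.parse import urlsplit


class DomainThrottle:
    """Hypothetical helper: minimum delay between requests to one domain."""

    def __init__(self, delay, clock=time.monotonic, sleep=asyncio.sleep):
        self.delay = delay        # seconds to rest between requests per domain
        self._clock = clock       # injectable so the logic can be tested
        self._sleep = sleep
        self._last_request = {}   # domain -> timestamp of the last request
        self._locks = {}          # domain -> lock shared by async workers

    async def wait(self, url):
        """Wait until the URL's domain may be hit again; return seconds slept."""
        domain = urlsplit(url).netloc
        lock = self._locks.setdefault(domain, asyncio.Lock())
        # The per-domain lock makes concurrent workers queue up, so each one
        # sees the timestamp written by the worker before it.
        async with lock:
            slept = 0.0
            last = self._last_request.get(domain)
            if last is not None:
                remaining = self.delay - (self._clock() - last)
                if remaining > 0:
                    slept = remaining
                    await self._sleep(remaining)
            self._last_request[domain] = self._clock()
            return slept
```

A worker would call `await throttle.wait(url)` right before issuing each request. Requests to different domains do not block each other, which is the point of keying the table by domain rather than using one global sleep.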

roll · Oct 30 '16 10:10

@amercader I suppose we will run into this soon with gt.io, so I'm putting it into the Version-1 milestone.

roll · Nov 18 '16 11:11