[Request] Add option for sleep interval between page crawls to avoid captchas/rate limits
Hello! I'm trying to crawl a huge website that starts asking for captchas after a few hundred pages have been crawled in a short amount of time. Setting workers=1 is not enough to stay under the captcha "rate limit", so I'd like to ask for an option to specify a custom sleep interval (e.g. 5 seconds) that makes the crawler wait for that amount of time before crawling the next page. youtube-dl has a similar option, and in my experience it has been useful in similar circumstances. Thanks!
Yeah, that makes sense and is easy to add. Are you thinking it would sleep after every page, or after every N pages?
I think sleeping after every page should be good enough. Having N workers that each sleep after every page behaves much like a single worker that sleeps after every N pages.
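To make the proposal concrete, here is a minimal sketch of a per-page sleep in a small worker pool. It is not this project's code: `crawl`, `fetch`, `workers`, `sleep_interval`, and the injectable `sleep` parameter are all hypothetical names chosen for illustration, and the real option could be wired in quite differently.

```python
import queue
import threading
import time


def crawl(urls, fetch, workers=1, sleep_interval=0.0, sleep=time.sleep):
    """Fetch every URL using `workers` threads, pausing after each page.

    All names here are hypothetical; `sleep` is injectable so the pause
    can be replaced in tests.
    """
    pending = queue.Queue()
    for url in urls:
        pending.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to crawl
            page = fetch(url)
            with lock:
                results[url] = page
            if sleep_interval > 0:
                # Each worker pauses after every page it fetches.
                sleep(sleep_interval)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this sketch, N workers with a sleep interval of s seconds issue roughly N requests per (s + fetch time), so the pause after every page combines with the worker count to set the overall request rate.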