scrapix
Provide option to slow or rate limit requests
I've been testing out scrapix and first off, awesome work! With a little bit of tinkering around I got it working with meilisearch cloud FAST!
That said, it would be useful to have an option to rate-limit requests. I didn't see anything other than batch_size, which I believe has more to do with how frequently documents are imported into the search index.
This isn't as big an issue when indexing internal websites, but when I tested it on a rather large public collection of docs (reactnative.dev), the site quickly started denying my requests, likely because scrapix was firing off LOTS of requests in a short time, which can look like malicious traffic.
Apache Nutch has a default delay of 5000ms between requests to the same server (which in my opinion is a bit high). It could be a good idea to implement something like this for scrapix if it doesn't already exist. I could potentially implement it if you guys are welcoming PRs?
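
To make the idea concrete, here is a rough sketch of the kind of throttle I have in mind: a minimum delay enforced between consecutive requests. Everything in it is hypothetical (RequestThrottle, minDelayMs, and throttled are names I made up, not anything that exists in scrapix), and I haven't looked at how the crawler schedules requests, so treat it as an illustration rather than a proposed patch.

```typescript
// Hypothetical helper, not part of scrapix: enforces a minimum delay
// between the start times of consecutive requests.
type Sleep = (ms: number) => Promise<void>;
type Now = () => number;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

class RequestThrottle {
  private readonly minDelayMs: number;
  private readonly now: Now;
  private readonly sleep: Sleep;
  // Earliest timestamp (ms) at which the next request may start.
  private nextAllowedAt = 0;

  // The clock and sleep function are injectable so the class can be tested
  // without real timers.
  constructor(minDelayMs: number, now: Now = Date.now, sleep: Sleep = defaultSleep) {
    this.minDelayMs = minDelayMs;
    this.now = now;
    this.sleep = sleep;
  }

  // Resolves once the caller is allowed to send its request.
  async wait(): Promise<void> {
    const current = this.now();
    const startAt = Math.max(current, this.nextAllowedAt);
    // Reserve the slot before sleeping so concurrent callers queue up
    // behind one another instead of all firing at once.
    this.nextAllowedAt = startAt + this.minDelayMs;
    const delay = startAt - current;
    if (delay > 0) {
      await this.sleep(delay);
    }
  }
}

// Hypothetical wrapper: waits for a free slot, then runs any async task
// (for example, a page fetch).
async function throttled<T>(throttle: RequestThrottle, task: () => Promise<T>): Promise<T> {
  await throttle.wait();
  return task();
}
```

With something like new RequestThrottle(1000), every request made through throttled would start at least one second after the previous one, no matter how many pages are queued. A delay of 0 would keep the current behaviour, so it could be opt-in.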