RequestQueue could have a limit on max enqueued requests
Currently, when crawling pages with pseudo-URLs, the crawler often spends most of its time enqueueing thousands of pages into the request queue, and the user has no way to limit this behavior. They can set the maxRequestsPerCrawl option, but that only limits the number of pages actually crawled, not the number of requests enqueued. The user may thus end up with 100 pages crawled and thousands still sitting in the queue.
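For illustration, a minimal sketch of the current situation (assuming the options-object form of Apify.utils.enqueueLinks; the URLs and pseudo-URL pattern are made up):

```js
const Apify = require('apify');

Apify.main(async () => {
    const requestQueue = await Apify.openRequestQueue();
    await requestQueue.addRequest({ url: 'https://example.com' });

    const crawler = new Apify.PuppeteerCrawler({
        requestQueue,
        // Caps the number of pages crawled...
        maxRequestsPerCrawl: 100,
        handlePageFunction: async ({ page }) => {
            // ...but every link matching the pseudo-URL is still enqueued,
            // so the queue can end up holding thousands of requests.
            await Apify.utils.enqueueLinks({
                page,
                requestQueue,
                pseudoUrls: ['https://example.com/[.*]'],
            });
        },
    });

    await crawler.run();
});
```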
This will become especially important once we switch to the per-request-priced persistent queue.
We could add an enqueuedRequests property to RequestQueue that would be initialized automatically to the current count from storage and then incremented in memory with each added request.
We would also add an options.requestLimit configuration property to RequestQueue. Once this limit is reached, .addRequest() would return null (or a similar sentinel value) and refuse to enqueue further requests.
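A rough sketch of how both pieces could fit together, using the names proposed above; the _fetchRequestCountFromStorage() and _addRequestToStorage() helpers are hypothetical stand-ins for the real storage calls:

```js
class RequestQueue {
    constructor(options = {}) {
        // Proposed option; leaving it undefined keeps the current unlimited behavior.
        this.requestLimit = options.requestLimit;
        this.enqueuedRequests = null; // initialized lazily from storage
    }

    async addRequest(request, options = {}) {
        if (this.enqueuedRequests === null) {
            // Hypothetical helper: read the current request count from
            // storage once; afterwards we only increment it in memory.
            this.enqueuedRequests = await this._fetchRequestCountFromStorage();
        }
        if (this.requestLimit && this.enqueuedRequests >= this.requestLimit) {
            return null; // limit reached, nothing gets enqueued
        }
        this.enqueuedRequests++;
        // Hypothetical helper standing in for the real storage call.
        return this._addRequestToStorage(request, options);
    }
}
```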
@mtrunkat @jancurn
I think this is a good idea. I'd maybe name the option differently, e.g. maxRequestCount, to make its meaning clearer.
One note: if a new request has forefront: true, should we enqueue it even when the limit is reached? To be logically consistent, we probably should, since forefront signals that the request has some kind of priority.
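If we go that route, the limit check from the sketch above would only need one extra condition (options.forefront is already part of the existing addRequest() signature):

```js
if (this.requestLimit
    && this.enqueuedRequests >= this.requestLimit
    && !options.forefront) {
    return null; // only non-priority requests are rejected
}
```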