scrapy-webdriver
Fixed unfiltered duplicates bug, removed dont_filter
The middleware was emitting requests with dont_filter=True, so duplicate requests bypassed the dupefilter and were never caught.
dont_filter is not needed in itself, but it was protecting the request queue from exhaustion: the middleware emits one request at a time, so there is only ever one request in the Scrapy queue. If that request is a duplicate and the dupefilter drops it, the Scrapy request queue becomes empty and the spider is closed, even if many requests remain in the middleware's queue.
The solution is to catch the spider_idle signal and supply the next request from the middleware's queue.
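
The failure mode and the fix can be illustrated with a small standard-library model. This is only a sketch, not the middleware's actual code and not Scrapy's API: ToyEngine, ToyMiddleware, crawl and the on_idle hook are hypothetical stand-ins for the Scrapy engine (queue plus dupefilter), the webdriver middleware and the spider_idle handler.

```python
from collections import deque


class ToyEngine:
    """Hypothetical stand-in for the Scrapy engine: a request queue plus a dupefilter."""

    def __init__(self):
        self.queue = deque()
        self.seen = set()
        self.on_idle = None      # optional idle hook, plays the role of spider_idle
        self.processed = []

    def schedule(self, url):
        """Queue a request unless the dupefilter has already seen it."""
        if url in self.seen:
            return False         # duplicate: silently dropped
        self.seen.add(url)
        self.queue.append(url)
        return True

    def run(self, on_response):
        while True:
            while self.queue:
                url = self.queue.popleft()
                self.processed.append(url)
                on_response(url)
            # The queue is empty: the spider closes unless the idle hook refills it.
            if self.on_idle is None or not self.on_idle():
                break


class ToyMiddleware:
    """Hypothetical stand-in for the middleware: it holds its own queue of requests."""

    def __init__(self, engine, urls):
        self.engine = engine
        self.pending = deque(urls)

    def emit_one(self):
        """Hand exactly one pending request to the engine (it may be dropped)."""
        if self.pending:
            self.engine.schedule(self.pending.popleft())

    def on_idle(self):
        """Idle handler: feed requests until one is accepted or none are left."""
        while self.pending:
            if self.engine.schedule(self.pending.popleft()):
                return True      # a request was queued, keep the spider open
        return False             # nothing left, let the spider close


def crawl(urls, use_idle_handler):
    engine = ToyEngine()
    middleware = ToyMiddleware(engine, urls)
    if use_idle_handler:
        engine.on_idle = middleware.on_idle
    middleware.emit_one()        # the first request starts the crawl
    engine.run(on_response=lambda url: middleware.emit_one())
    return engine.processed


print(crawl(["a", "b", "b", "c"], use_idle_handler=False))  # stops before "c"
print(crawl(["a", "b", "b", "c"], use_idle_handler=True))   # reaches "c"
```

Without the idle hook, the dropped duplicate "b" leaves the engine queue empty and the crawl ends with "c" still pending. In Scrapy itself, the corresponding hook is the spider_idle signal, and a handler can raise scrapy.exceptions.DontCloseSpider to keep the spider open after supplying a request.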