scrapy-deltafetch
URLs in start_urls are not affected by this middleware
If a URL is in start_urls of the spider, it is never skipped by this middleware, i.e. every URL in start_urls is parsed by the parse method of the spider.
If this can't be fixed, it should be documented.
I had this issue as well. Even though all of the start URLs were Request objects carrying the required 'deltafetch_key' meta key, they were crawled anyway.
Yes, I also encountered this problem. Is there any good solution?
This is now documented under NOTE 1: https://support.scrapinghub.com/support/solutions/articles/22000221912-incremental-crawls-with-scrapy-and-deltafetch-in-scrapy-cloud
For the time being, my workaround is to put a single link that should always be fetched in start_urls, and then yield the real URLs as Request objects from its callback. Requests yielded from a callback do pass through the plugin, so they can be skipped.