scrapy-deltafetch URLs in start_urls are not affected by this middleware

Open gfrmin opened this issue 7 years ago • 4 comments

If a URL is in start_urls of the spider, it is never skipped by this middleware, i.e. every URL in start_urls is parsed by the parse method of the spider.

If this can't be fixed, it should be documented.

Mar 15 '17 11:03 gfrmin

I had this issue as well. Even though all of the start URLs were Request objects containing the required 'deltafetch_key' in their meta, they were crawled anyway.

Nov 05 '17 09:11 eliorc

Yes, I also encountered this problem. Is there any good solution?

Apr 19 '19 01:04 feng1632009

This is now documented under NOTE 1: https://support.scrapinghub.com/support/solutions/articles/22000221912-incremental-crawls-with-scrapy-and-deltafetch-in-scrapy-cloud

Apr 19 '19 12:04 gfrmin

For the time being, my workaround is to put a single link that always has to be fetched in start_urls, and then yield Request objects for the real URLs from its callback. Requests yielded that way are affected by the plugin.

Apr 24 '19 08:04 feng1632009
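
To illustrate why the workaround above helps, here is a toy model in plain Python. It assumes, as the behaviour reported in this thread suggests, that the plugin only filters requests yielded from spider callbacks (the `process_spider_output` hook of a spider middleware) and never sees the start requests. `Request`, `ToyDeltaFetch` and `crawl` are hypothetical stand-ins written for this sketch, not Scrapy's or scrapy-deltafetch's real classes, and the toy keys on the URL for simplicity.

```python
# Toy model only: hypothetical stand-ins, not Scrapy or scrapy-deltafetch code.
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Set


@dataclass(frozen=True)
class Request:
    url: str


class ToyDeltaFetch:
    """Drops requests whose URL was already seen in a previous run."""

    def __init__(self, seen: Set[str]) -> None:
        self.seen = seen

    def process_spider_output(self, results: Iterable[Request]) -> Iterator[Request]:
        # Only requests yielded from a callback pass through this hook.
        for request in results:
            if request.url not in self.seen:
                yield request


def crawl(
    start_urls: List[str],
    callback: Callable[[str], Iterable[Request]],
    middleware: ToyDeltaFetch,
) -> List[str]:
    """Return the URLs that would be downloaded, in order."""
    fetched = []
    for url in start_urls:
        # Start URLs are fetched without going through process_spider_output,
        # so nothing filters them.
        fetched.append(url)
        for request in middleware.process_spider_output(callback(url)):
            fetched.append(request.url)
    return fetched


# Pretend /a and /b were scraped in an earlier run.
seen = {"https://example.com/a", "https://example.com/b"}
items = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
mw = ToyDeltaFetch(seen)

# Item URLs placed directly in start_urls: every one is fetched again.
direct = crawl(items, lambda url: [], mw)

# Workaround: one index page in start_urls, item URLs yielded from its callback.
index = "https://example.com/"
workaround = crawl([index], lambda url: [Request(u) for u in items], mw)

print(direct)
print(workaround)
```

In this model the index page is still fetched on every run, which matches the "link that must be accessible" in the workaround; only the requests yielded from its callback are skipped when already seen.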
