scrapy-deltafetch
Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls
I ran this plugin against https://quotes.toscrape.com/, but the site redirects every request, and this plugin doesn't work for redirects.
* Use `Item` instead of `BaseItem`, since the latter is deprecated in favor of the former
* Bump minimum Python version to 3.6 due to the use of f-strings
* `Black`ed the...
```
File "/home/.virtualenvs/Spider_py2/local/lib/python2.7/site-packages/scrapy_deltafetch/middleware.py", line 79, in process_spider_output
    if key in self.db:
DBRunRecoveryError: (-30974, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery')
```
it looks...
Hi. I used DeltaFetch on Linux seamlessly, but on Windows it cannot be installed due to `bsddb3` incompatibility with Windows. Is there any workaround? Maybe using SQLite as a backend instead...
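There is no official SQLite backend, but the idea can be sketched with the standard library alone. This is a hypothetical key store (all names here are illustrative, not part of scrapy-deltafetch) exposing the dict-like membership and assignment operations the middleware relies on, without needing `bsddb3`:

```python
import sqlite3

class SqliteKeyStore:
    """Minimal seen-keys store backed by SQLite (portable, no bsddb3 needed).

    Illustrative sketch only: scrapy-deltafetch itself uses Berkeley DB,
    and this class is not a drop-in replacement for its middleware.
    """

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen (key TEXT PRIMARY KEY, value TEXT)"
        )

    def __contains__(self, key):
        # Mirrors the `if key in self.db` check the middleware performs.
        cur = self.conn.execute("SELECT 1 FROM seen WHERE key = ?", (key,))
        return cur.fetchone() is not None

    def __setitem__(self, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO seen (key, value) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()
```

Because SQLite ships with CPython on all platforms, a backend along these lines would sidestep the Windows installation problem entirely.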
If a URL is in the spider's start_urls, it is never skipped by this middleware, i.e. every URL in start_urls is processed by the spider's parse method....
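This follows from where the filtering happens: start requests are always downloaded, and DeltaFetch only drops requests yielded from callbacks whose key is already in the database. You can also assign the key yourself via the `deltafetch_key` request meta; a sketch of deriving a stable key from the URL (the sha1-of-URL scheme is an illustrative choice, not DeltaFetch's internal request fingerprint):

```python
import hashlib

def deltafetch_key_for(url: str) -> str:
    """Stable key for a URL, suitable for request.meta['deltafetch_key'].

    DeltaFetch computes its own default key from the request; hashing the
    URL like this is just an illustrative, deterministic alternative.
    """
    return hashlib.sha1(url.encode("utf-8")).hexdigest()

# In a spider callback (sketch, assuming scrapy is available):
# yield scrapy.Request(url, meta={"deltafetch_key": deltafetch_key_for(url)})
```

Because the key is deterministic, the same URL maps to the same database entry across crawls, which is what makes the skip-on-revisit behavior work for callback-yielded requests.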
Skipping an already-visited URL is still subject to `DOWNLOAD_DELAY`. Is there any way to exempt filtered URLs from the `DOWNLOAD_DELAY` setting?
Hi! Is it possible to make DeltaFetch stop the Scrapy crawl when it encounters an already-visited link? I really need this!
When I run an initial scrape with DeltaFetch enabled, it tells me it has stored 263 URLs where items have been scraped. I assume when I run it again...
Kindly help me with the issue below: I tried to crawl data using DeltaFetch, but I'm facing this problem: my DB file is getting updated both times when I am using...