scrapy-deltafetch

Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls

Results 18 scrapy-deltafetch issues

I ran this plugin against https://quotes.toscrape.com/, but the site redirects every request, and this plugin doesn't work for redirects

* Use `Item` instead of `BaseItem`, since `BaseItem` is deprecated in favor of `Item`
* Bump minimum Python version to 3.6 due to the use of f-strings
* `Black`ed the...

```
File "/home/.virtualenvs/Spider_py2/local/lib/python2.7/site-packages/scrapy_deltafetch/middleware.py", line 79, in process_spider_output
    if key in self.db:
DBRunRecoveryError: (-30974, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery')
```

it looks...

Hi. I used deltafetch on Linux seamlessly, but on Windows it cannot be installed because `bsddb3` is incompatible with Windows. Is there any workaround? Maybe using SQLite as backend instead...
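To make the SQLite idea concrete, here is a minimal sketch of a dict-like store for seen keys built on Python's standard `sqlite3` module. The class name `SqliteSeenStore`, the table layout, and the sample key are all hypothetical; this is not part of scrapy-deltafetch and only illustrates the kind of membership check and write that such a backend would need.

```python
import sqlite3
import time


class SqliteSeenStore:
    """Hypothetical dict-like store for keys of already-scraped pages."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the example self-contained; a real store would use a file path.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen (key BLOB PRIMARY KEY, ts TEXT)"
        )

    def __contains__(self, key):
        # Supports `key in store`.
        row = self.conn.execute(
            "SELECT 1 FROM seen WHERE key = ?", (key,)
        ).fetchone()
        return row is not None

    def __setitem__(self, key, value):
        # Supports `store[key] = value`; overwrites an existing entry.
        self.conn.execute(
            "INSERT OR REPLACE INTO seen (key, ts) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def close(self):
        self.conn.close()


store = SqliteSeenStore()
store[b"fingerprint-1"] = str(time.time())  # hypothetical request key
```

Because `sqlite3` ships with Python on every platform, a store like this would avoid the `bsddb3` build problem, though whether it can replace the existing backend is an assumption this sketch does not test.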

If a URL is in start_urls of the spider, it is never skipped by this middleware, i.e. every URL in start_urls is parsed by the parse method of the spider....
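A possible explanation, assuming the middleware filters requests only in its `process_spider_output` hook: in Scrapy, that hook sees what spider callbacks yield, while start requests are scheduled without passing through it. The simplified simulation below uses plain URL strings and a hypothetical `filter_spider_output` helper; it is a stand-in for illustration, not the actual middleware code.

```python
def filter_spider_output(results, seen_keys):
    """Drop URLs whose key was recorded in an earlier crawl (simplified stand-in)."""
    for url in results:
        if url in seen_keys:
            continue  # skipped: an item was scraped from this page before
        yield url


# Hypothetical state left over from a previous crawl.
seen_keys = {"https://example.com/", "https://example.com/page/1"}

start_urls = ["https://example.com/"]
callback_output = ["https://example.com/page/1", "https://example.com/page/2"]

# Start requests are scheduled directly and never reach the filter,
# so the start URL is fetched even though its key is in seen_keys.
scheduled = list(start_urls)

# Only requests yielded by callbacks pass through the filter.
scheduled += filter_spider_output(callback_output, seen_keys)
```

Under these assumptions, `scheduled` still contains the start URL, which matches the behavior described in the report.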

Ignoring already visited URLs is limited by `DOWNLOAD_DELAY`. Is there any way to make filtered URLs affected by the `DOWNLOAD_DELAY` setting?

Hi! Is it possible to make deltafetch stop Scrapy from crawling when it encounters a visited link? I really need this!

When I run an initial scrape with delta-fetch enabled, it tells me that it has stored 263 URLs where items have been scraped. I assume when I run it again...

Kindly help me with the issue below. I tried to crawl the data using “DeltaFetch”, but my DB file is getting updated both times when I am using...