scrapy-crawl-once

Scrapy middleware that allows crawling only new content.

Results: 6 scrapy-crawl-once issues

scrapy-crawl-once has no built-in way of clearing out all seen requests via settings.

Aims to fix #4. Adds a setting similar to DELTAFETCH_RESET. Expected usage: in settings.py, `CRAWL_ONCE_RESET = True`; or in the terminal, `scrapy crawl spider_name -a crawl_once_reset=True`. If True, `SqliteDict.clear()` is called on...
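One detail worth keeping in mind for the terminal form: spider arguments passed with `-a` arrive as strings, so `crawl_once_reset=True` is the string `"True"`, not a boolean. A minimal sketch of how the two sources could be reconciled is below. The helper names `as_bool` and `should_reset` are hypothetical, as is the choice to let the spider argument override the setting; none of this is taken from the pull request itself.

```python
from typing import Any, Mapping

# Strings treated as "true" (an assumption made for this sketch).
_TRUE_STRINGS = {"1", "true", "yes", "on"}


def as_bool(value: Any) -> bool:
    """Interpret a settings value or a `-a` spider argument as a boolean."""
    if isinstance(value, str):
        # `-a crawl_once_reset=True` delivers the string "True".
        return value.strip().lower() in _TRUE_STRINGS
    return bool(value)


def should_reset(settings: Mapping[str, Any], spider_args: Mapping[str, Any]) -> bool:
    """Decide whether the seen-requests store should be cleared.

    Hypothetical precedence: a spider argument, when present,
    overrides the CRAWL_ONCE_RESET setting.
    """
    if "crawl_once_reset" in spider_args:
        return as_bool(spider_args["crawl_once_reset"])
    return as_bool(settings.get("CRAWL_ONCE_RESET", False))
```

With this sketch, `should_reset({}, {"crawl_once_reset": "False"})` is false, whereas a plain `bool("False")` would have been true.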

TODO:
* [x] DB object
* [ ] allow injecting the DB into callbacks
* [ ] tests
* [x] docs
* [ ] check if old Pythons need to...

I have a spider that crawls only detail pages, and they are never skipped by this middleware.

After upgrading Scrapy, the following warning occurs on every request that uses crawl_once:
```
2022-10-28 15:54:21 [py.warnings] WARNING: /scrapyd/venv/lib/python3.9/site-packages/scrapy_crawl_once/middlewares.py:96: ScrapyDeprecationWarning: Call to deprecated function scrapy.utils.request.request_fingerprint(). If you are using this...
```
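Until the middleware itself stops calling the deprecated function, one possible stopgap is to filter out just this message with Python's standard `warnings` module. The sketch below is a workaround only and does not change which fingerprint function is called. The function name `silence_fingerprint_warning` is hypothetical, and it assumes the filter is installed before the first request is processed (for example, from the project's settings module).

```python
import warnings

# Message prefix copied from the log line quoted in the issue.
# filterwarnings() treats it as a regex matched at the start of the message.
DEPRECATION_PREFIX = (
    r"Call to deprecated function scrapy\.utils\.request\.request_fingerprint"
)


def silence_fingerprint_warning() -> None:
    """Ignore only the request_fingerprint() deprecation warning."""
    warnings.filterwarnings("ignore", message=DEPRECATION_PREFIX)
```

Matching on the message rather than on the warning category keeps other Scrapy deprecation warnings visible.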