scrapy-deltafetch
Added: A DeltaFetchPseudoItem for storing requests with no items yielded
Sometimes one may want to store a request key for future skipping even when no item is generated.
Currently I handle such cases by yielding a pseudo item from those responses and then either:
- dropping that item in another middleware placed after deltafetch, or
- dropping that item in a pipeline.
It would be nice to support this feature inside deltafetch.
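For context, here is a minimal sketch of that workaround (the `PseudoItem` and `DropPseudoItemsPipeline` names are illustrative, not part of scrapy-deltafetch):

```python
import scrapy
from scrapy.exceptions import DropItem


class PseudoItem(scrapy.Item):
    """Placeholder yielded only so deltafetch records the request key."""


class DropPseudoItemsPipeline:
    """Drops the placeholders before they reach any feed exporter."""

    def process_item(self, item, spider):
        if isinstance(item, PseudoItem):
            raise DropItem("pseudo item used only to mark the request as seen")
        return item


class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]

    def parse(self, response):
        # This page yields no real items, but deltafetch only stores
        # the request fingerprint when at least one item is yielded,
        # so a pseudo item is emitted here and dropped in the pipeline.
        yield PseudoItem()
```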
Codecov Report
Merging #19 into master will increase coverage by 0.69%. The diff coverage is 100%.
```diff
@@            Coverage Diff            @@
##           master      #19    +/-   ##
=========================================
+ Coverage    91.3%      92%   +0.69%
=========================================
  Files           2        2
  Lines          69       75       +6
  Branches        9       11       +2
=========================================
+ Hits           63       69       +6
  Misses          3        3
  Partials        3        3
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| scrapy_deltafetch/middleware.py | `91.78% <100%> (+0.73%)` | :arrow_up: |
A shameless plug: https://github.com/TeamHG-Memex/scrapy-crawl-once is a similar package, but the storage decision is not based on items; an explicit meta key is used (users can still set it based on scraped items if they want). So instead of creating fake items and dropping them in a middleware, one can just set `request.meta['crawl_once'] = False`. It also shouldn't have issues like https://github.com/scrapy-plugins/scrapy-deltafetch/issues/18 because it uses sqlite. It is harder to use, though, when the decision should depend on whether items are scraped.
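A rough sketch of that approach, assuming the middleware registration and priorities suggested in the scrapy-crawl-once README (verify against the current docs):

```python
import scrapy

# settings.py -- register the middleware for both stages
# (priorities as suggested in the scrapy-crawl-once README).
SPIDER_MIDDLEWARES = {
    "scrapy_crawl_once.CrawlOnceMiddleware": 100,
}
DOWNLOADER_MIDDLEWARES = {
    "scrapy_crawl_once.CrawlOnceMiddleware": 50,
}


class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]

    def parse(self, response):
        # The decision to remember a request is explicit: this request
        # is recorded in sqlite and skipped on later crawls regardless
        # of whether its callback yields items. With CRAWL_ONCE_DEFAULT
        # set to True, one would instead opt out per request by setting
        # crawl_once=False, as suggested above.
        yield scrapy.Request(
            response.urljoin("/details"),
            callback=self.parse_details,
            meta={"crawl_once": True},
        )

    def parse_details(self, response):
        # Process the page; no item needs to be yielded for the
        # request to be skipped next time.
        pass
```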