
Added: A DeltaFetchPseudoItem for storing requests with no items yielded

Open starrify opened this issue 7 years ago • 2 comments

Sometimes one may want to store a request key for future skipping even when no item is generated from the response.

Currently I handle such cases by yielding a pseudo item from such responses, and then either:

  • dropping that item in another middleware placed after deltafetch, or
  • dropping that item in a pipeline.
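The workaround above can be sketched with plain-Python stand-ins for Scrapy's item and pipeline types. The `PseudoItem` and pipeline names here are purely illustrative, not part of deltafetch, and a real Scrapy pipeline would raise `scrapy.exceptions.DropItem` rather than return `None`:

```python
class PseudoItem(dict):
    """Marker item yielded from responses that produce no real items,
    so the deltafetch middleware still records the request key and
    skips the request on future runs."""


class DropPseudoItemPipeline:
    """Discards the marker so it never reaches item storage."""

    def process_item(self, item, spider):
        if isinstance(item, PseudoItem):
            return None  # stand-in for raising scrapy.exceptions.DropItem
        return item
```

In a spider callback one would yield `PseudoItem()` from pages with no extractable data; deltafetch (a spider middleware) sees the item and records the request, and the pipeline then throws the marker away.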

It would be nice to support this feature inside deltafetch.

starrify · May 16 '17 18:05

Codecov Report

Merging #19 into master will increase coverage by 0.69%. The diff coverage is 100%.


@@           Coverage Diff           @@
##           master   #19      +/-   ##
=======================================
+ Coverage    91.3%   92%   +0.69%     
=======================================
  Files           2     2              
  Lines          69    75       +6     
  Branches        9    11       +2     
=======================================
+ Hits           63    69       +6     
  Misses          3     3              
  Partials        3     3
Impacted Files                   Coverage Δ
scrapy_deltafetch/middleware.py  91.78% <100%> (+0.73%) ↑


Last update aea3c34...d18b9fb.

codecov-io · May 16 '17 18:05

A shameless plug: https://github.com/TeamHG-Memex/scrapy-crawl-once is a similar package, but the storage decision is not based on items: an explicit meta key is used (users can still set it based on scraped items if they want). So instead of creating fake items and dropping them in a middleware, one can just set request.meta['crawl_once'] = False. It also shouldn't have issues like https://github.com/scrapy-plugins/scrapy-deltafetch/issues/18 because it uses sqlite. It is harder to use, though, if the decision should be based on whether items were scraped.
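A rough sketch of that explicit-flag pattern, in plain Python: `crawl_once` is the real meta key used by scrapy-crawl-once, while the helper function and its items-based policy are purely illustrative (the package itself leaves the decision entirely to whoever sets the flag):

```python
def crawl_once_meta(items_scraped):
    """Build a request.meta fragment for scrapy-crawl-once.

    Setting 'crawl_once' to True records the request so it is skipped
    on future runs; False leaves it to be retried.  Basing the flag on
    whether any items were scraped is one policy a user could choose.
    """
    return {"crawl_once": bool(items_scraped)}
```

A spider callback could merge this into the meta of follow-up requests, replacing the fake-item-plus-drop dance with a single explicit flag.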

kmike · May 16 '17 21:05