Response to/from dict, HTTP cache updates

Open elacuesta opened this issue 3 years ago • 7 comments

Continuation of #5130. Closes #1450, closes #1877, closes #4923.

This PR introduces a new feature to (de)serialize responses, which is used in the HTTP cache extension.

May 04 '21 17:05 elacuesta

Codecov Report

Merging #5131 (b4de33d) into master (9a28eb0) will decrease coverage by 0.01%. The diff coverage is 98.82%.

@@            Coverage Diff             @@
##           master    #5131      +/-   ##
==========================================
- Coverage   88.77%   88.76%   -0.02%     
==========================================
  Files         163      163              
  Lines       10666    10653      -13     
  Branches     1818     1812       -6     
==========================================
- Hits         9469     9456      -13     
  Misses        922      922              
  Partials      275      275

Impacted Files	Coverage Δ
scrapy/utils/response.py	`92.06% <95.00%> (+1.86%)`	:arrow_up:
scrapy/extensions/httpcache.py	`94.66% <100.00%> (-0.78%)`	:arrow_down:
scrapy/http/response/__init__.py	`97.70% <100.00%> (+0.23%)`	:arrow_up:
scrapy/utils/httpobj.py	`100.00% <100.00%> (ø)`
scrapy/utils/request.py	`100.00% <100.00%> (ø)`

May 04 '21 17:05 codecov[bot]

Hey! Unlike request <=> dict conversion, which is used by Scrapy e.g. for the disk queues, here it seems the newly introduced method and function are not used by Scrapy itself. E.g. it is not used for storing of the cache data. I think that's not a great situation, because we're almost introducing an extension point, protocol for response serialization / deserialization here, but not using it in the framework itself.

What's the use case you see for these feature as it is now? Is it just debugging, as in the original ticket?

May 10 '21 19:05 kmike

@kmike I started working on this (on #4945, rather) after #4923. This would provide a way of dealing with cached responses in an uniform fashion, taking the burden off from the cache implementations themselves.

May 11 '21 12:05 elacuesta

I do wonder if maybe it makes sense to implement this along with the changes to cache handling that make use of this.

May 11 '21 12:05 Gallaecio

This is technically backward-incompatible, but I don't think that's too much of a problem in practice (I believe invalidating an existing cache after a Scrapy version change is probably reasonable).

Jun 11 '21 17:06 elacuesta

I’m unsure (+0) on whether or not it’s fine.

I do wonder if merging this change would affect https://github.com/scrapy/scrapy/pull/5175 in any way (as in https://github.com/scrapy/scrapy/pull/5175 could be implemented differently if we are going to break caches anyway). I’m not sure either, though.

Jun 13 '21 17:06 Gallaecio

@elacuesta What would you think of extracting a subset of these changes into a separate pull request, specifically Response.attributes and its use in replace, to address https://github.com/scrapy/scrapy/issues/1877? I suspect we’ll be able to merge that sooner.

Jul 14 '21 08:07 Gallaecio

scrapy scrapy copied to clipboard

Response to/from dict, HTTP cache updates

Codecov Report

scrapy
scrapy copied to clipboard