scrapy
scrapy copied to clipboard
Response to/from dict, HTTP cache updates
Continuation of #5130. Closes #1450, closes #1877, closes #4923.
This PR introduces a new feature to (de)serialize responses, which is used in the HTTP cache extension.
Codecov Report
Merging #5131 (b4de33d) into master (9a28eb0) will decrease coverage by
0.01%
. The diff coverage is98.82%
.
@@ Coverage Diff @@
## master #5131 +/- ##
==========================================
- Coverage 88.77% 88.76% -0.02%
==========================================
Files 163 163
Lines 10666 10653 -13
Branches 1818 1812 -6
==========================================
- Hits 9469 9456 -13
Misses 922 922
Partials 275 275
Impacted Files | Coverage Δ | |
---|---|---|
scrapy/utils/response.py | 92.06% <95.00%> (+1.86%) |
:arrow_up: |
scrapy/extensions/httpcache.py | 94.66% <100.00%> (-0.78%) |
:arrow_down: |
scrapy/http/response/__init__.py | 97.70% <100.00%> (+0.23%) |
:arrow_up: |
scrapy/utils/httpobj.py | 100.00% <100.00%> (ø) |
|
scrapy/utils/request.py | 100.00% <100.00%> (ø) |
Hey! Unlike request <=> dict conversion, which is used by Scrapy e.g. for the disk queues, here it seems the newly introduced method and function are not used by Scrapy itself. E.g. it is not used for storing of the cache data. I think that's not a great situation, because we're almost introducing an extension point, protocol for response serialization / deserialization here, but not using it in the framework itself.
What's the use case you see for these feature as it is now? Is it just debugging, as in the original ticket?
@kmike I started working on this (on #4945, rather) after #4923. This would provide a way of dealing with cached responses in an uniform fashion, taking the burden off from the cache implementations themselves.
I do wonder if maybe it makes sense to implement this along with the changes to cache handling that make use of this.
This is technically backward-incompatible, but I don't think that's too much of a problem in practice (I believe invalidating an existing cache after a Scrapy version change is probably reasonable).
I’m unsure (+0) on whether or not it’s fine.
I do wonder if merging this change would affect https://github.com/scrapy/scrapy/pull/5175 in any way (as in https://github.com/scrapy/scrapy/pull/5175 could be implemented differently if we are going to break caches anyway). I’m not sure either, though.
@elacuesta What would you think of extracting a subset of these changes into a separate pull request, specifically Response.attributes
and its use in replace
, to address https://github.com/scrapy/scrapy/issues/1877? I suspect we’ll be able to merge that sooner.