
no way to set msgpack max_bin_len limits use of cache to small files

Open cdent opened this issue 6 years ago • 1 comments

When trying to use cachecontrol with very large files (disk images in the case I'm considering), there's no easy way to pass a max_bin_len to msgpack.loads to say "yeah, I really do want to be able to load huge files".

cachecontrol will write the huge files, but then when it comes round to read them, msgpack will produce a ValueError and cachecontrol will return None to the deserialization routines.

It appears that the way to hack around it would be to subclass Serializer and replace loads_v4 to give some args to msgpack.loads.

Is there a better way? Is this something that you'd be interested in seeing as a kwarg passed down from CacheControl?
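
For illustration, the subclassing workaround described above might look roughly like the sketch below. It assumes the private hook is `Serializer._loads_v4(request, data)` and that it hands the decoded dict to `prepare_response`, as in CacheControl 0.12.x; both are internals and may differ in other releases. The names `build_large_body_serializer` and `LargeBodySerializer` are hypothetical, not part of cachecontrol.

```python
# A 2 GiB - 1 ceiling, chosen here only as an example value for max_bin_len.
EXAMPLE_MAX_BIN_LEN = 2**31 - 1


def build_large_body_serializer(max_bin_len=EXAMPLE_MAX_BIN_LEN):
    """Return a Serializer whose v4 loader passes max_bin_len to msgpack.

    Hypothetical helper: it overrides a private cachecontrol method, so it
    should be treated as a workaround sketch rather than a supported API.
    """
    # Imported lazily so the packages are only needed when the helper is used.
    import msgpack
    from cachecontrol.serialize import Serializer

    class LargeBodySerializer(Serializer):
        def _loads_v4(self, request, data, *args, **kwargs):
            try:
                # max_bin_len is a documented keyword of msgpack.loads/unpackb.
                cached = msgpack.loads(data, raw=False, max_bin_len=max_bin_len)
            except ValueError:
                # Same fallback as described in the issue: treat as a cache miss.
                return None
            # Extra positional/keyword arguments are forwarded untouched in case
            # a newer cachecontrol release passes more than (request, data).
            return self.prepare_response(request, cached, *args, **kwargs)

    return LargeBodySerializer()
```

It would then be passed in through the existing `serializer` argument, e.g. `cachecontrol.CacheControlAdapter(cache, serializer=build_large_body_serializer())`.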

cdent avatar Jan 21 '19 20:01 cdent

It appears that this got fixed somehow.

bob $ pip3 freeze | egrep -i 'requests|msgpack|cache'
CacheControl==0.12.6
msgpack==1.0.2
requests==2.25.1

alice $ (
printf 'HTTP/1.0 200 OK\n'
printf 'Date: '; LC_ALL=C date -u '+%a, %d %b %Y %X %Z'
printf 'Content-Length: 500000000\n'
printf 'Cache-Control: max-age=6000\n\n'
yes | dd iflag=count_bytes count=500MB
) | nc -l 8000

bob $ python3 -c '                                
import requests
import cachecontrol.caches
s = requests.session()
c = cachecontrol.caches.FileCache("./cache")
a = cachecontrol.CacheControlAdapter(c)
s.mount("http://", a)
print(len(s.get("http://localhost:8000/foo.txt").content))
'
500000000

bob $ python3 -c '
import requests
import cachecontrol.caches
s = requests.session()
c = cachecontrol.caches.FileCache("./cache")
a = cachecontrol.CacheControlAdapter(c)
s.mount("http://", a)
print(len(s.get("http://localhost:8000/foo.txt").content))
'
500000000

The second request is definitely served from cache because nc stops listening after the first client disconnects.

500000000 is enough to exceed the default max_bin_len:

bob $ MSGPACK_PUREPYTHON=1 python -c '
import msgpack, sys
with open(sys.argv[1], "rb") as f:
    f.read(5)
    u = msgpack.Unpacker(f)
    u.unpack()
' ./cache/5/c/a/8/b/5ca8b7d8184924c60c5c454a874bf5ed7b4741d0660cb7d295185d63 
Traceback (most recent call last):
  File "<string>", line 6, in <module>
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 723, in unpack
    ret = self._unpack(EX_CONSTRUCT)
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 671, in _unpack
    ret[key] = self._unpack(EX_CONSTRUCT)
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 671, in _unpack
    ret[key] = self._unpack(EX_CONSTRUCT)
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 625, in _unpack
    typ, n, obj = self._read_header(execute)
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 467, in _read_header
    raise ValueError("%s exceeds max_bin_len(%s)" % (n, self._max_bin_len))
ValueError: 500000000 exceeds max_bin_len(104857600)
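
For reference, the limit printed in the traceback is 100 MiB, and the cached body is roughly 4.8 times that. The snippet below only restates the arithmetic and mimics the error message from the traceback; `check_bin_len` is a hypothetical stand-in, not msgpack's actual code.

```python
# 100 MiB, the max_bin_len value shown in the traceback above.
DEFAULT_MAX_BIN_LEN = 100 * 1024 * 1024

# Size of the response body written to the cache in this reproduction.
BODY_LEN = 500_000_000


def check_bin_len(n, max_bin_len=DEFAULT_MAX_BIN_LEN):
    """Raise ValueError in the style of the traceback if n is over the limit."""
    if n > max_bin_len:
        raise ValueError("%s exceeds max_bin_len(%s)" % (n, max_bin_len))
    return n


if __name__ == "__main__":
    print(DEFAULT_MAX_BIN_LEN)  # 104857600
    try:
        check_bin_len(BODY_LEN)
    except ValueError as exc:
        print(exc)  # 500000000 exceeds max_bin_len(104857600)
```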

hexagonrecursion avatar Feb 27 '21 06:02 hexagonrecursion

I've tried reproducing a variant of this as part of #336, but failed to. I'm going to close this out and track any follow-ups there. Thanks all!

(If anybody has a reproducer for this, it would be greatly appreciated.)

woodruffw avatar Jun 13 '24 14:06 woodruffw