New mmap strategy breaks consuming binary http streams
Hi, I noticed you've added mmap to the code, which is great!
Unfortunately, it breaks consuming http streams, because urllib3 raw streams (urllib3.response.HTTPResponse) also (surprisingly) have a fileno!
So we get an exception:
in get_memoryview(data)
98 # Handle file object opened in 'rb' mode
99 if hasattr(data, "fileno"):
--> 100 mm = mmap.mmap(data.fileno(), 0, access=mmap.ACCESS_READ)
101 return memoryview(mm)
103 # Handle BufferedReader
OSError: [Errno 22] Invalid argument
I've checked the set difference of attributes of a binary file handle and a HTTPResponse, and these attributes are (on my macos machine) exclusively available on files:
['_dealloc_warn',
'_finalizing',
'detach',
'mode',
'name',
'peek',
'raw',
'readinto1',
'write']
So it may be worth either switching the check or alternatively adding an extra condition that prevents mmap on urllib3.response.HTTPResponse objects.
Cheers!
Nice catch. You are welcome to send a pull request. Or I will look after it when I find some time :)
I'm not sure if I have the time, but if so I'll happily create a PR.
The easiest fix is of course to simply reject those streams. (geturl seems like an attribute that is very, very unlikely to be present on file handles, but can be swapped of course depending on how strict the check should be):
if hasattr(data, "fileno") and not hasattr(data, 'geturl'):
But to be honest, it seems like a loss to exclude any and all binary streams. A better option would be to properly support them. That means either creating a separate code path bypassing memoryviews entirely (as was the case about a year ago).
Or, if you think that even streams might benefit from memoryviews, we could transform each chunk read from the stream.