StreamReader.iter_chunked(n) yields chunk_sizes < n before end of stream
### Describe the bug
I was having compatibility issues when moving from requests to aiohttp in my application. The issue seems to be that aiohttp's iter_chunked(n) method may yield chunks smaller than n in the middle of the stream, unlike requests.Response.iter_content(n).
Not sure if this behavior is intended, but it was not indicated by the documentation or the source code.
### To Reproduce

Run this when downloading a large file:
```python
async for chunk in resp.content.iter_chunked(2048 * 3):
    print(len(chunk))
```
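For a fully self-contained reproduction, something along these lines should work (the URL is just a placeholder for any sufficiently large file):

```python
import asyncio

import aiohttp

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # Placeholder URL; substitute any sufficiently large file.
        async with session.get("https://example.com/large-file.bin") as resp:
            async for chunk in resp.content.iter_chunked(2048 * 3):
                print(len(chunk))

asyncio.run(main())
```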
### Expected behavior
All chunk sizes are 6144 except the last one.
### Logs/tracebacks

Actual sample output of the "To Reproduce" code:

```console
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
110
6144
6144
6144
6144
6144
2048
6144
6144
6144
5904
6144
6144
6144
1928
6144
6144
4096
6144
6144
6144
5904
6144
```
### Python Version

```console
$ python --version
Python 3.10.0
```

### aiohttp Version

```console
$ python -m pip show aiohttp
Name: aiohttp
Version: 3.9.1
Summary: Async http client/server framework (asyncio)
...
```

### multidict Version

```console
$ python -m pip show multidict
Name: multidict
Version: 6.0.4
```

### yarl Version

```console
$ python -m pip show yarl
Name: yarl
Version: 1.9.3
Summary: Yet another URL library
```

### OS

macOS 12
### Related component

Client

### Additional context

No response

### Code of Conduct

- [X] I agree to follow the aio-libs Code of Conduct
> Not sure if this behavior is intended, but it was not indicated by the documentation or the source code.

It says 'with maximum size limit', which, at least to me, suggests that chunks may be smaller. But feel free to make a PR to make it clearer.
I read that as a deliberate design decision. Yielding chunks of a fixed size would require holding back data that is already available, which could cause unnecessary delays.
Got it, thanks! So in order to get chunks of exactly size N, is it recommended to use readexactly(N)? That was several orders of magnitude slower for my use case, so I just ended up writing to a tempfile and reading out exactly-sized chunks.
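For reference, a readexactly()-based loop could look roughly like the sketch below (read_fixed is just an illustrative name; the trailing short block is recovered from IncompleteReadError.partial):

```python
import asyncio

async def read_fixed(stream, n: int):
    # Yield blocks of exactly n bytes from an aiohttp StreamReader,
    # except possibly the final, shorter one.
    while True:
        try:
            yield await stream.readexactly(n)
        except asyncio.IncompleteReadError as exc:
            if exc.partial:  # leftover bytes at end of stream
                yield exc.partial
            break
```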
I think it's important to address these since the chunked reading feature is a widely used part of requests and a lot of people expect exact analogues in aiohttp.
I can make a PR updating the docs. Let me know if there's any other relevant information/context.
I guess readexactly() works, or just adding the chunks together yourself, maybe something like:

```python
chunk = b""
async for new_chunk in resp.content.iter_chunked(n):
    chunk += new_chunk
    if len(chunk) >= n:  # iter_chunked(n) never yields more than n bytes
        process(chunk[:n])
        chunk = chunk[n:]
if chunk:  # leftover bytes at end of stream
    process(chunk)
```
I guess it's possible we could add an option if there's some solid use cases for it, but I'm not clear what those are yet. I think the main use case for the chunk size is to limit the amount of memory the application uses (e.g. downloading a large file and writing it to disk, on a tiny embedded system you may want to limit the amount of memory used for this to 1KB or something, while a desktop application might be good with 10MB as a limit).
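As a sketch of that memory-bounded use case (the function name and limit are illustrative, not part of aiohttp):

```python
import aiohttp

async def download(url: str, dest: str) -> None:
    # At most ~1 MiB of the response body is buffered in memory at a time.
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            with open(dest, "wb") as f:
                async for chunk in resp.content.iter_chunked(1024 * 1024):
                    f.write(chunk)
```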
My use case was for Blowfish decryption, which was done on a chunk-by-chunk basis. This behavior was an issue because, for any yielded chunk of length n < chunk_size,

```python
decrypt(chunk[:n]) + decrypt(chunk[n:])  # chunk[:n] from iteration i, chunk[n:] from iteration i+1
```

is not the same as

```python
decrypt(chunk)  # the whole chunk from iteration i
```
which is what I want. I don't know of any other use cases off the top of my head, but I feel as though reading constant-size blocks from a stream should be common enough.
This feels to me like an itertools job.
Something like `batched(chain.from_iterable(resp.content.iter_chunked(n)), n)` should do it.
Looks like there are async versions of itertools at https://asyncstdlib.readthedocs.io/en/stable/source/api/itertools.html and https://aioitertools.omnilib.dev/en/stable/api.html (the former doesn't look like it has batched() yet, and the latter appears to have named it chunked()).
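For illustration, here is the synchronous equivalent of that idea using the stdlib itertools (batched() requires Python 3.12+); note that chaining byte chunks yields individual ints, so each batch needs re-joining with bytes():

```python
from itertools import batched, chain  # batched() is Python 3.12+

chunks = [b"abcd", b"ef", b"ghijkl"]  # stand-in for iter_chunked() output
rebatched = batched(chain.from_iterable(chunks), 4)
print([bytes(b) for b in rebatched])  # [b'abcd', b'efgh', b'ijkl']
```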