
Consider adding `limit` parameters to StreamReader.readline and .readuntil.

Open 1st1 opened this issue 10 years ago • 7 comments

This topic was originally brought up in PR #297.

The StreamReader class already makes it very convenient to implement protocol parsers. Instead of managing buffers and callbacks, one can lay out even complex protocols as a series of `await read(..)`, `await readline(..)` (and soon `await readuntil(..)`) calls.

I think that adding the ability to adjust the buffer limit per operation is a great idea. We just need to come up with a good API design for it.

One example of where custom read limits would be useful is a MIME parser. It makes sense to set the StreamReader's default limit to a few megabytes, then use `readline(limit=8*1024)` for reading headers and `readuntil(separator=boundary)` for reading attachments.
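Per-call limits never landed in asyncio, but the per-stream pattern can be sketched with the API that does exist (`readuntil()` is available since Python 3.5.2). The message and boundary below are made up for the demo, and `feed_data()` stands in for a real transport:

```python
import asyncio

async def parse_message(reader):
    # Header phase: readline() is bounded by the reader's per-stream limit.
    headers = []
    while True:
        line = await reader.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # blank line (or EOF) ends the header block
        headers.append(line.rstrip(b"\r\n"))
    # Body phase: readuntil() returns the data *including* the separator.
    body = await reader.readuntil(b"--boundary")
    return headers, body

async def main():
    reader = asyncio.StreamReader(limit=64 * 1024)  # per-stream limit
    reader.feed_data(b"Content-Type: text/plain\r\n\r\nhello--boundary")
    reader.feed_eof()
    return await parse_message(reader)

headers, body = asyncio.run(main())
print(headers)  # [b'Content-Type: text/plain']
print(body)     # b'hello--boundary'
```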

@gvanrossum also suggested adding a StreamReader.set_buffer_limit method, which actually sounds like a better option. For a MIME parser, for example, we could initialize the StreamReader with its default limit, read the headers, and then increase the limit to make it possible to use readuntil for parsing attachments.
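`set_buffer_limit()` was never added to asyncio; a minimal sketch of what the suggestion might look like, assuming a subclass that pokes the reader's private `_limit` attribute (an implementation detail, not a public API):

```python
import asyncio

class AdjustableStreamReader(asyncio.StreamReader):
    def set_buffer_limit(self, limit):
        # Hypothetical API: StreamReader keeps its limit in the private
        # _limit attribute, so this relies on an implementation detail.
        self._limit = limit

async def demo():
    reader = AdjustableStreamReader(limit=16)
    reader.feed_data(b"x" * 100 + b"SEP")
    reader.feed_eof()
    try:
        await reader.readuntil(b"SEP")
    except asyncio.LimitOverrunError:
        pass  # chunk longer than the current limit; data stays buffered
    reader.set_buffer_limit(1024)          # raise the limit...
    return await reader.readuntil(b"SEP")  # ...and the same read succeeds

data = asyncio.run(demo())
print(len(data))  # 103 (100-byte payload plus the 3-byte separator)
```

Note that `LimitOverrunError` leaves the data in the internal buffer, which is what makes the retry after raising the limit possible.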

1st1 avatar Dec 14 '15 21:12 1st1

But what would be the purpose of lowering the limit for reading the headers in that example?

gvanrossum avatar Dec 14 '15 22:12 gvanrossum

> But what would be the purpose of lowering the limit for reading the headers in that example?

It might make sense for an HTTP server to read the first 16 KB (an average request head should fit) and close the connection if it's a broken request; if it's a normal request, you can raise the limit and continue. I'm not sure we need this, TBH, since in real life Python apps are usually deployed behind nginx or another proxy server. But I'm interested in the discussion.
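That 16 KB idea might look like the following sketch; `read_request_head` and its `max_head` parameter are hypothetical names, and the demo feeds the reader directly instead of reading from a socket:

```python
import asyncio

async def read_request_head(reader, max_head=16 * 1024):
    # Accumulate header lines; give up once the head exceeds max_head bytes
    # so the server can close the connection on a broken or abusive request.
    head = b""
    while True:
        line = await reader.readline()
        head += line
        if len(head) > max_head:
            raise ValueError("request head too large")
        if line in (b"\r\n", b""):  # blank line (or EOF) ends the head
            return head

async def main():
    reader = asyncio.StreamReader()
    reader.feed_data(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
    reader.feed_eof()
    return await read_request_head(reader)

head = asyncio.run(main())
print(head.endswith(b"\r\n\r\n"))  # True
```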

@asvetlov Do you think it would be useful to have this in asyncio? I know that aiohttp implements its own streaming abstraction; how do you implement buffering/limits/flow control there? CC @haypo

1st1 avatar Dec 14 '15 23:12 1st1

Of course one thing to keep in mind is that the limit only applies to readline() and readuntil(). You can still use read() or readexactly() with arbitrary sizes. For the rest of HTTP you need to implement a protocol stack anyway -- e.g. when using chunked encoding, the only direct readline() calls you can do on the stream are those that read the chunk headers -- and for those the default limit is fine. (This stack is also the reason that aiohttp implements its own streaming -- the app wants to read from the stream abstraction after chunked, gzip etc. are interpreted.)

gvanrossum avatar Dec 14 '15 23:12 gvanrossum

In aiohttp we have a `limit` parameter for .readuntil() and family. @gvanrossum is right -- we need custom streaming for processing compression and chunked encoding. I'm not 100% happy with the current generator-based implementation, but some concept of filters is required for transforming the incoming data stream into its decoded presentation.
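The generator-based filter concept can be sketched like this (an illustration of the idea, not aiohttp's actual implementation): a primed generator receives raw chunks via `send()` and hands back the decoded bytes, here for zlib-compressed input:

```python
import zlib

def zlib_filter():
    # Generator-based "filter": send() compressed chunks in, receive
    # the decompressed bytes back from the same send() call.
    decomp = zlib.decompressobj()
    decoded = b""
    while True:
        chunk = yield decoded
        decoded = decomp.decompress(chunk)

f = zlib_filter()
next(f)  # prime the generator up to its first yield
print(f.send(zlib.compress(b"hello stream")))  # b'hello stream'
```

Filters of this shape compose naturally: the output of one (e.g. chunked decoding) can be fed into the next (e.g. decompression) before the application reads it.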

asvetlov avatar Dec 15 '15 15:12 asvetlov

BTW, I'm glad that we've at least arrived at the same convention as aiohttp for .readuntil(stop): it returns the data including the delimiter ( http://aiohttp.readthedocs.org/en/stable/_modules/aiohttp/parsers.html#ParserBuffer.readuntil ).

I still think it's unnecessary to have a different limit per call if you have one per stream. (aiohttp readuntil() has an infinite default limit BTW.)

gvanrossum avatar Dec 15 '15 17:12 gvanrossum

Agree, per stream limit is enough.

asvetlov avatar Dec 15 '15 19:12 asvetlov

Any further thoughts on this? Having to replace a nice readline() call with some custom accumulator is a shame! Thanks

alexweej avatar Oct 25 '17 11:10 alexweej