starlette Reuse Request's body buffer for call

Thought of this while working on related work to #1691. Sorry if this has already been proposed, I know we've gone around a couple times with similar stuff.

Background:

#848
#1519
#944
#847

Basically if the user calls Request.body() from their dispatch function we cache the entire request body in memory and pass that to downstream middlewares but if they call Request.stream() then all we do is send an empty body so that downstream things don't hang forever.

I think this behavior is pretty sensible and doesn't use any unexpected memory (e.g. caching all of the body if Request.stream() was called). It also doesn't break the ASGI flow: if a downstream middleware replaces receive() or modified messages the downstream ASGI app (including the final endpoint) will see the new receive().

If this approach works well we could upstream this into the base Request so that arbitrary uses of Request can inter operate with ASGI apps/call chains.

Jun 15 '22 03:06 adriangb

It also doesn't break the ASGI flow: if a downstream middleware replaces receive() or modified messages the downstream ASGI app (including the final endpoint) will see the new receive().

TODO: add a test to verify this.

Other misc TODO:

handle client disconnects (the test for this is fugly)

Sep 04 '22 18:09 adriangb

What does this means exactly:

If this approach works well we could upstream this into the base Request so that arbitrary uses of Request can inter operate with ASGI apps/call chains.

You mean that if we create a Request object inside a middleware/ASGI app, we can't get the body again if we use the same middleware twice, for example?

What I mean here is that if we moved this implementation into the base Requests class it would allow folks not using BaseHTTPMiddleware but using Request (https://github.com/encode/starlette/blob/master/docs/middleware.md#reusing-starlette-components) to read bodies without breaking the downstream ASGI app as long as they do receive = request.wrapped_receive.

Sep 09 '22 17:09 adriangb

Note that this does not fix https://github.com/encode/starlette/issues/493#issuecomment-487566994 or any other case where the body is consumed in the endpoint and then Request.body() is called somewhere else (e.g. in BaseHTTPMiddleware after call_next() or in an exception handler). These seem like finicky use cases, I don't think there's any general way to maintain full ASGI compatibility and supporting these use cases since they essentially require information to be transmitted "upstream". We could mitigate the "hang forever" behavior by detecting stream double consumption and erroring out, but that's about it.

Oct 03 '22 14:10 adriangb

I tried to figure out a way to get use cases like https://github.com/encode/starlette/issues/493#issuecomment-487566994 working but failed. Here's a comparison of before and after for this change:

Request.body called first in	Endpoint	Middleware	Error handler
Endpoint	✅	❌	N/A
Middleware	❌	✅	N/A
Error handler	❌	❌	✅

Request.body called first in	Endpoint	Middleware	Error handler
Endpoint	✅	✅	N/A
Middleware	❌	✅	N/A
Error handler	❌	❌	✅

So this only really fixes the case that could already be fixed by moving to pure ASGI middleware, as discussed in https://github.com/encode/starlette/pull/1519#issuecomment-1060633787. The use cases this doesn't fix (reading the request body for the second time in an error handler or a middleware) are also not use cases you can get around using pure ASGI middleware.

So the question is: do we move forward with this to fix one specific use case, or do we leave it as -is?

Oct 04 '22 05:10 adriangb

Thank you all for developing and contributing to this library. Today I want to make a logging for the back-end app, and I'm stuck on it. Is there any change in the status of this PR? I've seen this PR change and I think this implementation is too confusing, we have to look at this issue differently. Sorry if I don't express myself that way, get it right, I have no doubt about your professionalism, but the Middleware codebase requires a refactor as to me

Jan 15 '23 15:01 ZhymabekRoman

Thanks for the feedback @ZhymabekRoman.

I've seen this PR change and I think this implementation is too confusing, we have to look at this issue differently.

Is it confusing to you conceptually or the code itself?

but the Middleware codebase requires a refactor as to me

I don't get what you mean by this. Are you referring to BaseHTTPMiddleware? To all middleware?

Jan 15 '23 18:01 adriangb

We could also detect stream double consumption here and prohibit calling stream() and body() after call_next() is called if the stream is consumed. That way there’s no error/hanging. With that I think we will have pretty much solved all of BaseHTTPMiddleware’s problems?

Feb 10 '23 20:02 adriangb

We could also detect stream double consumption here and prohibit calling stream() and body() after call_next() is called if the stream is consumed. That way there’s no error/hanging. With that I think we will have pretty much solved all of BaseHTTPMiddleware’s problems?

Besides the ContextVar?

Feb 10 '23 23:02 Kludex

Yep

Feb 11 '23 00:02 adriangb

@tomchristie are there any other changes you'd like, or anything I can do to help get this reviewed?

Mar 28 '23 11:03 adriangb

Sorry the time it took to review this.

May 04 '23 08:05 Kludex

@Kludex thank you for the review. No worries with the timeline, the only issue becomes that I start forgetting why I did things like I did so sorry if answers to some of your questions weren't super clear.

I fixed the bug you found and reworked things to try to make the answers to your questions clearer by just looking at the code. I also added more tests, now there are 400+ lines of tests for 70 lines of implementation 😅 .

May 04 '23 17:05 adriangb

Reuse Request's body buffer for call_next in BaseHTTPMiddleware