starlette Cache request 'body' and 'stream

Cache the request body and stream_consumed in the ASGI scope, rather than on the request instance.

This preserves the existing behaviours when calling .body() or .stream() multiple times, but extends it to also persisting when new Request instances are created. (Eg. both within some middleware, as well as within an endpoint.)

Similar to #944 but should also handle raising RuntimeError("Stream consumed") if user code attempts to stream the request body more than once.

Edit by @Kludex:

Closes #495
Closes #847
Closes #493

Feb 15 '22 16:02 lovelydinosaur

This looks nice. I feel like the implementation is easier to grok than #944.

And it's all implementation details so it we can always change it in the future 👍 (for example, I think it may make sense to cache other request state into the asgi scope)

Testing wise, I think we need:

A test like this one to to verify the bug/issue is fixed
A couple tests to verify the current behavior w.r.t. consuming the request body and errors.
Maybe tests to verify interaction with middleware. I know this is an edge case, but I think that with this implementation if a middleware consumes the body and then another middleware replaces receive the behavior would be different than what we currently have. Might be worth documenting the change with a test.

Feb 15 '22 16:02 adriangb

This is a test I think is worth at least thinking about:

def test_request_middleware_replaces_receive(test_client_factory):
    """Test the behavior of middleware consuming the request body"""

     async def app(scope: Scope, receive: Receive, send: Send) -> None:
        # a middleware consumes the body
        await Request(scope, receive, send).body()
        # another middleware replaces receive

         async def new_receive() -> Message:
            return {"type": "http.request", "body": b"data"}

         # the endpoint creates a new Request that uses wrapped_receive
        data = await Request(scope, new_receive, send).body()
        await Response(data)(scope, receive, send)

     client = test_client_factory(app)
    resp = client.get("/", content=b"original data")
    assert resp.status_code == 200, resp.content
    assert resp.content == b"data"

It passes on master but not on this branch.

Feb 15 '22 19:02 adriangb

Any movement with this PR as the #944 was closed?

Feb 21 '22 15:02 retrry

So...

I'm not necessarily convinced that this is a road that we should go down.

Starlette is built around "everything is just ASGI". The relevant part here is that middleware is just ASGI middleware. There's pros and cons to that. The pros are that it helps to promote reusable components, the cons are that it's sometimes a more fiddly abstraction to understand.

My feeling here is that rather than attempting to kludge our middleware into a request/response style we ought to instead just provide better documentation about the limitations.

In particular:

Good documentation around how to write plain ASGI middleware.
Documentation around how to write ASGI middleware that inspects the messages being passed through the receive channel.
As identified in https://github.com/encode/starlette/issues/1029, nudge folks away from BaseHTTPMiddleware. It was a neat idea, but there's actually a bunch of different issues with the approach. (#472, #847, #1438)
Document the constraint the requests using BaseHTTPMiddleware or in the exception handlers cannot read the request body. And make sure that it's raising a clear and helpful exception if this is attempted.

I get that this is a cookie that a bunch of folks want. But it has some pretty unhealthy ingredients in it, and I'm not sure I'm comfortable selling it.

Feb 22 '22 11:02 lovelydinosaur

Okay, maybe you have some examples where I could see middleware that inspects or uses request body?

Mar 07 '22 07:03 retrry

@retrry Let me pitch this the other way around... if somebody provides one of the concrete use-cases that they're running into, we could then use that to discuss if/how you'd be able to meet that use-case.

I think that tends to be a more productive tactic, since it helps ground the conversion in an actual end-user-facing issue.

Mar 07 '22 10:03 lovelydinosaur

So my concrete use-case would be that I need to log all POST requests bodies. POST request bodies contain queries, which I need to log for debugging and accounting purposes.

Mar 07 '22 10:03 retrry

Okay.

ASGI middleware is a little complex, so I need to start from basics in figuring out an example here.

First up I went and reminded myself how Starlette does it's class-based ASGI middleware, which I did by taking a look at this particular case. Because I knew it was a fairly simple one.

Next let's start with an empty example to work from.

class EmptyMiddleware:
    def __init__(
        self,
        app: ASGIApp,
    ) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"]  != "http":
            await self.app(scope, receive, send)
            return

        # Do stuff with the HTTP case...
        await self.app(scope, receive, send)

Now. There's two different options.

First up, we could ingest the entire request body and log it at the point the middleware runs...

class BodyLoggingMiddleware:
    def __init__(
        self,
        app: ASGIApp,
    ) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"]  != "http":
            await self.app(scope, receive, send)
            return

        # Ingest all body messages from the ASGI `receive` callable.
        messages = []
        more_body = True
        while more_body:
            message = await receive()
            messages.append(message)
            more_body = message.get("more_body", False)

        # Log the request
        body = b''.join([message.get("body", b"") for message in messages])
        print(scope["method"], body)

        # Dispatch to the ASGI callable
        async def wrapped_receive():
            # First up we want to return any messages we've stashed.
            if messages:
                return messages.pop(0)

            # Once that's done we can just await any other messages.
            return await receive()

        await self.app(scope, wrapped_receive, send)

Note that I'm also having to refer to https://asgi.readthedocs.io/en/latest/specs/www.html to remind myself how the ASGI HTTP events are structured.

Alternatively we could structure the middleware so that the body is not read entirely into memory, but instead log it only if/when it is accessed by the application.

class BodyLoggingMiddleware:
    def __init__(
        self,
        app: ASGIApp,
    ) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"]  != "http":
            await self.app(scope, receive, send)
            return

        async def wrapped_receive():
            message = await receive()
            body = message.get("body", b"")
            more_body = message.get("more_body", False)
            if body:
                # Log each individual chunk as it is ingested.
                print(scope["method"], body, more_body)
            return message

        await self.app(scope, wrapped_receive, send)

So.

(And yup, these are a bit fiddly for developers to figure out.)

Disclaimer: I haven't tested these, let me know if you use either of these patterns. Feedback and alternate implementations are most welcome.

Mar 07 '22 12:03 lovelydinosaur

That generally looks good to me. I went ahead and pushed the second implementation a bit further, intending to handle cases where the app errors or the client disconnects: https://gist.github.com/adriangb/2f003410b05783924d3d6bf3ff18ad6f

Obviously it is a business-level decision if you want to log all of the body even if the client errored, but at lest this shows how you might do that if you wanted to.

Mar 07 '22 18:03 adriangb

Hey, thanks for the code - it works and we have been using it for our logging implementation.

I have one additional question: is it possible to access response in the same middleware? Our use case in other project is: we were logging body (with patched starlette with this MR: https://github.com/encode/starlette/pull/944) and logging response headers in the same log message. Can I do the same in async middleware or it is not possible and I should split log messages to separate messages?

Edit: I think the example you wrote should be added to starlette docs or repo 🙂

Mar 26 '22 07:03 retrry

I strongly feel we need to entertain a merge on this or #944 if for no other reason that:

This is an inevitable stumbling block nearly every Starlette user will run into.
The current resolution path is to search issues on GitHub. :eyes:

Even if the perfect documentation exists on the official site, the hangup will still be perceived as a "bug" every newbie will run into and we'll need to refer them to the manual for.

Reminder: The reason we do not have a flood of issues from FastAPI users is because they patched this with something similar to #944 in 2019 (see https://github.com/tiangolo/fastapi/pull/589 and tiangolo/fastapi#394 for their big thread).

Early adopters of Starlette such as myself already know to work around this by storing the request body, but I could see newcomers seriously re-assessing whether they want to write pure Starlette or choose something else if they hit a snagging issue like this early on.

A "will not fix" resolution is risky to the future growth of the framework.

@tomchristie Suggestion: Perhaps shipping a tiny, performant, built-in middleware for this instead of modifying the core (make it common like the SessionMiddleware). I'd vastly prefer something like this in core but this may satisfy all parties.

Mar 27 '22 02:03 gnat

@tomchristie I've added this PR to next milestone release. Please block me if you don't wish to have this in, and if it's the case, perhaps we should take a decision to not have this in until 1.0. We can go back to this after we get the stable release.

If we are not blocked, I'll merge this before the next release (which will be 0.21.0).

Sep 03 '22 10:09 Kludex

I've become convinced that this is not the right approach. Take a look at this example:

from typing import Awaitable, Callable
from starlette.applications import Starlette
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import Response
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.routing import Route
from starlette.types import ASGIApp, Message, Receive, Scope, Send


CallNext = Callable[[Request], Awaitable[Response]]


class LogRequestBodySize(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next: CallNext) -> Response:
        print(len(await request.body()))
        return await call_next(request)


def replace_body_middleware(app: ASGIApp) -> ASGIApp:
    async def wrapped_app(scope: Scope, receive: Receive, send: Send) -> None:
        async def wrapped_rcv() -> Message:
            msg = await receive()
            msg["body"] += b"foo-"
            return msg

        await app(scope, wrapped_rcv, send)

    return wrapped_app


async def endpoint(request: Request) -> Response:
    assert (await request.body()).startswith(b"foo-")
    return Response()


app: ASGIApp = Starlette(routes=[Route("/", endpoint)])
app = replace_body_middleware(app)
app = LogRequestBodySize(app)

When you hit the / endpoint you'll get a 500 response because the body read in the endpoint is not the body

I know that replace_body_middleware seems contrived, but (1) it could be something like uncompressing an incoming request and (2) it's a perfectly valid ASGI middleware.

Sep 03 '22 17:09 adriangb

What's the right approach?

Sep 03 '22 17:09 Kludex

I think I'd prefer something like #1692. This doesn't solve the issue for things that use Request outside of BaseHTTPMiddleware, but I think it solves the issue for things that use it inside of BaseHTTPMiddleware which is most of the cases where users encounter this.

I'm also open to saying this is a wontfix and move towards something like #1691

Sep 03 '22 17:09 adriangb

but I think it solves the issue for things that use it inside of BaseHTTPMiddleware which is most of the cases where users encounter this.

The necessity has been discussed at length previously- it's certainly not just middleware. Using POST data anywhere in a request/response is extremely common for websites. See Toms starting post.

The ideal DX is being able to call post = await request.form() or body = await request.body() any time and not worrying about consuming it--- or storing it in the request state, or storing it yourself in an intermediate variable somewhere.

Just to name a few off the top of my head: Sanic, FastAPI, vanilla PHP, etc. all do this. Not having this is just extra manual busywork people should not have to get hung up on.

Sep 03 '22 18:09 gnat

Where else would you want to call Request.body()? Like you say it already works in FastAPI endpoints and dependencies (because the same Request instance is used everywhere).

Sep 03 '22 21:09 adriangb

Closing this in favor of #1692

@adriangb can you confirm that the issues this PR closes, are also closed on #1692? 🙏

Oct 03 '22 13:10 Kludex

#1692 doesn't solve the issues this PR solves.

Jan 27 '23 09:01 Kludex

Mind being more specific about which issue? Also JFYK this change is going to wreak all sorts of havoc

Jan 27 '23 15:01 adriangb

@tomchristie @Kludex It would be great to get this improvement in Starlette to be able to read safely the request body within a middleware. I was wondering if this change is planned to be applied also on previous Starlette versions (like 0.19.1)? I will be happy to discuss also some other potential issues that have been solved (or not) on Starlette 0.19.1

Mar 28 '23 11:03 enono78

I was wondering if this change is planned to be applied also on previous Starlette versions (like 0.19.1)?

No.

Mar 28 '23 11:03 Kludex

Given this comment, we can't accept this PR.

@adriangb worked on replacements for this:

https://github.com/encode/starlette/pull/2026
https://github.com/encode/starlette/pull/1692

Let's continue on them.

Jun 01 '23 18:06 Kludex

Cache request 'body' and 'stream_consumed' in the ASGI scope