asgiref Exceptions from `send`, and expected behaviors on closed connections.

Exceptions from `send`, and expected behaviors on closed connections.

Open tomchristie opened this issue 5 years ago • 4 comments

Before digging into the meat of this topic I want to start with this assumption:

Exceptions from send and receive should be treated as opaque, and should be allowed to bubble back up to the server in order to be logged. They may be intercepted by logging or 500 technical response middlewares, and subsequently re-raised, but applications shouldn't have any expectations about the type that may be raised, since these will differ from implementation to implementation.

That's pretty much how the spec is currently worded, and I think that's all as it should be. An alternative design could choose to have exceptions be used in a limited capacity to communicate information back to the application, but I don't think that's likely to be desirable. An exception raised by the server in send or receive is a black box indicating an error condition with an associated traceback, and we shouldn't expect to catch or handle particular subclasses in differing ways.

The core question in this issue is: 1. What expectations (if any) should the application have about the behavior when it makes a send() to a closed connection? 2. What behavior is most desirable from servers in that case?

There are three possible cases for the application here:

The application should expect an exception to be raised, indicating that the connection is closed.
The application should expect that no exception be raised.
An exception may or may not be raised, but the application cannot rely on either behavior.

In the case of HTTP, we've adopted "Note messages received by a server after the connection has been closed are not considered errors. In this case the send awaitable callable should act as a no-op." See https://github.com/django/asgiref/pull/49

I don't recall where the conversation that lead to #49 is, but I think it's a reasonable behaviour because premature HTTP client disconnects once receiving the start of the response are valid behaviour, and not an error condition. If we raise an exception in that case then we end up with lots of erroneous exceptions being raised in response to perfectly valid server and client behavior.

If we're happy with that decision then there's two questions that it leads on to:

What do we expect to happen in long-polling / SSE connections if send doesn't raise an exception once the connection has been closed?
What should we expect in the case of websockets?

For the case of (1) I think that #49 means we have to have the expectation that the disconnect is communicated through the receive disconnect message. No exception should be raised from the server on send to the closed connection because that'd pollute server logs with false errors in the standard case, and we can't differentiate between long-polling connections and regular HTTP requests/responses.

Mature server implementations probably would want to enforce a timeout limit on the task once the connection has been closed, and that is the guardrail against applications that fail to properly listen for and act on the disconnect.

For the case of (2) it's less obvious. We could perfectly well raise exceptions in response to send on a disconnected connection. (Which I assume is what all current implementations do.) The issue here is that we'd be raising logged error conditions on what is actually perfectly correct behavior... there's no guarantee that an application will have yet "seen" the disconnect message at the point it sends any given outgoing message.

The upshot of all this is that I think the following would be the right sort of recommendations to make:

Applications must not assume that an exception will necessarily be raised when sending to a closed connection, or rely on the server doing so. They should listen to a disconnect message and terminate the task when that occurs.
Mature server implementations should prefer not to raise exceptions when a send occurs against a closed websocket, but should instead ignore the message, ensure that disconnect is sent on the "receive" channel, and should forcibly kill the task if it has not completed after a reasonable timeout. Server implementations that do raise exceptions in response to send on a closed channel are valid, but may end up falsely logging server errors in response to standard client disconnects.

Anyways, thorny issue, and more than happy to talk this through more, but I'm moderately sure that's the conclusion I'd come to.

Sep 26 '18 10:09 tomchristie

🦋 Changeset detected

Latest commit: 705951c9ea3bed442b7f9930dd7ce61ec183cd71

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
@astrojs/compiler	Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Sep 26 '18 10:09 tomchristie

I totally agree that it's just going to happen, due to the async nature of the code in question, that you're going to send just after the socket is disconnected in any case - even if you have a perfect disconnect handling loop. I tell client code writers to write code that expects disconnects at any time, we should be thinking the same way.

My questioning of the problem before was around safety - is it better for us to suppress those errors and maybe let bugs go uncaught? Initially I was erring on the side of not suppressing errors, which tends to be my default, but now I think I would agree that if we didn't suppress them, people would just see these errors randomly and they'd become ignored noise, which is perhaps even more dangerous.

Given that, I'm more than happy to enshrine in the spec that "send should not raise errors if used after disconnect". The only exception I might carve out is if the server sent a disconnect packet and then sent something else, but that's more a nice thing that mature servers could maybe raise a warning over rather than an edge case I want to specify out.

Sep 28 '18 07:09 andrewgodwin

The only exception I might carve out is if the server sent a disconnect packet and then sent something else

Oh absolutely yes. Sending invalid state transitions should raise errors.

if we didn't suppress them, people would just see these errors randomly and they'd become ignored noise

Good point, yes.

I don't think we necessarily need to strongarm "servers must not do this", more that "server implementations should prefer not to do this, and instead prefer to guard against poor app behavior with an enforced timeout after disconnect" (A server implementation could validly raise an error if tasks fail to terminate within the timeframe after a disconnect.)

At some point I'll aim to have a little dig into daphne, hypercorn, and uvicorn's current behavior here, since I doubt it's in line with this proposal. (Paging @pgjones)

Sep 28 '18 08:09 tomchristie

OK, then I think we're agreed. I think this belongs in the www spec rather than the core one - if we're agreed, I'll get it added in?

Sep 28 '18 16:09 andrewgodwin

I've reraised the same concern 5 years later on #405... 🤣

The proposed agreement here doesn't satisfy the case the user doesn't even call receive:

import asyncio

from websockets import exceptions as ex
from wsproto.utilities import LocalProtocolError


async def app(scope, receive, send):
    await send({"type": "websocket.accept"})
    try:
        while True:
            await send({"type": "websocket.send", "text": "potato"})
            await asyncio.sleep(1)
    except ex.WebSocketException:
        print("Server exception leaked to application.")
    except LocalProtocolError:
        print("Server exception leaked to application.")

I've proposed an alternative approach on https://github.com/django/asgiref/issues/405#issuecomment-1704014072, but I'm happy with any solution that doesn't leak the server exception to the application - I don't want the ASGI app to try to catch exceptions that are related to the server.

Sep 03 '23 06:09 Kludex

My position remains the same as it was on #405 - in cases like this, there should be a single kind of exception that the server may raise if the send() call cannot be done for some reason, and we can collapse any internal server errors down to those.

The key thing I want to preserve is that there is enough debugging information in that error (e.g. in its first message argument) so that a user can work out what's going on.

Sep 03 '23 17:09 andrewgodwin

You didn't comment about what I proposed (which seems aligned with what you said, right?).

What are your thoughts about it?

Sep 03 '23 17:09 Kludex

Oh, I missed that one in the flurry of comments. I don't think an extension for this makes sense; there's only so far we can push things like this.

Sep 03 '23 21:09 andrewgodwin

Well, it doesn't need to be an extension. I just think it makes sense for the web application to be ready to receive the exception, and then raise something more meaningful.

Sep 03 '23 22:09 Kludex

What if it's a new field in the scope?

Sep 03 '23 22:09 Kludex

I still don't see a particular advantage over just always saying it must subclass IOError - it's a builtin exception class that fits (you're doing IO) and is unlikely to be used by web applications at that point for anything else.

Sep 04 '23 05:09 andrewgodwin

Ok. It solves my needs.

I can change the documentation to reflect this decision. Can I?

Is this considered a breaking change or a bug fix in terms of the specification?

Sep 04 '23 05:09 Kludex

Of course, please submit a PR and I'll review it.

It's probably worthy of a minor version update, since it's not something that clients/servers can entirely ignore, but we're also not introducing something that will break things in new ways.

Sep 04 '23 16:09 andrewgodwin

Thanks. I'm a bit full in the next weeks, but I don't think there's much hurry here.

I'll get back to it in November. :)

Also, I never said this, but I really appreciate our interactions here @andrewgodwin 😁🙏

Sep 04 '23 17:09 Kludex

Yes, and I appreciate you helping me work through this!

Sep 04 '23 17:09 andrewgodwin

Let me try to summarize the discussion and see if I understood it correctly.

We are going to make the server raise an error if send() is called after the client's disconnected. The error raised should be a subclass of IOError but we don't specify any behavior other than that. In particular, we're not specifying a concrete subclass of IOError because we don't want to make every server depend on asgiref.

A couple of thoughts:

If backwards compatibility is a concern we could make the server only raise an error if some key is included in send(), e.g. send({..., "raise_exc_on_disconnect": True}). We could even make it the default in some future version of the spec so we don't have the key hanging around forever.
We could specify the behavior we want from the raised exception via a typing.Protocol, e.g. a cause() -> Literal[...] method. That way the server can transmit information to the client without necessitating a concrete subclass to be the API contract. That is, asgiref would provide a typing.Protocol and the server would implement it in it's IOError subclass.

Nov 08 '23 16:11 adriangb

I don't think I'm that concerned about backwards compat in this case - the behaviour here is already undefined depending on the server, and I think making that into an explicit error is an improvement.

As for the Protocol - sure, but I'm not sure that's a blocker either way. asgiref has always been a pain to type as it's so dynamic.

Nov 09 '23 20:11 andrewgodwin

@andrewgodwin We need to come back to this discussion.

I don't think the IOError applies for HTTP, only for WebSockets. Since HTTP is already covered by http.disconnect. Do you disagree?

Nov 18 '23 14:11 Kludex

An application could, in theory, still try to send() after a client disconnection has happened, so I think it can happen with HTTP as well?

Nov 18 '23 17:11 andrewgodwin

Yes, you are correct, but that case is already handled by the spec, and it was agreed to be a no-op. It's mentioned in the description of this issue.

This was the PR that included those words: https://github.com/django/asgiref/pull/49

@pgjones If you have time, could you give me your opinion here?

Nov 18 '23 17:11 Kludex

@andrewgodwin Are you disagreeing with your previous message: https://github.com/django/asgiref/issues/66#issuecomment-425342108 ?

I'm trying to understand this specific part:

Given that, I'm more than happy to enshrine in the spec that "send should not raise errors if used after disconnect". The only exception I might carve out is if the server sent a disconnect packet and then sent something else, but that's more a nice thing that mature servers could maybe raise a warning over rather than an edge case I want to specify out.

Nov 19 '23 18:11 Kludex

To be fair, I wrote that previous message five years ago; in the time since, I have now come to believe that explicit errors in this case could be good, but if we make servers start raising them where they previously did not, that might be considered backwards incompatible, which is why I only want to raise them if the behaviour was undefined enough that not a lot of code was relying on this.

That's a lot easier with, say, websocket code due to the much smaller surface area of who is writing it. Web handling code is trickier, since there's increasing numbers of ASGI-compatible web frameworks (which is great!). From the parts of those that I've seen, raising an error if you try to send after a HTTP disconnection might just be annoying (much like the "client already closed connection" errors you get from intermediary servers with this problem already), which is why I am floating the point that it totally can happen that you can send() after a HTTP disconnect, and that I'm not sure about how I feel that HTTP and websocket would behave differently in this case.

Nov 20 '23 18:11 andrewgodwin

I think that error would be annoying if it gets to the end user, but it doesn't necessarily need to. If it's part of the spec the ASGI application can catch the error and simply stop streaming the data / doing the work without raising an error.

I do see the point that using the error as a mode of communication between the asgi server and application when the event is part of normal operation and doesn't require a user's intervention or attention is not a good use of exceptions

I think the right thing to do would be for send() to return a SendResult or something that the application could voluntarily check to see if send() has become a no-op but that would require everything in the chain of server -> application to comply with this new API and hence will be harder to get working than an exception. But maybe a backward-compatible albeit longer-term solution is to implement that API and say that servers and middleware SHOULD return and forward a SendResult (effectively immediately making every implementation be slightly out of spec) and let applications that expect a SendResult handle the case where None is returned. Then next time we ratchet up the major version that SHOULD become a MUST and hopefully by that point it won't be a problem.

Nov 20 '23 18:11 adriangb

It would never get to the end user as, by its nature, they are disconnected at that point - it would just clutter up server error tracking (which it would then stop doing once people wrote apps to handle it, but you get my point).

SendResult is an interesting idea - it certainly preserves the async idea of returning futures - but we could also just return a Future itself if we wanted to go that route I think?

Also, at the end of the day I want pragmatism rather than perfection, and I think returning an error on websocket send after closure is much more doable in a back-compat way (because it's something you should be expecting to handle anyway) - doing it on HTTP would be nice, but maybe something for a future spec version.

Nov 22 '23 17:11 andrewgodwin

it would just clutter up server error tracking (which it would then stop doing once people wrote apps to handle it, but you get my point)

right by "end users" I was referring to users of asgi frameworks. and as you say once applications handle it there won't be cluttering of server error logs. I don't think that would be a problem though: getting errors would require you to update your asgi server without updating your asgi web framework. People tend to need to update their web framework more often than their server (to get new features).

SendResult is an interesting idea - it certainly preserves the async idea of returning futures - but we could also just return a Future itself if we wanted to go that route I think?

I'm not sure I'm understanding. I envisioned SendResult as something like {'disconnected': False} a TypedDict with some information (and maybe metrics?) about the sending of the data.

Nov 22 '23 17:11 adriangb

Well I am proposing that we instead say you can return a future that says if it's sent or not - servers can immediately set_result with true/false if they want to, or we then have the option where they can do it a little later if there's queues to flush.

Nov 26 '23 18:11 andrewgodwin

So you're proposing send(Message) -> SomeAwaitable?

Nov 26 '23 18:11 adriangb

Also, at the end of the day I want pragmatism rather than perfection, and I think returning an error on websocket send after closure is much more doable in a back-compat way (because it's something you should be expecting to handle anyway) - doing it on HTTP would be nice, but maybe something for a future spec version.

I'll take this part, and create a PR.

Nov 30 '23 14:11 Kludex

@andrewgodwin we can make the HTTP version non-breaking if the server raises and catches the exception. That is, given any of these scenarios:

Server updated, framework not -> server raises exception, bubbles up through framework, server catches it and does what it currently does (Uvicorn emits a trace level log). In fact this is better than the current status quo because you only get 1 trace log while currently, assuming the framework is not also listening on rcv() for a disconnect, you get as many as the framework calls send()
Server not updated, framework is. Nothing happens, the exception is never raised.
Server updated, framework updated. The framework can catch the exception and stop sending data or take any other action it wants to. This will be the end state.

Dec 10 '23 07:12 adriangb

Ah yes, that is an interesting take on it - of course, it does bring us back to the problem of "which exception", since we can't have servers catch and squash anything except a unique exception, but we don't want to make them depend on asgiref...

Dec 10 '23 15:12 andrewgodwin

asgiref asgiref copied to clipboard

Exceptions from `send`, and expected behaviors on closed connections.

🦋 Changeset detected

asgiref
asgiref copied to clipboard