asgiref
asgiref copied to clipboard
tweak lifespan spec to require propagating exceptions thrown from lifespan tasks after `'lifespan.startup.complete'`
currently in quart-trio a failure in the app.nursery is swallowed and the web server continues unabated
https://gitlab.com/pgjones/quart-trio/-/issues/21
once an ASGI app acknowledges a lifespan request (with a lifespan.startup.complete event) the server should treat failures in that task as fatal.
Note: Now that Starlette supports trio, this is currently making writing a Starlette TaskGroup middleware a challenge
some other ideas:
A new event
Perhaps there should be an additional event - 'lifespan.startup.acknowledge' which can be sent at any time that signals acceptance of the lifespan scope and that any failures are fatal
A sentinel BaseException
or there could be a:
class LifespanFailure(BaseException):
pass
and lifespan tasks could:
try:
with anyio.create_task_group() as app.tg:
...
except Exception as e:
raise asgiref.LifespanFailure from e
New asgi servers could extract the .__cause__ attribute of the LifespanFailure and raise that instead.
this would be backwards compatible because current asgi servers propagate BaseExceptions already
I do agree something needs to happen here - it's important to have explicit failure - but not a huge fan of the acknowledge event model, that just seems a little much, along with the weird exception handling, so propagating exceptions after complete seems reasonable?
I'd like to be able to propagate exceptions before startup complete too, but that's a bigger spec change. Eg if my task is cancelled I won't be able to send "lifespan.startup.failed"
So the specific behaviour you would want to change is that the ASGI server swallows any exceptions during startup:
If an exception is raised when calling the application callable with a lifespan.startup message or a scope with type lifespan, the server must continue but not send any lifespan events.
To have a specific exception that does stop the server? I think we can very reasonably support that in a backwards-compatible way, then.
starlette 0.16.0 is now slightly violating the asgi spec here: https://github.com/encode/starlette/blob/e45c5793611bfec606cd9c5d56887ddc67f77c38/starlette/routing.py#L628 and will send "lifespan.shutdown.complete" or "lifespan.shutdown.failed" regardless of how await receive() completes.
Eg if await receive() is completed with any value starlette will reply with "lifespan.shutdown.complete".
importantly: if await receive() throws a CancelledError - if that cancellation was initiated by a lifespan_context TaskGroup or CancelScope starlette will reply with "lifespan.shutdown.complete" even through no shutdown has actually been initiated.
additionally if that cancellation was initiated by an exception being thrown into a lifespan_context TaskGroup, starlette propagates it with "lifespan.shutdown.failed"
here's an issue where this spec violation bites:
https://gitlab.com/pgjones/hypercorn/-/merge_requests/56/diffs
I've been looking into this issue recently for Litestar and it would be great to continue the conversation.
Like others, we have a lifespan context manager that is entered on "lifespan.startup", and exited on "lifespan.shutdown". User's can do anything within this context, including spinning up asynchronous tasks. We even document it for that purpose (ref):
This can be useful when dealing with long running tasks
and starlette seems to promote it for similar (ref):
Consider using anyio.create_task_group() for managing asynchronous tasks.
If an exception is raised within that context but after we have sent "lifespan.startup.complete", should we raise it, emit "lifespan.shutdown.failed", or catch it, log it and keep the lifespan task running? We currently send "lifespan.shutdown.failed" and then re-raise the exception, however servers continue to serve after this (tested this against uvicorn, granian, hypercorn).