Streaming responses are not closed correctly on early exits (like a TERM)
While handling a request with a streaming response, the EOF is sent in the finally block. This usually works well, but when Sanic receives SIGTERM or SIGINT those streams are never closed: Sanic tries to close them in the server.shutdown.after event, and by that point all sockets are already closed, so the EOF cannot be sent.
I get the behavior I expect by keeping a list of the open responses and closing them in Sanic's before_server_stop event, like this:
diff --git a/movistar_u7d.py b/movistar_u7d.py
index 0b14000..f632e97 100755
--- a/movistar_u7d.py
+++ b/movistar_u7d.py
@@ -58,6 +58,17 @@ VodArgs = namedtuple('Vod', ['channel', 'broadcast', 'iptv_ip',
_channels = _iptv = _session = _session_logos = _t_tp = None
_network_fsignal = '/tmp/.u7d_bw'
_network_saturated = False
+_responses = []
+
+
+@app.listener('before_server_stop')
+async def before_server_stop(app, loop):
+ global _responses
+ for resp in _responses:
+ try:
+ await resp.eof()
+ except Exception as ex:
+ log.exception(f'{repr(ex)}')
@app.listener('after_server_start')
@@ -281,27 +292,27 @@ async def handle_flussonic(request, channel_id, url):
log.info(f'[{request.ip}] {request.raw_url} -> Playing {vod_msg}')
try:
_response = await request.respond(content_type=MIME_TS)
+ global _responses
+ _responses.append(_response)
await _response.send((await asyncio.wait_for(stream.recv(), 1))[0])
_stop = timeit.default_timer()
log.info(f'[{request.ip}] {request.raw_url} -> {_stop - _start}s')
except asyncio.exceptions.TimeoutError:
log.error(f'NOT_AVAILABLE: {vod_msg}')
return response.empty(404)
- try:
- while True:
- await _response.send((await stream.recv())[0])
- finally:
- log.info(f'[{request.ip}] {request.raw_url} -> Stopped {vod_msg}')
- try:
- await _response.eof()
- except AttributeError:
- pass
+
+ while True:
+ await _response.send((await stream.recv())[0])
+
finally:
try:
vod.cancel()
await asyncio.wait({vod})
except (ProcessLookupError, TypeError):
pass
+ _responses.remove(_response)
+ await _response.eof()
+ log.info(f'[{request.ip}] {request.raw_url} -> Stopped 2 {vod_msg}')
@app.get('/favicon.ico')
This happens with both the current stable Sanic and the master branch.
I think this is a bug in Sanic's handling of INT/TERM signals: Sanic does try to execute the finally block, but only when it is already too late to do so.
I found the bug in Sanic's code:
https://github.com/sanic-org/sanic/blob/ef4f058a6cd809a292838ae9aa4e3fa70e323877/sanic/server/runners.py#L187
❯ diff -pru ./runners.py-orig ./runners.py
--- ./runners.py-orig 2021-09-08 05:39:52.000000000 +0200
+++ ./runners.py 2021-09-09 15:25:39.000000000 +0200
@@ -184,7 +184,7 @@ def serve(
_shutdown = asyncio.gather(*coros)
loop.run_until_complete(_shutdown)
- loop.run_until_complete(app._server_event("shutdown", "after"))
+ loop.run_until_complete(app._server_event("shutdown", "before"))
remove_unix_socket(unix)
It might be wise to run them in sequence; I'm not sure, I leave that to you guys.
Well, not quite so. I see the before event is handled a few lines earlier, but that fails; see these debug logs from Sanic:
[2021-09-09 15:51:16] [917] [INFO] Received signal SIGTERM. Shutting down.
[2021-09-09 15:51:16] [919] [INFO] Stopping worker [919]
[2021-09-09 15:51:16] [921] [INFO] Stopping worker [921]
[2021-09-09 15:51:16] [918] [INFO] Stopping worker [918]
[2021-09-09 15:51:16] [920] [INFO] Stopping worker [920]
[2021-09-09 15:51:16] [919] [DEBUG] Triggering server events: server.shutdown.before
[2021-09-09 15:51:16] [919] [DEBUG] Dispatching signal: server.shutdown.before
[2021-09-09 15:51:16] [921] [DEBUG] Triggering server events: server.shutdown.before
[2021-09-09 15:51:16] [918] [DEBUG] Triggering server events: server.shutdown.before
[2021-09-09 15:51:16] [920] [DEBUG] Triggering server events: server.shutdown.before
[2021-09-09 15:51:16] [921] [DEBUG] Dispatching signal: server.shutdown.before
[2021-09-09 15:51:16] [918] [DEBUG] Dispatching signal: server.shutdown.before
[2021-09-09 15:51:16] [920] [DEBUG] Dispatching signal: server.shutdown.before
[2021-09-09 15:51:16] [919] [WARNING] Could not find signal server.shutdown.before
[2021-09-09 15:51:16] [921] [WARNING] Could not find signal server.shutdown.before
[2021-09-09 15:51:16] [918] [WARNING] Could not find signal server.shutdown.before
[2021-09-09 15:51:16] [920] [WARNING] Could not find signal server.shutdown.before
[2021-09-09 15:51:16] [919] [DEBUG] Triggering server events: server.shutdown.after
[2021-09-09 15:51:16] [919] [DEBUG] Dispatching signal: server.shutdown.after
[2021-09-09 15:51:16] [918] [DEBUG] Triggering server events: server.shutdown.after
[2021-09-09 15:51:16] [918] [DEBUG] Dispatching signal: server.shutdown.after
[2021-09-09 15:51:16] [921] [DEBUG] Triggering server events: server.shutdown.after
[2021-09-09 15:51:16] [921] [DEBUG] Dispatching signal: server.shutdown.after
For some reason the signal server.shutdown.before is not found.
The only way I found to avoid it is what I explained in the first message of this issue: creating my own before_server_stop event handler.
It might also be important to note that I'm using multiple workers when I trigger this.
I put some thought into this, and I think the behavior is correct. Moving the graceful shutdown into before_server_stop (or, really, its equivalent in the server, since the server is what controls that event) creates the potential for a race condition, because the server would still be open to accepting connections.
Which sort of begs the question: do we need to do shutdowns in two locations? Maybe.
One solution would be to set a flag to stop accepting connections early, then drain, close, and do a final task sweep. We would need to investigate this further, but simply moving the shutdown earlier seems like a potentially harmful move.
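That two-phase idea can be sketched in plain asyncio. The GracefulServer class and its method names below are illustrative, not Sanic's internals: flip an accepting flag first, let in-flight connections drain up to a deadline, then force-close whatever is left.

```python
import asyncio

class GracefulServer:
    """Illustrative two-phase shutdown: stop accepting, drain, final sweep."""

    def __init__(self):
        self.accepting = True
        self.connections = set()

    def accept(self, conn):
        # Refuse new work once shutdown has begun.
        if not self.accepting:
            raise ConnectionRefusedError("server is shutting down")
        self.connections.add(conn)

    async def shutdown(self, drain_timeout=1.0):
        # Phase 1: flip the flag so no new connections are accepted.
        self.accepting = False
        # Phase 2: let in-flight connections drain, up to a deadline.
        loop = asyncio.get_running_loop()
        deadline = loop.time() + drain_timeout
        while self.connections and loop.time() < deadline:
            await asyncio.sleep(0.01)
        # Phase 3: final sweep, force-cancel any stragglers.
        for conn in list(self.connections):
            conn.cancel()
        self.connections.clear()

srv = GracefulServer()
refused = False

async def main():
    global refused

    async def client():
        await asyncio.sleep(0.05)  # a short request that finishes while draining

    task = asyncio.create_task(client())
    srv.accept(task)
    task.add_done_callback(srv.connections.discard)

    await srv.shutdown()
    try:
        srv.accept(object())  # a late connection is refused, not raced
    except ConnectionRefusedError:
        refused = True

asyncio.run(main())
print("refused late connection:", refused)  # refused late connection: True
```

Because the flag flips before any event handlers run, nothing can slip in a new connection between "stop accepting" and "sockets closed", which is the race the comment above is worried about.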
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is incorrect, please respond with an update. Thank you for your contributions.
Sorry, I had forgotten about this. I'm doing things a little differently now (here and here), and I no longer have the issue. It all depends on the correct use of finally.
I do have to use another workaround to cancel tasks before the server is stopped, though. If I do not, and the Sanic server is closed while a client is playing a stream, the client hangs. I'm not sure whether I should open another bug report for that.