
Streaming responses are not closed correctly on early exits (like a TERM)

Open jmarcet opened this issue 2 years ago • 10 comments

While handling a request with a streaming response, the EOF is sent in the finally block. This usually works well, but when Sanic is TERMed or INTed those streams are never closed: Sanic tries to close them in the server.shutdown.after event, and by that point all sockets are already closed, so the EOF cannot be sent.
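For context, the failing pattern boils down to something like this (a minimal hypothetical handler; get_chunk() is a stand-in for the actual stream source):

import asyncio

from sanic import Sanic

app = Sanic('StreamDemo')


async def get_chunk() -> bytes:
    # Stand-in for the real stream source
    await asyncio.sleep(1)
    return b'\x00' * 188


@app.get('/stream')
async def stream(request):
    response = await request.respond(content_type='video/MP2T')
    try:
        while True:
            await response.send(await get_chunk())
    finally:
        # On SIGTERM/SIGINT this EOF never reaches the client: the cleanup
        # runs around server.shutdown.after, when the sockets are already
        # closed.
        await response.eof()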

I get the behavior I expect by keeping a list of the open responses and closing them in Sanic's before_server_stop event, like this:

diff --git a/movistar_u7d.py b/movistar_u7d.py
index 0b14000..f632e97 100755
--- a/movistar_u7d.py
+++ b/movistar_u7d.py
@@ -58,6 +58,17 @@ VodArgs = namedtuple('Vod', ['channel', 'broadcast', 'iptv_ip',
 _channels = _iptv = _session = _session_logos = _t_tp = None
 _network_fsignal = '/tmp/.u7d_bw'
 _network_saturated = False
+_responses = []
+
+
+@app.listener('before_server_stop')
+async def before_server_stop(app, loop):
+    global _responses
+    for resp in _responses:
+        try:
+            await resp.eof()
+        except Exception as ex:
+            log.exception(f'{repr(ex)}')
 
 
 @app.listener('after_server_start')
@@ -281,27 +292,27 @@ async def handle_flussonic(request, channel_id, url):
             log.info(f'[{request.ip}] {request.raw_url} -> Playing {vod_msg}')
             try:
                 _response = await request.respond(content_type=MIME_TS)
+                global _responses
+                _responses.append(_response)
                 await _response.send((await asyncio.wait_for(stream.recv(), 1))[0])
                 _stop = timeit.default_timer()
                 log.info(f'[{request.ip}] {request.raw_url} -> {_stop - _start}s')
             except asyncio.exceptions.TimeoutError:
                 log.error(f'NOT_AVAILABLE: {vod_msg}')
                 return response.empty(404)
-            try:
-                while True:
-                    await _response.send((await stream.recv())[0])
-            finally:
-                log.info(f'[{request.ip}] {request.raw_url} -> Stopped {vod_msg}')
-                try:
-                    await _response.eof()
-                except AttributeError:
-                    pass
+
+            while True:
+                await _response.send((await stream.recv())[0])
+
     finally:
         try:
             vod.cancel()
             await asyncio.wait({vod})
         except (ProcessLookupError, TypeError):
             pass
+        _responses.remove(_response)
+        await _response.eof()
+        log.info(f'[{request.ip}] {request.raw_url} -> Stopped 2 {vod_msg}')
 
 
 @app.get('/favicon.ico')

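Stripped of the app-specific details, the workaround amounts to this (a minimal sketch):

_responses = []


@app.listener('before_server_stop')
async def close_open_streams(app, loop):
    # Runs while the sockets are still open, so the EOF can still be sent
    for resp in list(_responses):
        try:
            await resp.eof()
        except Exception as ex:
            log.exception(f'{repr(ex)}')

Each handler appends its response to _responses right after request.respond() and removes it again in its finally block.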
This happens with both the current stable Sanic and the master branch.

I think this is a bug in Sanic's handling of INT/TERM signals. Sanic does try to execute the finally block, but only when it is already too late to do so.

jmarcet avatar Sep 09 '21 12:09 jmarcet

I found the bug in Sanic's code:

https://github.com/sanic-org/sanic/blob/ef4f058a6cd809a292838ae9aa4e3fa70e323877/sanic/server/runners.py#L187

❯ diff -pru ./runners.py-orig ./runners.py
--- ./runners.py-orig   2021-09-08 05:39:52.000000000 +0200
+++ ./runners.py        2021-09-09 15:25:39.000000000 +0200
@@ -184,7 +184,7 @@ def serve(

         _shutdown = asyncio.gather(*coros)
         loop.run_until_complete(_shutdown)
-        loop.run_until_complete(app._server_event("shutdown", "after"))
+        loop.run_until_complete(app._server_event("shutdown", "before"))

         remove_unix_socket(unix)

It might be wise to run them in sequence; I'm not sure, so I'll leave that to you guys.

jmarcet avatar Sep 09 '21 13:09 jmarcet

Well, not quite. I see the before signal is handled a few lines earlier, but it fails; see these debug logs from Sanic:

[2021-09-09 15:51:16] [917] [INFO] Received signal SIGTERM. Shutting down.
[2021-09-09 15:51:16] [919] [INFO] Stopping worker [919]
[2021-09-09 15:51:16] [921] [INFO] Stopping worker [921]
[2021-09-09 15:51:16] [918] [INFO] Stopping worker [918]
[2021-09-09 15:51:16] [920] [INFO] Stopping worker [920]
[2021-09-09 15:51:16] [919] [DEBUG] Triggering server events: server.shutdown.before
[2021-09-09 15:51:16] [919] [DEBUG] Dispatching signal: server.shutdown.before
[2021-09-09 15:51:16] [921] [DEBUG] Triggering server events: server.shutdown.before
[2021-09-09 15:51:16] [918] [DEBUG] Triggering server events: server.shutdown.before
[2021-09-09 15:51:16] [920] [DEBUG] Triggering server events: server.shutdown.before
[2021-09-09 15:51:16] [921] [DEBUG] Dispatching signal: server.shutdown.before
[2021-09-09 15:51:16] [918] [DEBUG] Dispatching signal: server.shutdown.before
[2021-09-09 15:51:16] [920] [DEBUG] Dispatching signal: server.shutdown.before
[2021-09-09 15:51:16] [919] [WARNING] Could not find signal server.shutdown.before
[2021-09-09 15:51:16] [921] [WARNING] Could not find signal server.shutdown.before
[2021-09-09 15:51:16] [918] [WARNING] Could not find signal server.shutdown.before
[2021-09-09 15:51:16] [920] [WARNING] Could not find signal server.shutdown.before
[2021-09-09 15:51:16] [919] [DEBUG] Triggering server events: server.shutdown.after
[2021-09-09 15:51:16] [919] [DEBUG] Dispatching signal: server.shutdown.after
[2021-09-09 15:51:16] [918] [DEBUG] Triggering server events: server.shutdown.after
[2021-09-09 15:51:16] [918] [DEBUG] Dispatching signal: server.shutdown.after
[2021-09-09 15:51:16] [921] [DEBUG] Triggering server events: server.shutdown.after
[2021-09-09 15:51:16] [921] [DEBUG] Dispatching signal: server.shutdown.after

For some reason the signal server.shutdown.before is not found.

jmarcet avatar Sep 09 '21 14:09 jmarcet

The only way I've found to avoid it is what I explained in the first message of this issue: registering my own before_server_stop event handler.

jmarcet avatar Sep 09 '21 14:09 jmarcet

It might also be important to note that I'm using multiple workers when I trigger this.

jmarcet avatar Sep 09 '21 15:09 jmarcet

I put some thought into this, and I think the current behavior is correct. Moving the graceful shutdown into before_server_stop (or, really, its equivalent in the server, since the server is what controls that event) creates the potential for a race condition, because the server would still be open to accepting connections.

Which sort of raises the question: do we need to do shutdowns in two locations? Maybe.

One solution would be to set a flag to stop accepting connections early, then drain, close, and do a final task sweep. We would need to investigate this further, but simply moving the shutdown earlier seems like a potentially harmful move.
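Very roughly, the idea in plain asyncio (an illustrative sketch only, not Sanic's actual internals; every name here is hypothetical):

import asyncio


async def graceful_shutdown(server, tasks, timeout=10.0):
    # 1. Stop accepting new connections (closes the listening sockets;
    #    existing connections stay open)
    server.close()
    await server.wait_closed()

    # 2. Drain: give in-flight handlers a deadline to finish and send
    #    their EOF while their sockets are still alive
    pending = {t for t in tasks if not t.done()}
    if pending:
        _, pending = await asyncio.wait(pending, timeout=timeout)

    # 3. Final sweep: cancel whatever did not drain in time
    for task in pending:
        task.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)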

ahopkins avatar Sep 14 '21 09:09 ahopkins

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is incorrect, please respond with an update. Thank you for your contributions.

stale[bot] avatar Mar 02 '22 08:03 stale[bot]

...

ahopkins avatar Mar 02 '22 09:03 ahopkins

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is incorrect, please respond with an update. Thank you for your contributions.

stale[bot] avatar Jun 12 '22 21:06 stale[bot]

Sorry, I had forgotten about this. I'm doing things a little differently now (here and here), and I no longer have the issue. It all comes down to the correct use of finally.

I do have to use another workaround to cancel tasks before the server is stopped, though. If I don't, and the Sanic server is closed while a client is playing a stream, the client hangs. I'm not sure whether I should open another bug report for that.
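Roughly, that workaround looks like this (a sketch; the task bookkeeping is my own, not a Sanic feature):

import asyncio

_tasks = set()  # populated with the per-client streaming tasks


@app.listener('before_server_stop')
async def cancel_stream_tasks(app, loop):
    # Cancel the streaming tasks while the sockets are still open, so
    # clients see the connection close instead of hanging
    for task in list(_tasks):
        task.cancel()
    await asyncio.gather(*_tasks, return_exceptions=True)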

jmarcet avatar Jun 14 '22 09:06 jmarcet

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is incorrect, please respond with an update. Thank you for your contributions.

stale[bot] avatar Sep 21 '22 05:09 stale[bot]