tornado
Any way to gracefully exit a Tornado application?
There are many gists that show how to gracefully exit a Tornado application, like this:
stop_httpserver(http_server)

def try_stop_ioloop():
    io_loop = IOLoop.instance()
    if io_loop._callbacks or io_loop._timeouts:
        io_loop.add_timeout(time.time()+1, try_stop_ioloop)
    else:
        io_loop.stop()

try_stop_ioloop()
Is it safe to test io_loop._timeouts or io_loop._callbacks? Or can _timeouts be ignored?
https://gist.github.com/nicky-zs/6304878 is one example that uses this method. Sometimes main_ioloop._timeouts keeps receiving new _Timeout instances, so we have to wait a long time for the _timeouts queue to become empty.
I came here today to ask the exact same question. After much reading on SO, I came up with this example:
import logging
import signal
import time

import tornado.httpserver
import tornado.ioloop
import tornado.web
import tornado.options
from tornado import gen

logger = logging.getLogger()
signal_received = False


class BlockingHandler(tornado.web.RequestHandler):
    def get(self):
        logger.debug("Starting sleep")
        time.sleep(5)
        logger.debug("Sleep done")
        self.finish('block')


class AsyncHandler(tornado.web.RequestHandler):
    @gen.coroutine
    def get(self):
        logger.debug("Starting range 5")
        for i in range(5):
            logger.debug(i)
            yield gen.sleep(1)
        logger.debug("Range done")
        self.finish('async')


def start_server():
    urls = [
        (r'/', BlockingHandler),
        (r'/async', AsyncHandler),
    ]
    application = tornado.web.Application(urls)
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(8888)
    ioloop = tornado.ioloop.IOLoop.instance()

    def register_signal(sig, frame):
        global signal_received
        logger.info("%s received, stopping server" % sig)
        http_server.stop()  # no more requests are accepted
        signal_received = True

    def stop_on_signal():
        global signal_received
        if signal_received and not ioloop._callbacks:
            ioloop.stop()
            logger.info("IOLoop stopped")

    tornado.ioloop.PeriodicCallback(stop_on_signal, 1000).start()
    signal.signal(signal.SIGTERM, register_signal)
    logging.info("Starting server")
    ioloop.start()


if __name__ == '__main__':
    tornado.options.parse_command_line()
    start_server()
It has two problems:
- BlockingHandler works as expected (finishes the result and gives it to the client), but raises an Exception:
[E 160808 14:46:12 ioloop:633] Exception in callback None
    Traceback (most recent call last):
      File "/Users/margus/src/git/tornado-sigterm/venv/lib/python3.5/site-packages/tornado/ioloop.py", line 887, in start
        handler_func(fd_obj, events)
      File "/Users/margus/src/git/tornado-sigterm/venv/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
        return fn(*args, **kwargs)
      File "/Users/margus/src/git/tornado-sigterm/venv/lib/python3.5/site-packages/tornado/netutil.py", line 260, in accept_handler
        connection, address = sock.accept()
      File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/socket.py", line 195, in accept
        fd, addr = self._accept()
    OSError: [Errno 9] Bad file descriptor
[I 160808 14:46:12 application:58] IOLoop stopped
...which I think means that the tornado.access logger is trying to write after the ioloop has stopped and all the sockets it would write to are gone. I'm not sure whether this is a feature or a bug in the logging.
- The second problem is with AsyncHandler, where the request is killed and the client receives a Remote Disconnected error.
It would be very nice to get an official right way of approaching this. I'll now go and test https://gist.github.com/nicky-zs/6304878 in the real world.
You definitely shouldn't be looking at IOLoop._callbacks or IOLoop._timeouts. In addition to being private variables, there's a good chance that they will never be empty, for reasons that are unimportant (if you use tornado.curl_httpclient there is always an active timeout). Instead of asking the IOLoop for all activity, you need to decide what activity you care about and exit the server when all of that activity is done.
Or you can do what I do and just stop the IOLoop 5 seconds after the stop is requested. This will be enough for any regular request to finish, and if it's taking longer than 5 seconds you probably want to let it fail anyway. It's not worth making a more precise measurement to stop in less than 5 seconds.
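A minimal sketch of the fixed-delay approach described above, assuming a single HTTPServer and a 5-second grace period. The names begin_shutdown, install_signal_handlers, SHUTDOWN_GRACE_SECONDS, and MainHandler are hypothetical and not part of Tornado; only the IOLoop and HTTPServer calls are library APIs.

```python
import signal

import tornado.httpserver
import tornado.ioloop
import tornado.web

SHUTDOWN_GRACE_SECONDS = 5  # assumed grace period; tune it for your deployment


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("ok")


def begin_shutdown(http_server, io_loop, grace=SHUTDOWN_GRACE_SECONDS):
    """Stop accepting new connections now, stop the IOLoop after `grace` seconds."""
    http_server.stop()  # closes the listening sockets; in-flight requests keep running
    io_loop.call_later(grace, io_loop.stop)


def install_signal_handlers(http_server, io_loop):
    def on_signal(sig, frame):
        # Hand control to the IOLoop instead of doing work inside the signal
        # handler. add_callback_from_signal is the call used elsewhere in this
        # thread; newer Tornado releases deprecate it in favor of asyncio's
        # loop.add_signal_handler.
        io_loop.add_callback_from_signal(begin_shutdown, http_server, io_loop)

    signal.signal(signal.SIGTERM, on_signal)
    signal.signal(signal.SIGINT, on_signal)


def main():
    app = tornado.web.Application([(r"/", MainHandler)])
    http_server = tornado.httpserver.HTTPServer(app)
    http_server.listen(8888)
    io_loop = tornado.ioloop.IOLoop.current()
    install_signal_handlers(http_server, io_loop)
    io_loop.start()
```

Calling main() starts the server on port 8888. Requests that take longer than the grace period are cut off, which is the trade-off this approach accepts on purpose.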
This seems like a question that is asked frequently enough that an example solution should be included in the user's guide section of the docs. I would consider adding something to either the Running and deploying section or the Structure section. Ben, which do you think makes the most sense?
Setting a timeout to stop the IOLoop is not safe. Is there an approach available to test whether all requests have been processed?
Yeah, I think adding something to "Running and deploying" makes sense. There are two tricky things in writing this up: A) The right way to do this depends heavily on your deployment environment (how your load balancers do health checks and/or retries), and B) Convincing people who want to wait for all operations to finish that it's better to just use a timeout.
On that latter point: There must be some maximum time that you're willing to wait for an operation to finish, even if it's several minutes. At that point you have to be willing to cut them off or they'll keep consuming resources indefinitely. Once you've decided how long you're willing to wait for the last client operation to finish, you've already committed the resources to keep the old server processes around for that long, so why not leave them running for that duration in every case? Trying to track the number of operations remaining and stopping the server precisely when the count reaches zero is not worth the trouble IMHO.
You are right that there are plenty of caveats to how to really do a graceful shutdown depending on the deployment strategy. But for the guide, maybe it would suffice to show a simple example case (such as what you mentioned where you stop the IOLoop 5 seconds later to give requests time to finish) while explaining that this is not the only way, but that it at least works for simple setups. I'll take a stab at this in the next few days and maybe we can try to iterate through a good solution via a pull request.
It would be nice if we could specify what tornado should do on a SIGTERM.
For my kind of setup it would be nice to tell it that it should stop accepting requests after time X and quit as soon as all active requests are gone. Additionally, it would be good to be able to check this status (like .inShutdown() == true).
This would make it quite easy to gracefully stop it inside Kubernetes by having a health handler that starts to report "faulty" as soon as .inShutdown() returns true. With this I can ensure that I take the service out of load balancing without killing any active connections (if they don't take too long) and without losing some because Kubernetes does not take it out of the LB instantaneously (by design, so that one faulty response does not kick the service out).
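Tornado has no built-in inShutdown() status, so an application would have to keep that flag itself. A rough sketch of the health-handler idea follows, assuming a shared flag object; ShutdownState, health_response, HealthHandler, make_app, and the /healthz path are all hypothetical names chosen for illustration.

```python
import tornado.web


class ShutdownState:
    """Hypothetical flag shared between the signal handler and the health check."""

    def __init__(self):
        self.in_shutdown = False

    def begin(self):
        self.in_shutdown = True


def health_response(state):
    """Return (status code, body) for the health endpoint."""
    if state.in_shutdown:
        return 503, "shutting down"
    return 200, "ok"


class HealthHandler(tornado.web.RequestHandler):
    def initialize(self, state):
        # `state` comes from the third element of the URL spec below.
        self.state = state

    def get(self):
        status, body = health_response(self.state)
        self.set_status(status)
        self.write(body)


def make_app(state):
    return tornado.web.Application([
        (r"/healthz", HealthHandler, {"state": state}),
    ])
```

In this sketch a SIGTERM handler would call state.begin() first and keep the server listening for a while, so the load balancer sees the failing health check before http_server.stop() is called. How long to wait depends on the health-check interval of the deployment.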
We use Ubuntu's upstart to run our services. When terminating a service, upstart first sends SIGTERM, waits the configured amount of time for the service to stop (the default timeout is 5s), and if the service fails to stop, sends SIGKILL.
I expect the behavior to be consistent across most *nix systems. It would be nice if Tornado could support these two signals by default, along with a custom set of signals.
Tornado works correctly under the upstart configuration you described - it dies by default when it gets SIGTERM. What would you like it to do differently? This always seems to be application- and deployment-dependent (especially with respect to how your load balancers do health checks and retries), but whatever behavior you want can be obtained by defining your own signal handlers.
@bdarnell Thanks for your quick response. I think the reason it didn't work well for us is because we are using an older version of Tornado (2.1.1). Once I implemented my own signal handler for SIGTERM, things started working.
How do I shut down Tornado 5.1 gracefully?
Here's how I do it (currently with tornado-4.5.3 but I expect it will work the same with tornado-5.1):
async def shutdown():
    periodic_task.stop()
    http_server.stop()
    for client in ws_clients.values():
        client['handler'].close()
    await gen.sleep(1)
    ioloop.IOLoop.current().stop()

def exit_handler(sig, frame):
    ioloop.IOLoop.instance().add_callback_from_signal(shutdown)

...

if __name__ == '__main__':
    signal.signal(signal.SIGTERM, exit_handler)
    signal.signal(signal.SIGINT, exit_handler)
    ...
(instead of just gen.sleep(1) I actually have a global active-request count and a tornado.locks.Event() that is set after the last request has finished, but that's more complicated and more code ...)
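The poster's actual counting code is not shown in the thread. One way such an active-request count with a tornado.locks.Event might look is sketched below, with an upper bound on the wait; RequestTracker, TrackedHandler, shutdown's parameters, and the 30-second default are assumptions made for this example.

```python
from datetime import timedelta

import tornado.web
from tornado import gen, locks


class RequestTracker:
    """Counts in-flight requests; `idle` is set whenever the count is zero."""

    def __init__(self):
        self.active = 0
        self.idle = locks.Event()
        self.idle.set()  # nothing is in flight at startup

    def started(self):
        self.active += 1
        self.idle.clear()

    def finished(self):
        self.active -= 1
        if self.active == 0:
            self.idle.set()

    async def wait_idle(self, max_wait):
        """Return True if the count reached zero within max_wait seconds."""
        try:
            await self.idle.wait(timeout=timedelta(seconds=max_wait))
            return True
        except gen.TimeoutError:
            return False


tracker = RequestTracker()


class TrackedHandler(tornado.web.RequestHandler):
    """Base class for handlers whose requests should be waited for."""

    def prepare(self):
        tracker.started()

    def on_finish(self):
        # Runs once the response has been finished. Requests that never
        # finish are covered only by the max_wait bound in shutdown().
        tracker.finished()


async def shutdown(http_server, io_loop, max_wait=30):
    http_server.stop()  # no new connections
    drained = await tracker.wait_idle(max_wait)
    io_loop.stop()
    return drained
```

The max_wait bound keeps the process from hanging on a request that never completes, in line with the earlier point that there is always some maximum time you are willing to wait.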
Yeah, something like @ploxiln's approach is what I'd do. I don't think there have been any noteworthy changes between Tornado 4 and 5 here. It all depends on what "gracefully" means for your application. (One missing piece in the snippet above is that you probably want to signal to your load balancer somehow to stop the incoming traffic).
@ploxiln Hi mate, would you mind describing what ws_clients is in this case?
Sure, ws_clients in my case was a list of dicts with each dict containing a 'handler' reference to an instance of WebSocketHandler - in short-hand:
ws_clients = [
    {
        'handler': WebSocketHandler(...),
        'tags': ...
    },
    ...
]
I had a WebSocketHandler which, for each websocket client which connected, would add "self" to this list. On close, it would find and remove "self" from this list. On shutdown, all still-open websocket connections could be cleanly closed in this way (in addition to other plain http requests having a second to finish up, while no new plain http or websocket requests are possible).
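The register-on-open, remove-on-close pattern described above can be sketched as follows. This is a simplification that uses a plain set of handlers instead of the list of dicts with tags; open_sockets, TrackedWebSocket, and close_all_websockets are hypothetical names, and the close code 1001 ("going away") is an assumed choice.

```python
import tornado.websocket

# Hypothetical module-level registry of currently open WebSocket handlers.
open_sockets = set()


class TrackedWebSocket(tornado.websocket.WebSocketHandler):
    def open(self):
        open_sockets.add(self)

    def on_message(self, message):
        self.write_message(message)  # echo, only so the handler is complete

    def on_close(self):
        open_sockets.discard(self)


def close_all_websockets(code=1001, reason="server shutting down"):
    """Ask every registered handler to close; return how many were asked."""
    handlers = list(open_sockets)  # copy, because on_close mutates the set
    for handler in handlers:
        handler.close(code, reason)
    return len(handlers)
```

A shutdown coroutine like the one earlier in the thread would call close_all_websockets() right after http_server.stop(), then give the close handshakes a moment to complete before stopping the IOLoop.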
Linking to this gist in case anyone comes across this issue: https://gist.github.com/wonderbeyond/d38cd85243befe863cdde54b84505784
This works for me.
This is what I'm doing on Firenado:
https://github.com/candango/firenado/blob/develop/firenado/launcher.py#L159-L324
We managed to gracefully shut down Tornado by implementing the following steps:
- intercept SIGINT/SIGTERM signals
- manually stop the HTTPServer right away (no more requests are accepted)
- manually stop the ioloop when all pending requests have terminated
See https://github.com/svaponi/tornado-graceful-shutdown/blob/main/server.py