trio More ergonomic support for graceful shutdown

[Edit: This issue started as a discussion of the API for cancel scope shielding, motivated by @merrellb using it to catch Cancelled errors and do clean up activities like sending "goodbye" messages in #143, but then morphed into a more general discussion of how trio could help users implement "graceful shutdown" functionality in general, so let's make it the standard issue for that. Original text follows. – @njsmith]

The documentation suggests assigning a cancel scope and setting shield as two separate steps:

with move_on_after(CLEANUP_TIMEOUT) as cleanup_scope:
    cleanup_scope.shield = True
    await conn.send_goodbye_msg()

However trio.open_cancel_scope has a shield argument and it seems like it would also make sense with the convenience functions (trio.move_on_after, trio.move_on_at, trio.fail_after and trio.fail_at). eg:

with move_on_after(CLEANUP_TIMEOUT, shield=True):
    await conn.send_goodbye_msg()

I would be happy to make the pull-request if this change makes sense.

May 02 '17 23:05 merrellb

For the record, the reason I didn't do this in the first place is that I feel like shield should be a pretty obscure and rarely used feature. It's very easy to use in a way that breaks your program in subtle ways (e.g. by making it unresponsive to rare network failure conditions), it breaks the rule that timeout policy should be set as far up the stack as possible, and I don't want it to become an attractive nuisance. So I included it in open_cancel_scope since that's a kind of low level "raw" API, but left it out of the high level timeout apis.

Maybe it will turn out the I'm wrong and shield really is something that gets used a lot, but I can't help but feel that that would indicate that we've messed up somehow.

In the mean time, I'm sort of feeling like maybe we should remove the kwarg from open_cancel_scope :-). It should be a bit annoying to use...

While we're at it, here's another idea: maybe we can have a way to say "yes, I know I'm cancelled, but please stop raising it unless an operation actually blocks". Like a shield with a zero second per operation timeout. So you could attempt to do things like send a goodbye message, but if it doesn't succeed immediately, oh well, let's move on. It would at least remove the issue of having to pick some magic timeout value way down inside the code...

... Though, I guess this doesn't actually help for your use case with websockets, since the main reason websockets have a goodbye message is so the initiator can then wait until they see the corresponding goodbye message from their peer, and know that both sides successfully processed all the data that was sent. (Since otherwise TCP can lose some data at the end of a connection.) Sending a goodbye message and then immediately closing the socket is pretty useless, but that's all my idea above would allow for.

Otoh I'm not entirely sure I buy that it makes sense to send websocket goodbye messages on control-C in the first place :-). "Potential data loss" sounds bad, but here it just means that the connection might act like you hit control-C a little earlier than you actually did... and who's to say that's not what happened? (Unless you're carefully monitoring the connection state and choosing when to hit control-C with millisecond precision, which seems like a weird hobby.) But maybe the problem is more general – certainly there are a lot of protocols and applications that have some concept of a graceful shutdown.

May 07 '17 19:05 njsmith

At the application level, I think a graceful shutdown might be fairly common. For example, I may want to make a call to a local database (ie something with time constraints I understand fairly well), storing the last record acknowledged by a remote WebSocket client. With nursery cancellation as the best way to manage a group of tasks, I think this may come up in many circumstances other than control-C (eg explicitly deciding to close a single connection)

I am not sure I understand your concerns about adding shield to move_on_after. Isn't that exactly what we should be wrapping our graceful connection shutdown code within? I would almost want it to be automatically enabled when exiting a canceled scope :-) . With a server spawning a nursery per incoming connection, how could we handle this at a higher level?

May 08 '17 06:05 merrellb

I think there are two things making me uneasy here:

I don't feel like I have a very clear picture of what the goal here should be. Cancelling work is inherently a messy process, networks are unreliable, and in a distributed systems you rarely have complete information. Like in your example, you say you want to record the last record acknowledged by the remote WebSocket client, but for all you know there might be another acknowledgement that they already sent, just it's still sitting in a router somewhere and you haven't seen it yet. (This is the Byzantine generals problem.) I totally understand the desire to make this kind of thing as reliable as possible! I want that too. But in my experience, the first and most important step in making a distributed system reliable is to have a clear picture of what we {have to / are willing to} give up. Complexity is the number 1 enemy of reliability, so papering over individual problems as they arise just doesn't work. And even in your example of making a last call to a local database, I don't know how to judge whether the complexity cost is worth it. If the control-C had arrived 0.1 seconds earlier, then you wouldn't even know that there was a last record to write, you'd just have exited, and apparently that would have been OK – so is it really worth bending over backwards to handle it? Maybe it is, I don't know.
As a matter of software design, I don't like the idea of low-level code imposing policy around timeouts and shutdowns. What if later on you decide that actually, on control-C exiting in 2 seconds is not good enough, you really need to exit in 1 second (or vice-versa). Do you have to go around and modify all the different places in your code that clean up resources?

What if you're using several libraries, that each have their own internal policy on how long they'll hold out when you try to cancel them? What if you really want to shut down ASAP? Or on the other hand, what if one library wants to wait for 5 seconds and another wants to wait for 10 seconds, so the whole system ends up waiting for 10 seconds and that's fine, but the first library gave up after 5 seconds, and loses data that could have been saved if it had known that it actually had 10 seconds to work with?

These two concerns are closely related, because if we're hard-coding timeouts into little pieces of the code that can't see the big picture, then we're very likely to make the wrong trade-offs; OTOH if timeouts are being configured as late as possible and at as high a level as possible, then it's much more plausible that we can make the right trade-offs.

I almost feel like what we want is for a top-down cancellation to be able to specify a "grace period" – so the code that calls cancel() can also specify how long it's willing to wait for the cancellation to complete. The problem then would be in communicating this to the code being cancelled. The simple version would be:

by default, code is immediately cancelled
but code that wants to allow for graceful shutdown can have something like a "soft shield" that shields against cancellation during the grace period, but not after the grace period expires
and there's some mechanism that code running under the "soft shield" can check whether a cancellation has occurred, so that it knows that it should shut down the connection or whatever, e.g. doing if currently_cancelled(): ... or something.

That's enough for cases where the connection is active, so e.g. we can imagine a HTTP server that every time it goes to send a response, it checks for currently_cancelled() and if so then it marks the response as being the last one of this connection and immediately closes it afterwards. But what it's missing is some way for the code that wants to support graceful cleanup to get notified when a cancellation happens – like if an HTTP server or your websocket server is sitting idle waiting for something to arrive, then when it's cancelled it should immediately wake up and shut down the connection without having to poll currently_cancelled().

One obvious way to do this would be to send in an exception, and then code that wants to handle it gracefully can catch it and do whatever clever thing it wants. But... in general, you can't necessarily catch a Cancelled exception and do something clever with it. (This is an important point that I missed before, but is relevant to what you're trying to do in general!). The mere fact of delivering a Cancelled exception can itself break your connection irrecoverably. Like, consider if you were in the middle of doing a sendall to transmit a websocket message frame, and that got interrupted – you can't necessarily catch that exception and send a goodbye frame, because the previous frame might have been only half-transmitted and now you've lost synchronization. A Cancelled exception is better behaved than a generic KeyboardInterrupt, but it's still a bomb going off in the middle of your program state.

Maybe it's enough to make the soft-shield state toggleable, so you could do something like:

async def handler(ws_conn):
    with soft_shield_enabled:
        while True:
            try:
                with soft_shield_disabled:
                    msg = await ws_conn.next_message()
            except Cancelled:
                await do_cleanup(ws_conn)
                return
            await ws_conn.send_message(make_response(msg))

...Need to think about this some more. I feel like there's something here, but it isn't fully gelled yet :-).

May 09 '17 00:05 njsmith

Note: if we do promote graceful cancellation to a first-class concept, then I guess we'll want to rethink restrict_keyboard_interrupt_to_checkpoints. Maybe something like an argument that (a) makes it so KI cancels everything (i.e. actually triggers a cancellation inside init), and (b) lets you specify the grace period on this cancellation?

May 09 '17 01:05 njsmith

I am thinking along the lines of long running WebSockets where I am not too worried about losing a few messages but rather several hours of accumulated state. A largely "in memory" scenario with lazy persistence for performance reasons. Being able to save a snapshot on disconnect is definitely worth some complexity vs potentially replaying hours of logs/journals.
As you can imagine I do like the idea of graceful cancellation being a first-class concept :-). I understand how it can be messy but I don't see why it needs to be messy for common cases, especially when one controls both ends of the connection.
I like your idea of a graceful shutdown period (from above) and soft-shield.
Do we also need to worry about partial receives when canceling? I could see potentially getting out of sync if we are expecting an ack as part of our shutdown. Is there some way to cancel only if the receive is idle?

May 09 '17 03:05 merrellb

I am thinking along the lines of long running WebSockets where I am not too worried about losing a few messages but rather several hours of accumulated state. A largely "in memory" scenario with lazy persistence for performance reasons. Being able to save a snapshot on disconnect is definitely worth some complexity vs potentially replaying hours of logs/journals.

I'm dubious about the wisdom of getting into the state where you have hours of valuable data accumulated in memory in the first place – maybe you should flush snapshots regularly on a timer? But that said, assuming there's a good reason to do things this way, I think you'd really want to implement this as a task that listens for control-C and then flushes a snapshot. There's nothing wrong with doing that if you have needs that are a little more unusual/extreme, and I think "I have hours of valuable data that I need to make sure get flushed to a database over an async connection while shutting down" counts.

Do we also need to worry about partial receives when canceling? I could see potentially getting out of sync if we are expecting an ack as part of our shutdown. Is there some way to cancel only if the receive is idle?

Yeah, you'd have to be careful that the code you put inside the soft_shield_disabled block (like ws_conn.next_message in the example above) is written in a cancellation-safe manner. So for example if it looks like:

    async def next_message(self):
        while True:
            message = self._proto_state.get_next_message()
            if message is not None:
                return message
            else:
                # Receive some data and then try again
                data = await self._transport.recv(BUFSIZE)
                self._proto_state.receive_bytes(data)

then we're pretty much OK – the only await is in the call to self._transport.recv, and if it raises Cancelled then we'll exit early and leave behind any data that we already received in _proto_state's internal buffer to finish reading next time. So as long recv itself is cancellation safe then next_message will be too. And SocketType.recv is cancellation-safe.

However... this gets tricky for some protocols.

If self._transport is an SSL-wrapped socket instead of a regular socket, then it's possible that calling SSLStream.recv will need to send data on the underlying socket (because of a horrible thing called renegotiation). The version of SSLStream in my branch (#107) definitely does not guarantee that the connection is still usable after recv is cancelled.

A similar issue arises at the protocol level. For websockets, the next_message method looks to the user like it's just reading, but if it ends up reading a ping frame then it might want to send back a pong. And then if this gets cancelled half-way through then again we've lost sync.

Solving these problems seems really hard :-(. The way these protocols work, once you start trying to send something then you're really committed to doing so – like you just can't stop in the middle of a websocket Pong frame and switch to a Close frame. (For OpenSSL you're even committing to retrying the same OpenSSL API call repeatedly; it's not enough to make sure that the bytes it gave you eventually get sent. But let's ignore that for right now.)

So what would be annoying but possible is to switch the sending to always use a send-style API call (which can do partial sends and reports how much data was sent) instead of a sendall-call, and only remove bytes from the internal protocol object's outgoing buffer when send accepts them. (I wasn't planning to provide send in the high-level stream interface at all but let's assume we have it.) This would have the effect that if one operation gets interrupted during a send, then the next operation on the same object would pick it up and finish it before doing whatever else it needs to do. So e.g. if we get cancelled in the middle of sending a Ping, then when we go to send the Close it'll send the second half of the Ping + the Close.

Possibly this would become simpler if we change the semantics of sendall (in the high level stream interface, not for the raw socket interface) so that on cancellation it holds onto the data and automatically prepends it to the next call to sendall? I'm wary because this moves us back towards the infinite buffers of twisted-like systems and somewhat violates the no-implicit-concurrency rule, but if it's restricted to cancellations, where currently the only thing you can do is just close the connection, then maybe it's good. Basically this would be implementing the buffering logic inside the stream object instead of the protocol object. ...though, maybe it wouldn't work with SSL, because we could get in a state where we need to call sendall(b"") to flush out the internal send buffer before anything will work, but we don't know it and block waiting to receive instead.

Another possibility that we add more shielding states. Specifically, a task could be in the four states listed down the left column here, and the table shows what checkpoints do in each case:

	Soft-cancelled	Hard-cancelled
Default	raise `Cancelled`	raise `Cancelled`
Soft-shield, exceptions OK	raise `CancelledIsComing`	raise `Cancelled`
Soft-shield, exceptions not OK	nothing	raise `Cancelled`
Hard shield	nothing	nothing

But we don't set these states directly; instead, there's are two context managers: one that marks its contents as "soft-cancellation enabled", and one that marks its contents as "state may become corrupted if a cancellation happens". Then the states are:

if there are any shield=True scopes on the stack below the cancelled scope, then we're in state "Hard shield"
else, if there is at least one "soft-cancellation enabled" manager in effect, and no "state may become corrupted if a cancellation happens" managers in effect, then we're in state "Soft-shield, exceptions OK"
else, if there is at least one "soft-cancellation enabled" manager in effect, and at least one "state may become corrupted if a cancellation happens" managers in effect, then we're in state "Soft-shield, exceptions not OK"
else, we're in state "Default"

So... this is really complicated. But the idea is that in the end, a method like SSLStream.recv puts the "state may become corrupted" manager around its calls to sendall, and by default this doesn't do anything, but now it can guarantee that if some code that supports soft-cancellation calls it, and a soft-cancellation happens, then SSLStream.recv will leave the stream in a consistent state.

Hmm.

May 10 '17 04:05 njsmith

Can we have a "task that listens for control-C"? I thought this was usually caught by the outer nursery itself after all tasks had canceled. While control-C is an important special case, I would think a problem with a single connection and the cancellation of its associated nursery would be the more routine scenario. I would want a graceful shutdown of these intermediate nurseries, giving a chance to persist a snapshot of the connection's associated application state.

Of the options you mention, the four shielding state approach sounds the most elegant although I realize this is beginning to also threaten the elegant simplicity you currently have :-) Still,l I do like this idea of providing extra protection where dragons lie while adding soft-cancellation to give time for a graceful shutdown (although that is easy for the guy not implementing it to say!)

May 10 '17 07:05 merrellb

Can we have a "task that listens for control-C"? I thought this was usually caught by the outer nursery itself after all tasks had canceled.

By default, KeyboardInterrupt gets delivered to some totally arbitrary place. In practice, it will often end up being delivered to the main task, which is sitting in the nursery __aexit__, and gets handled like any exception that happens inside the nursery scope (i.e., triggers a cancellation of everything in that nursery, and then gets propagated). The reason this is common is that it's what trio does if the control-C happens at a moment when the program is idle (no tasks currently running). But there's no guarantee of this – by default, it could be delivered into the middle of some delicate connection management code or whatever.

This blog post goes into a lot more detail about the trade-offs of different ways of handling control-C. In particular, there's discussion of why the default KeyboardInterrupt behavior (in both regular Python and in trio) is pretty much incompatible with guaranteeing a correct graceful cleanup, and also an example of how to write task that listens for control-C. In the example it just does raise KeyboardInterrupt, but it could just as easily do an await program_state.take_snapshot() first and then cancel everything, or any other arbitrary thing you want.

the four shielding state approach sounds the most elegant although I realize this is beginning to also threaten the elegant simplicity you currently have :-)

Yeah, I'm not sure... some kind of graceful shutdown is a pretty common/fundamental desire, so maybe it's worth it? But this is an area where trio's already ahead of the competition, and it's extremely tricky, and novel, so I'll probably let it stew for a while before making any decisions.

One thing that did occur to me is that I've already been thinking that in code like SSLStream that's "fragile" against cancellation (i.e., it can cause the object's internal state to become corrupted and unusable), it might be a good idea to have some guard that force-closes the stream on cancellation, like:

async def recv(self, max_bytes):
    try:
        # ... complicated stuff here ...
    except Cancelled:
        # our state is broken, but at least we can make sure that any attempt to use it fails fast
        self.forceful_close()
        raise

If I wrapped that up into a context manager like with cleanup_on_cancelled(self.forceful_close): ..., then that context manager could also set the flag that prevents soft cancellations from being delivered inside it if soft-cancellation handling is enabled – since these are two strategies for handling this kind of fragile state, they're needed in the same places.

Hmm.

May 10 '17 08:05 njsmith

I wonder if we should care that in principle someone might want to do some sort of graceful shutdown but doesn't care about the integrity of the stream they're reading from right now, it's some other object that needs cleanup.

May 10 '17 16:05 njsmith

This seems like a relevant paper for the whole "cancellation fragility" issue: Kill-Safe Synchronization Abstractions

Jul 10 '17 10:07 njsmith

Was thinking about this again in the shower today. It seems like could maybe simplify the whole "soft shield" idea by making the default state be soft-shielded. That is, the semantics would be something like:

cancel_scope.graceful_cancel(timeout), cancel_scope.graceful_cancel_at(time): set the deadline to min(current_deadline, implied_deadline), and also set the gracefully_cancelled flag

Code that's cancellation-oblivious gets the full time to complete. This seems pretty reasonable -- it's a bit annoying for code that's cancellation oblivious and runs forever, because it will use up the full graceful period before getting killed, but it's definitely a better default for any code that normally runs for a finite amount of time, and if infinite loops want good graceful cancellation then they generally need to do something special anyway.

If you don't want to set a timeout, you can use inf, but probably almost everyone wants to set some timeout so might as well make it an opt-out thing in the API.

Then code that wants to do something clever on graceful cancellation can explicitly register some interest -- I guess by flipping a flag to say "please consider graceful cancellations to be cancellations". (That's sufficient to implement wait_for_graceful_cancellation().) We might also want a currently_in_grace_period() function to poll that state – not sure.

Certainly that's enough for an HTTP server that wants to stop accepting new keepalive requests when the graceful shutdown is triggered. For the case of the websocket server that wants to send a goodbye over TLS... well, one option is to say that grace periods just don't mix well with renegotiation, sorry. In general renegotiation and similar is even less well-supported than I realized when I wrote the above. Or we could make SSLStream.receive_some guaranteed to survive cancellation; that's #198. Or we could have the ability to toggle the "please consider graceful cancellations to be cancellations" back to disabled, when calling transport_stream.send_all() from inside ssl_stream.recv_some.

Aug 10 '17 00:08 njsmith

The terminology here could use some work... graceful is on the tip of my tongue because I just stripped it out of AsyncResource, but we still have trio.aclose_forcefully, which is pretty unrelated. Grace period? cancel_with_grace_period(timeout)?

Aug 10 '17 00:08 njsmith

Another thing I just realized: for the minimal version of this idea where setting a grace period acts as (a) setting an overall deadline, and (b) setting a flag indicating that everyone should hurry up, and (c) letting specific pieces of code opt-in to receiving a cancellation when this flag is set, then it's actually totally possible to prototype as a third-party library. Having it built in would potentially have extra advantages, specifically in terms of letting the built-in server code integrate and letting SSLStream do clever things with renegotiation, and also being a standard thing that the whole ecosystem can coordinate around, but I don't think those are necessary for a prototype to be useful.

Aug 21 '17 21:08 njsmith

There's some interesting discussion of graceful shutdown in Martin Sustrik's Structured Concurrency post, from a rather different perspective.

Dec 23 '17 21:12 njsmith

A few thoughts that could be useful (after seeing the thread here: https://mail.python.org/pipermail/async-sig/2018-January/000437.html):

First, I'd make separate the distinction of what "knobs" a library API might want to expose to its users. These could be things like "finish in this total amount of time no matter what (including clean-up)," "allow x time for clean-up step y (no matter what)," or "aim to finish in this amount of time (excluding clean-up)." Then it would be up to the library author to use that information as it sees fit. For example, if a user specified 30 seconds absolute total time and 10 seconds for clean-up, then the library would compute and allot 20 seconds for the "success" case. Or the library could even tell the user their knob values are incompatible. Lastly, it would be up to the framework (trio in this case) to expose to the library author the primitives to support using that information. One consequence of this way of thinking is that it seems like it would be best if the framework didn't even have to know about concepts like "shutdown" and "graceful." Instead, it could expose lower level operations like increasing or decreasing the allotted time, or execute this block of code within this computed amount of time, and it would be up to the library author to map those operations to support graceful shutdown, etc.

Jan 15 '18 01:01 cjerdonek

I think once you get into defining a set of knobs whose semantics are specific to the particular function being called, then those are what we call function arguments :-).

In general, there's a big trade-off between semantic richness and universality -- if Trio adds something to the core cancellation semantics, then everyone using trio has to deal with it to some extent, whether it makes sense in their case or not. This can be good – the reason trio has core cancellation semantics in the first place is that libraries that don't support cancellation/timeouts are essentially impossible to use correctly, and getting it right requires some global coordination across different libraries, so it's actually good that we force everyone to think about it and provide a common vocabulary for doing so.

The need for universality definitely constrains what we can do though. We can't in general phrase things as "make sure to be finished by time X", because this requires being able to estimate ahead of time how long cleanup operations will take, and there's no way to do that for arbitrary code. And just in general I don't want to provide tons of bells and whistles that nobody uses, or are only useful in corner cases.

The semantic difference between regular execution and a "soft/graceful cancel" is that you're saying you want things to finish up, but allowing them to take their natural time. This is particularly relevant for operations that run forever, like a server accept loop – for them a graceful cancel is basically the same as a cancel. But for an operation with (hopefully) bounded time, like running an HTTP request handler, it can keep going. So it's... at least kinda general? And it definitely has some important use cases (HTTP servers!). But I'm not sure yet whether it's a good generic abstraction or not.

Jan 15 '18 10:01 njsmith

Another thing I just realized: for the minimal version of this idea where setting a grace period acts as (a) setting an overall deadline, and (b) setting a flag indicating that everyone should hurry up, and (c) letting specific pieces of code opt-in to receiving a cancellation when this flag is set, then it's actually totally possible to prototype as a third-party library

Prototype here in case anyone wants to experiment with this: https://gist.github.com/njsmith/1c83788289aaed49e091c8281d85a85e

(Please do, that would be very helpful for figuring otu if it's something we really want to do :-))

Aug 21 '18 02:08 njsmith

I like the proposed GracefulShutdownManager in terms of its simplicity-to-functionality ratio, but I don't think it necessarily solves all the problems that lead people to want something like graceful cancellation. I think there are actually two distinct problems, and they probably want to be addressed separately, though they might both benefit from similar additions to _core if we go that route.

Safely stopping a server

Here's a motivating example which is similar to a thing I'm trying to solve with trio right now: Suppose you have a protocol that multiplexes multiple logical streams over a single TCP connection. Examples include HTTP/2, SSH, and various RPC protocols (if you think of each request as creating a new stream). Suppose that, under normal circumstances where there isn't a network partition or similar, you'd like to be able to close each stream in such a way that both sides agree on what each side received. That effectively requires the party initiating the close to send a message and wait for a response from the remote side. Suppose individual connections/streams are arbitrarily long-lived and you can't just wait for them to die on their own. Now, how do you do a graceful shutdown of a server for this protocol?

Well:

the server is closed when it has no more connections after shutdown has been requested, and stops accepting new connections once it's told to start shutting down
each connection is closed when it has no more streams after shutdown has been requested, and stops accepting new streams once it's told to start shutting down
an individual stream is closed once the local and remote sides have both sent messages requesting to close it; it stops accepting new data from the remote side after a locally-initiated close, and sends a close message when it's told to start shutting down and is ready to do so
what do I mean by "is ready to do so"? well, for something like an RPC protocol, the individual stream (request) might not be able to send the close message until the underlying call completes - when shutdown is requested, we can cancel the call, but we need to wait for it to unwind before we can close the stream (because we need to tell our peer whether it was completed or cancelled)

So we wind up with something of a tree structure. Hmm... what else in Trio looks like a tree...

I think graceful cancellation in the "how do we bring down this tree of infinite loops cleanly" sense might want to be implemented as a new type of nursery, in which most tasks inside the nursery are shielded from a cancellation originating outside the nursery until all the nursery's marked-as-important children have been cleaned up or a grace period expires. I'm still working out some of the details, but I think this can be implemented without modifying _core (though it would be a good deal less gnarly with more internal support). I'll report back once I have something I can share.

Properly cleaning up resources upon cancellation

The second problem has to do with resources that require blocking to clean up correctly. The canonical example of this in my mind is a subprocess with resources of its own: if you're waiting in aclose for it to exit and you get cancelled, you should send SIGTERM and keep waiting for "a little while" before you send SIGKILL. If you send SIGKILL right away upon cancellation, you get to unwind your stack immediately, but the subprocess doesn't get a chance to clean up its own resources.

This could be handled using the "graceful nursery" abstraction from above. For example, subprocess aclose ignoring pipes could be:

async def aclose(self):
    try:
        async with open_graceful_nursery() as nursery:
            nursery.add_shutdown_task_sync(self.terminate)
            await self.wait()
    finally:
        if self.returncode is not None:
            self.kill()
            with open_cancel_scope(shield=True):
                await self.wait()

And then we get to piggyback off of whatever mechanism "graceful nurseries" use to make the grace periods inherit how you think they should. But now "graceful nurseries" have to be in trio, in order to be usable by things in trio. And users who care about how long we wait between SIGTERM and SIGKILL now have to understand this "graceful nursery" thing with lots of features they don't care about. That seems suboptimal.

Can we add a simpler concept to _core that both "graceful nurseries" and this sort of let-me-block-for-a-bit-to-cleanly-close-this-thing logic can make use of?

Idea: cancel scopes have grace periods

I think we can do it with something like this:

Instead of two states (not cancelled and cancelled), cancel scopes have three: not cancelled, cancelled pending cleanup, fully cancelled. Calling cancel() makes you one notch more cancelled than you previously were. We can add soft_cancel (don't go past "cancelled pending cleanup") and hard_cancel (go directly to "fully cancelled") if we think they'd be more useful than confusing.
Each cancel scope has an associated "grace period", measured in seconds. If not specified for a particular cancel scope, the grace period is inherited from its parent. (Inherited by reference - if you change the parent's grace period, or change which cancel scopes are your parent due to start(), the children that haven't specified their own grace period reflect this change.) The top-level grace period (ie, the one associated with the cancel scope that surrounds the system nursery) is zero by default and can be changed with a run() argument. When a scope enters "cancelled pending cleanup" state, it registers itself with the runner to be cancelled again at current_time() + grace_period, so that it will enter "fully cancelled" state after the grace period expires.
We add a new type of shield, which shields from "cancelled pending cleanup" scopes but not "fully cancelled" scopes.

Then Process.aclose is

async def aclose(self):
    try:
        await self.wait()
    finally:
        if self.returncode is not None:
            self.terminate()
            try:
                with soft_shield():
                    await self.wait()
            finally:
                if self.returncode is not None:
                    self.kill()
                    with open_cancel_scope(shield=True):
                        await self.wait()

and "graceful nurseries" are much easier to implement (they don't have to deal with the grace period inheritance issue, which in my experience is where like 80% of the gnarliness comes from).

Thoughts?

Dec 17 '18 20:12 oremanj

I've written an implementation of the "cancel scopes have grace periods" idea, on the graceperiod branch of oremanj/trio. It's on top of my unbound cancel scope diff (#835) so no PR yet, but I'm curious for thoughts on the interface/semantics/complexity from anyone who feels so inclined: https://github.com/oremanj/trio/compare/unboundcxl...oremanj:graceperiod

Jan 03 '19 23:01 oremanj

Not sure if this is the best place for this thought, but I want to capture it before I forget. (And @oremanj I'll comment on the rest of your proposal soon!)

It seems like there are a few places where we might want to control the order of cancellation within a nursery. Reading @oremanj's comment above made me think of Erlang supervisors using the "rest for one" strategy, where children are treated as having linear dependencies between them – imagine task B is using some service provided by task A, and C is using a service provided by task B. So you want to start them in the order A -> B -> C, if any task crashes then you restart it + all the tasks after it in the ordering (not 100% sure about how this works in practice – seems like there's some puzzle about how to wire things up again – but that's the concept). And, presumably, when shutting down, you want to shut down in the order C -> B -> A. So, maybe you want to make it so if the whole thing is cancelled, then A and B are shielded until C exits, and then A is shielded until B exits?

Another example is raised by @touilleMan here: https://github.com/python-trio/pytest-trio/issues/71#issuecomment-453503438 The linked code uses shielding to prevent the background tasks in a trio-asyncio loop from being exposed to cancellation before the code running inside the loop body finishes unwinding. This is made more complicated by the way trio-asyncio erases some of the tree structure that trio programs normally have, but does fit the pattern of shutting down service consumers before service providers.

pytest-trio goes to a lot of trouble to unwind fixtures in the correct order; I guess external cancellation might mess this up (though I'm not sure how you get an external cancellation in pytest-trio).

Even trio-websocket kind of has this problem... Arguably when you use async with open_websocket(...) as ws:, you want to shield the background tasks that service the network after the with body is finished.

(As part of the MultiError v2 work, we may want to add a concept of "nursery where the background tasks promise to be unobtrusive and not crash": https://github.com/python-trio/trio/issues/611#issuecomment-421522427 Probably there's substantial overlap between those cases and the cases where you want to shield the background tasks until the parent exits the with block. Maybe there's some synergy there we can exploit?)

This seems closely related to the graceful cancellation question to me, because it gets at the core question of what exactly "cancellation" means. Our current cancellation is used for unwinding after an exception, so it has to be pretty reliable about getting things shut down, even when users didn't think about it when writing their code. (And in exchange, it's acceptable for it to be pretty heavy-handed, like exception unwinding is.) It needs to have "ASAP" semantics, because it's pretty awkward to expect a random exception to somehow configure a grace period on the cancellations it triggers (and we'd still need to support ASAP semantics after the grace period expired).

None of this seems to preclude putting ordering constraints on cancellation. Once we do bring the hammer down on a task, we already rely on exiting eventually, so waiting for that to happen before signallimg another task probably wouldn't reduce the overall reliability of cancelling that much?

The "soft cancelling" idea is also based on this intuition of tasks having service dependencies. A soft cancellation should (a) tell leaf tasks to exit immediately (though maybe at a more controlled place than a hard cancel would), (b) tell non-leaf to switch into a mode where it they have no consumers, they should exit instead of waiting for new consumers to arrive.

Jan 11 '19 15:01 njsmith

Ooh, I really like this framework for thinking about structured cancellation. It nicely decouples the "shut down my server in an orderly way" and "give me a bit more time while I'm unwinding" parts of my proposal from each other.

In the limiting case, we could have an arbitrary tree of depends-on relationships between the tasks in a nursery. But that gets tricky to specify in full generality (since Task is in hazmat and doesn't get returned from start_soon) and feels confusing (Trio already has a task tree, now we're thinking of another kind of task tree for a totally different purpose?). What if we were to make all nurseries behave like "the nested child gets cancelled first; other tasks don't get cancelled until the nested child exits"? The nested child is already special in some other ways, and the rule would be easy to implement (when contemplating delivering a cancellation to task, block it unless task.parent_nursery._parent_waiting_in_aexit). This naturally gives the right semantics for open_websocket, and tends to cancel tasks earlier if they're closer to the leaves of the task tree. And it's easy to teach because it's simple. What do you think?

Jan 11 '19 16:01 oremanj

Here's an implementation of "cancel the nested child before the other children":

@asynccontextmanager
async def open_service_nursery() -> AsyncIterator[trio.Nursery]:
    """Provides a nursery augmented with a cancellation ordering constraint.

    If an entire service nursery becomes cancelled, either due to an
    exception raised by some task in the nursery or due to the
    cancellation of a scope that surrounds the nursery, the body of
    the nursery ``async with`` block will receive the cancellation
    first, and no other tasks in the nursery will be cancelled until
    the body of the ``async with`` block has been exited.

    This is intended to support the common pattern where the body of
    the ``async with`` block uses some service that the other
    task(s) in the nursery provide. For example, if you have::

        async with open_websocket(host, port) as conn:
            await communicate_with_websocket(conn)

    where ``open_websocket()`` enters a nursery and spawns some tasks
    into that nursery to manage the connection, you probably want
    ``conn`` to remain usable in any ``finally`` or ``__aexit__``
    blocks in ``communicate_with_websocket()``.  With a regular
    nursery, this is not guaranteed; with a service nursery, it is.

    Child tasks spawned using ``start()`` gain their protection from
    premature cancellation only at the point of their call to
    ``task_status.started()``.
    """

    async with trio.open_nursery() as nursery:
        child_task_scopes = trio.CancelScope(shield=True)

        def start_soon(
            async_fn: Callable[..., Awaitable[Any]],
            *args: Any,
            name: Optional[str] = None,
        ) -> None:
            async def wrap_child(coro) -> None:
                with child_task_scopes.linked_child():
                    await coro

            coro = _get_coroutine_or_flag_problem(async_fn, *args)
            type(nursery).start_soon(nursery, wrap_child, coro, name=name or async_fn)

        async def start(
            async_fn: Callable[..., Awaitable[Any]],
            *args: Any,
            name: Optional[str] = None,
        ) -> Any:
            async def wrap_child(*, task_status: trio.TaskStatus[Any]) -> None:
                # For start(), the child doesn't get shielded until it
                # calls task_status.started().
                shield_scope = child_task_scopes.linked_child(shield=False)

                def wrap_started(value: object = None) -> None:
                    type(task_status).started(task_status, value)
                    if trio.hazmat.current_task().parent_nursery is not nursery:
                        # started() didn't move the task due to a cancellation,
                        # so it doesn't get the shield
                        return
                    shield_scope.shield = child_task_scopes.shield

                task_status.started = wrap_started
                with shield_scope:
                    await async_fn(*args, task_status=task_status)

            return await type(nursery).start(nursery, wrap_child, name=name or async_fn)

        nursery.start_soon = start_soon
        nursery.start = start
        try:
            yield nursery
        finally:
            child_task_scopes.shield = False

I really like this - I was able to use it to replace all explicit shielding in a program I was working on. If we don't want all nurseries to work that way, maybe we want to have "service nursery" be a concept that does both cancel-the-nested-child-first and don't-wrap-nested-child-exceptions-in-an-ExceptionGroup?

Jan 19 '19 23:01 oremanj

So looking at this again..... @oremanj, I think that if we add this concept of "service nurseries" or some kind of way to control the order that cancellation gets distributed, and remove the bits of your "cancel scopes have grace periods" proposal that overlap, then your proposal collapses down to become equivalent to my earlier proposal? (Maybe?) Does that sound at all plausible?

Jan 27 '19 11:01 njsmith

I think that if we add this concept of "service nurseries" or some kind of way to control the order that cancellation gets distributed, and remove the bits of your "cancel scopes have grace periods" proposal that overlap, then your proposal collapses down to become equivalent to my earlier proposal? (Maybe?) Does that sound at all plausible?

Yep, pretty much. The devil is in the details, but then, that's always true. :-)

I no longer think "graceful nurseries" as discussed above are a good idea -- what we Really Want (TM) is more like the "service nurseries" idea plus the base graceful cancellation support, and that's much much easier to think about. So, what's in "base graceful cancellation support"?

When you explicitly call cancel(), you should be able to specify a grace period that goes along with that cancellation, like you proposed above. The cancellation takes immediate effect for code not within a "soft shield" (which I've been calling with trio.shield_during_cleanup():), and takes effect after the grace period for code within a "soft shield".
If you set a deadline on a cancel scope, you probably also want the ability to set a grace period that goes along with that deadline. When the deadline expires, it's like calling scope.cancel(grace_period=scope.grace_period).
- Open question: supposing this is in fact just an attribute of the cancel scope called grace_period, should it also be the default if you call cancel() without specifying a grace period? Or should cancel() without an explicit argument always do a hard cancel, and cancel_scope.grace_period only applies to cancellations that occur due to deadline expiry?
Should it be possible to specify the grace period that gets used when a nursery cancels its tasks to propagate an exception? Maybe not -- we want exception propagation to be fast.
Should it be possible to specify the grace period that gets used when a Ctrl+C is delivered? This one seems pretty useful, which creates some question relative to the previous bullet, since KeyboardInterrupt is an exception.
Should it be possible to specify that every cancellation, anywhere in a Trio program, uses a certain grace period unless otherwise specified? Maybe there is a default-grace-period contextvar?
Sometimes there are cleanup tasks that absolutely must finish. You can use a regular, "hard" shield for these. But, "this is a bounded-time operation that you have to wait for" is more expressive than just setting a shield -- in particular, it could be used to give other tasks more cleanup time too. (This is your "library A wants 5 seconds, library B wants 10 seconds, so you take 10 seconds, but then it doesn't make sense to cut library A off at 5 seconds" example from upthread.) We can't just interpret all shields as extending the cleanup period, because that could lead to deadlocks when using shields for cancel ordering (like in my service nursery example). Should we have a separate way to say "this code needs at least N seconds for cleanup, even if the grace period of the cancellation that hits it is smaller than that"?

I no longer think the "grace period inheritance" bit in my original proposal is worth the complexity.

Feb 11 '19 07:02 oremanj

I think for most purposes, it's going to be nicer to make a soft-cancel be "opt-in" instead of "opt-out". I.e., by default it does nothing, unless you do with promote_soft_cancel_to_hard or something like that. For example, consider an HTTP/1.1 server: when you soft-cancel it, there are specific points that should shut down immediately (the accept loop; the bit of the protocol parsing code that blocks to wait for the next request to start). There's also a huge, arbitrarily complex amount of code that you don't want to cancel, i.e., everything inside the user's request handler. But it's possible that some specific pieces of the user's request handler do want to opt-in to soft-cancellation notification, for example if the handler's implementing a "long poll".
I'm having trouble thinking of situations where you want to say "don't soft cancel yet, but do in X seconds". Usually it's more like "we just got SIGTERM, so soft cancel now, and hard cancel in X seconds". If so, then I think that allows for a lot of simplification? I'm imagining that CancelScope grows a .soft_cancel() method. The deadline continues to trigger a hard cancel if it expires. So a "clean shutdown with grace period" is just (a) calling .soft_cancel(), (b) setting the deadline appropriately. Are there any realistic cases this doesn't cover?
Should it be possible to specify the grace period that gets used when a Ctrl+C is delivered? This one seems pretty useful, which creates some question relative to the previous bullet, since KeyboardInterrupt is an exception.

I don't think so. There are two cases. The first case is for programs that leave control-C handling on its default settings. For them, we deliver KeyboardInterrupt to whatever code is running. This is important because it's how we can break out of while True: pass, where we might not be able to deliver a cancellation at all. But I can't see any reasonable way to combine this kind of delivery with a grace period. (Something something background thread + pthread_cancel... let's just not go there.) The second case is programs that care about shutting down carefully. They register a signal handler, and when SIGINT arrives they run whatever code they want. So if they want a grace period they can set it themselves.
Thinking about it more... I realized I have a doubt about the "service nursery" idea. For trio-asyncio, OK, the rationale is pretty clear, your actual logical call stack is chopped up into a bunch of tasks across the nursery, so cancelling all those pieces at once is not going to do what you want. But for most cases, like trio-websocket... it seems intuitive that the background tasks should stick around as long as the main task does, so they can provide services. But... generally consuming those services involves executing a checkpoint, because, y'know, inter-task communication. So how can the main task tell if the background services are there or not, once the main task is cancelled? It can't talk to them. Of course this is only approximately true, there's Event.set and shielding and whatever, but I mean, as a general rule of thumb, am I wrong?

Feb 11 '19 10:02 njsmith

I think we might be talking about two different things when we talk about "soft cancellation":

stop the infinite loops, leave everything else alone (for some amount of time)
cancel like normal, but allow the code that executes while unwinding from the cancellation to block (for some amount of time)

I think these are both useful in different circumstances, but I'm more interested in type 2 for the purposes of this thread, because it's a lot harder to support without handling in _core. (Type 1 does great with the GracefulShutdownManager you posted; the main benefit to standardizing it is that all the server libraries can use the same one.) Also, I sort of get the sense that type 2 is more urgently desired by the Median Trio User at this point? @merrellb's desire that spawned this thread was type 2, as is the example in the Trio docs that demonstrates shielding, as are most of the cases I've run into.

Type 1 wants the default to be "soft cancellation does nothing", and you mark the places that should respond to a soft cancellation (an appropriate point in each the infinite loops); Type 2 wants it to be "soft cancellation is cancellation", and you mark the places that should get the extra time (to a first approximation, all the send_all calls and the __aexit__ and finally blocks). Apart from the defaults, I don't think they actually need different underlying mechanisms, but if we only had to support Type 1 those mechanisms could be substantially similar, as you point out.

Type 2 graceful cancellation at a high level probably looks like with move_on_after(30, grace_period=5):. This is the sort of interface we might want for:

run this subprocess, send it SIGTERM if it's not done in 30 seconds, send it SIGKILL if it doesn't exit from SIGTERM in 5 seconds
do this RPC call; if it doesn't finish in 30 seconds, tell our peer to cancel it, and wait up to 5 seconds for the response that says "yup, this was cancelled and no state was changed", in the absence of which we have to assume an indeterminate state/network partition and can't proceed with the thing we wanted to do
do this request-response interaction on an SSLStream; give up after waiting 30 seconds for the response, but if we can avoid breaking the stream by waiting up to 5 seconds longer for a send_all to finish, do that please

I think this is pretty generally useful?

[Ctrl+C handling]

Agreed, it does seem like the best option is to say "if you want graceful shutdown on Ctrl+C, use open_signal_receiver and write the soft_cancel() or whatever-it-is yourself".

[service nurseries] So how can the main task tell if the background services are there or not, once the main task is cancelled? It can't talk to them.

Yeah, service nurseries are only useful if the body of the async with block is also using Type 2 graceful cancellation, such that it might have need of the services after the cancellation is first raised. (Presumably a Type 1 graceful cancellation wouldn't affect the services in the first place.)

Feb 11 '19 11:02 oremanj

So far I've been imagining Type 2 graceful cancellation support using the tri-state cancel scope model (not cancelled / cancelled but shieldable by "soft-shield" scopes ie "cancelled pending cleanup" / cancelled and only shieldable by "full-shield" scopes ie "fully cancelled"). It occurs to me that we might also model it as an edge-triggered cancellation now (each task receives one Cancelled at its next checkpoint that is not soft shielded) plus a traditional level-triggered cancellation after the grace period expires. Maybe that would be easier than supporting two phases of level-triggered cancellation?

Feb 11 '19 11:02 oremanj

Huh. Yeah, I'm definitely thinking of "Type 1" shutdown. I've never encountered the other type before. I'm trying to figure out if I believe in it as a single coherent category or not :-).

What makes me nervous is in your examples, they all feel very specific to the details of the situation. Trio's cancel scopes shouldn't know the difference between SIGTERM and SIGKILL (though maybe Process objects should expose some configuration for this?), and they definitely don't know how your distributed RPC system copes with network partition. (See also this post up-thread: https://github.com/python-trio/trio/issues/147#issuecomment-357647659) And the SSLStream example seems artificial – if my peer is so uncooperative they aren't even draining their receive buffer then how is waiting 5 seconds going to help? better to throw this connection away and get a better one!

OTOH everyone knows what an infinite loop is :-) (which is a nice characterization btw, thank you).

If I hand you some random third-party library, like I dunno, a complex HTTP app running under hypercorn, then what do you expect it to do if you send a "type 2 soft cancel"? what kind of guarantees would you expect it to give, that you could rely on, and file bugs if they weren't met?

I may just be suffering from a lack of imagination, because I haven't seen systems that supported these more powerful concepts... do you have any more worked-out examples of where you've run into this, as a Not Exactly Median But Let's Call It Close Enough Trio User (or "NEMBLCICETU" for short)?

Feb 11 '19 12:02 njsmith

The common thread I'm trying to point at with the "type 2 soft cancellation" is the idea that sometimes, while you can just tear everything down forcefully, and you need to be able to do so correctly to deal with misbehaving peers/networks/etc, you will get better outcomes if you can take a moment to tear things down non-forcefully when you get cancelled. And this probably just adds to your total deadline in practice if you hit the deadline because of a network problem, but maybe the slowdown was for some other reason and the network operations required for your graceful closure will complete quickly.

Trio's cancel scopes shouldn't know the difference between SIGTERM and SIGKILL

Definitely! But if Trio's cancel scopes provide a common vocabulary for saying "start sweeping the floors and telling customers we're closing in X seconds, and actually shove them out the door Y seconds after that if they haven't left yet", it's natural for phased shutdown of a process to use that vocabulary. (As for why we should care about supporting SIGTERM-wait-SIGKILL: it gives the child process time to clean up its resources, its own child processes, etc.)

and they definitely don't know how your distributed RPC system copes with network partition.

Nope -- but if you can distinguish "this call timed out but the cancellation was processed normally" from "this call timed out and also I didn't hear back about my cancellation request in a reasonable amount of time", that's a useful input into the application-level code for deciding that a network partition has happened and coping with it.

Or maybe we're not thinking in terms of network partitions at all, but we do want to see any exception that might have been raised by the remote side as a result of buggy handling of the cancellation.

And the SSLStream example seems artificial – if my peer is so uncooperative they aren't even draining their receive buffer then how is waiting 5 seconds going to help? better to throw this connection away and get a better one!

I was using the same grace period for all the examples, but this example probably makes more sense with a tiny one -- do we really not want to wait even 100 ms for the send queue to drain in this case?

If I hand you some random third-party library, like I dunno, a complex HTTP app running under hypercorn, then what do you expect it to do if you send a "type 2 soft cancel"? what kind of guarantees would you expect it to give, that you could rely on, and file bugs if they weren't met?

I would expect it to document its behavior in the soft-cancelled state, like "existing requests will be allowed to complete but we'll stop accepting new requests and we'll send CLOSE on all websocket connections once we're done sending queued messages", or whatever. I wouldn't use soft-cancel with a library that didn't define semantics for it, unless I was OK with getting hard-cancel instead. But I also wouldn't use "type 1" soft cancel with a library that didn't define semantics for it, because if it's oblivious then I'm just sleeping for a while before doing a hard cancel, and that's a waste of time.

Some more examples of "type 2 soft cancel" being useful, off the top of my head:

I have an SSL connection with some internal service over the public Internet and we're paranoid about security. If I just break the connection, it will hit a log file somewhere and someone will spend time looking into it to rule out the possibility of malicious interference. If I send a CLOSE_NOTIFY, I can save a human some time.
I'm talking to a financial institution using FIX, and they will call me on the phone and scold me about my buggy software if I don't send them the proper logout message and wait for their reply before I close the connection. Even if I'm closing the connection because they didn't respond to some request I sent in a reasonable amount of time. (Yes, this really happens.)
I'm implementing a "resource server": it knows about various pieces of hardware that my clients might want to run tests on, and when one of them asks me for a place to run their frobnicator, I can tell them "OK, you get WhizBang3" and it will go start mucking with that piece of hardware according to its needs. If I'm shutting down the resource server, or kicking off this client because somebody more important needs WhizBang3, or taking down WhizBang3 for maintenance, it's pretty natural to represent those things as cancellations as various levels... but I would really like to turn that into a request to the client to remove its frobnicator from WhizBang3, which response I will wait some time for, so that it's immediately in a state where I can hand it off to the next person who wants to run a frobnicator, if applicable.

Feb 11 '19 12:02 oremanj

Also, it's worth mentioning that I haven't personally encountered any cases where I needed to set both a deadline and a grace period, but I think that's largely because I'm not using deadlines very much at all in the stuff I've been writing - it's all on reliable internal networks, and cancellation generally occurs on demand. But where I did have occasion to set a grace period, I wanted "not soft shielded" to be the default.

Feb 11 '19 14:02 oremanj

trio trio copied to clipboard

More ergonomic support for graceful shutdown

Safely stopping a server

Properly cleaning up resources upon cancellation

Idea: cancel scopes have grace periods

trio
trio copied to clipboard