`cancelawait` keyword to abort an async function call
I've spent many hours in the past trying to solve this, and never quite tied up all the loose ends, but I think I've done it this time.
Related Proposals:
- add "select" syntax to the language to await the first function in a given set that completes #5263
- a way to check async frame liveness #3164
- ability to annotate functions which allocate resources, with a way to deallocate the returned resources #782
Problem 1: Error Handling & Resource Management
Typical async await usage when multiple async functions are "in-flight", written naively, looks like this:
fn asyncAwaitTypicalUsage(allocator: *Allocator) !void {
var download_frame = async fetchUrl(allocator, "https://example.com/");
var file_frame = async readFile(allocator, "something.txt");
const download_text = try await download_frame; // NO GOOD!!!
defer allocator.free(download_text);
const file_text = try await file_frame;
defer allocator.free(file_text);
}
Spot the problem? If the first try returns an error, the in-flight file_frame becomes invalid memory while the readFile function is still using it. This is nasty undefined behavior, and it's too easy to do by accident.
Problem 2: The Await Result Location
Function calls directly write their return values into the result locations. This is important for pinned memory, and will become more noticeable when these are implemented:
- result location: ability to refer to the return result location before the return statement #2765
- result locations: unwrap optional and error unions so that the payload can be non-copied #2761
- Ability to mark a struct field as "pinned" in memory #3803
- make aggregate types non-copyable by default; provide copyable attribute #3804
However, this breaks when using async and await. It is possible to use the advanced builtin @asyncCall and pass a result location pointer to async, but there is no way to do so with await. The duality is messy, and a function that relies on pinning its return value will have its guarantees broken when it becomes an async function.
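To make the asymmetry concrete, here is a rough status-quo sketch (not from the original post; the stub function and the exact @asyncCall argument form are illustrative assumptions and may vary between Zig versions):
const std = @import("std");
// Hypothetical async function standing in for fetchUrl above.
fn fetchStub(allocator: *std.mem.Allocator, url: []const u8) anyerror![]u8 {
    _ = url;
    suspend {} // pretend we wait on the network; something resumes us later
    return try allocator.dupe(u8, "response body");
}
fn demo(allocator: *std.mem.Allocator) void {
    // A result location can be handed to `async` via @asyncCall: the callee's
    // return value is written straight through `result`, with no extra copy.
    var result: anyerror![]u8 = undefined;
    var frame_buf: [@sizeOf(@Frame(fetchStub))]u8 align(@alignOf(@Frame(fetchStub))) = undefined;
    const frame = @asyncCall(&frame_buf, &result, fetchStub, .{ allocator, "https://example.com/" });
    // ...but `await` has no counterpart: awaiting always copies the return value
    // out of the frame, so there is no way to supply a result pointer here.
    _ = frame;
}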
Solution
I've tried a bunch of other ideas before, but nothing could quite give us good enough semantics. But now I've got something that solves both problems. The key insight was to make obtaining a result location pointer for the return statement of an async function an implicit suspend point. The async function suspends at the return statement and is resumed by the await site, which passes it a result location pointer. The crucial point here is that this also provides a suspension point for cancelawait to activate. If an async function is cancelled, it resumes, but instead of returning a value, it runs the errdefer and defer expressions that are in scope. So async functions will simply have to retain the property that idiomatic code already has: all the cleanup that could possibly need to be done is in scope in a defer at a return statement.
I think this is the best of both worlds between automatically running a function up to the first suspend point, and what e.g. Rust does, which is not running a function until await is called. A function can introduce an intentional copy of the result data, if it wishes to run the logic in the return expression before an await result pointer is available. It also means async function frames can get smaller, because they no longer need to hold the return value.
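As a rough illustration of that cleanup property (a hypothetical sketch under the proposed semantics; readAll and max_response_len are made up, and the same Allocator import as the examples above is assumed):
fn fetchUrl(allocator: *Allocator, url: []const u8) ![]u8 {
    const buf = try allocator.alloc(u8, max_response_len);
    // In scope at the return statement below: if cancelawait fires while this
    // function is suspended there, this errdefer frees the buffer instead of
    // leaking what would have been the return value.
    errdefer allocator.free(buf);
    try readAll(url, buf); // may suspend
    // Under the proposal: suspend here until the awaiter either supplies a
    // result pointer (await) or cancels (cancelawait runs errdefer/defer).
    return buf;
}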
Now this leaves the problem of blocking functions which are used with async/await, and what cancelawait does to them. Proposal #782 is open for that purpose, but it has a lot of flaws. Again, here, the key insight of await working properly with result location pointers was the answer. If we move the function call of non-suspending functions used with async/await to happen at the await site instead of the async site, then cancelawait becomes a no-op: async simply copies the parameters into the frame, and await does the actual function call. Note that function parameters must be copied anyway for all function calls, so this comes at no penalty; in fact it should be better all around, because instead of "undoing" allocated resources we simply never do the extra work in the first place.
Example code:
fn asyncAwaitTypicalUsage(allocator: *Allocator) !void {
var download_frame = async fetchUrl(allocator, "https://example.com/");
errdefer cancelawait download_frame;
var file_frame = async readFile(allocator, "something.txt");
errdefer cancelawait file_frame;
const download_text = try await download_frame;
defer allocator.free(download_text);
const file_text = try await file_frame;
defer allocator.free(file_text);
}
Now, calling an async function looks like any resource allocation that needs to be cleaned up when returning an error. cancelawait works like await in that it is a suspend point; however, it discards the return value, and it atomically sets a flag in the function's frame which is observable from within.
I think cancellation tokens, and propagating whether an async function has been cancelled, can be left out of scope for this proposal. It's possible to build higher-level cancellation abstractions on top of this primitive. For example, https://github.com/ziglang/zig/issues/5263#issuecomment-624880004 could be improved with the availability of cancelawait. But more importantly, cancelawait makes it possible to casually use async/await on arbitrary functions in a maintainable and correct way.
I really don't like the idea of an implicit suspend point. It smells an awful lot like hidden control flow. Perhaps we should require async functions to retrieve their result location explicitly? Wait, no, then that's function colouring. Hmm.
(Also, is there a specific reason that the keyword can't just be cancel
? cancelawait
is a bit unwieldy.)
I really don't like the idea of an implicit suspend point.
I should clarify, there is already a suspend point at a return
statement in an async function. Also the fact that it is at return
makes it explicit I suppose. Anyway, the point is this doesn't add a suspend point, it moves it a little bit earlier so that the return expression will have the await result pointer before being evaluated, and so that the defers have not been executed yet.
Sounds like a nice proposal, with moving execution into await instead of async for non-suspend functions being quite the change. Was left with a few questions after reading it over:
What and how would cancellation look like for normal calls to async functions? (e.g. _ = someAsyncFn()
). Does it introduce an implicit try
, do catch unreachable
, etc.?
Also, is execution deferred until await instead of async for functions that don't suspend, based on compile-time analysis, or is this change a global property? If the latter, does this mean that async no longer runs until the first suspend point? That sounds like it would remove the ability to start async functions concurrently, which is why I feel like I'm misunderstanding it here.
What and how would cancellation look like for normal calls to async functions?
No cancellation is possible for these. The result location and the awaiter resume handle are both available from the very beginning of the call. When it gets to return, no suspend occurs; it writes the return value to the result location, runs the (non-error) defers, and then tail-resumes the awaiter.
Also, is execution deferred until await instead of async for functions that don't suspend, based on compile-time analysis
At the end of the compilation process, every function is assigned a calling convention; async functions get the async calling convention. So the compiler does have to "color" functions internally for code generation purposes, which means this is based on compile-time analysis. (That's the status quo already.)
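A minimal status-quo illustration of that analysis (my sketch, not from the comment):
// The compiler infers the calling convention from whether a function can suspend:
fn mightSuspend() void {
    suspend {} // using suspend makes this function async
}
fn plainBlocking() u32 {
    return 42; // never suspends, so it keeps the normal (blocking) convention
}
// The async calling convention can also be requested explicitly:
fn forcedAsync() callconv(.Async) void {}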
For the last part, as I understand it now, doing var frame = async someAsyncFn() runs someAsyncFn() up until its suspend point if any. If the result location is already available at the beginning of the async fn, does that mean that the execution of someAsyncFn() now begins at its frame's await point (since that's where the result location is specified)?
The reason I find this significant is that, if that is true, it changes the current assumptions about what the async keyword does. It would now just "set up the frame", instead of "set up the frame and run until first suspend".
If the "run" step is now only possible at await
, what does this mean for trying to run other code while an async function is suspended? Originally, after the call to async someAsyncFn()
, the async fn would then be running concurrently. Now that only await
can start running the async fn, there no longer seems to be a way to express concurrency given await
effectively serializes the async procedure.
First, I want to note that result location semantics can be (and may already be) supported for calls to async functions that do not use the async keyword. This gives us the rule: "the async keyword does not support result location semantics". Any call that does not use the async keyword can retain result location semantics, which means that two-coloring is not a problem. I think this rule is fine. It's simple, easy to explain and understand, and easy to see in code. I also think that passing the result location into @asyncCall is a decent solution for cases where the async keyword and result locations are both required. If you're returning a large value from an async function, something is going to be slow. Our choice is whether to make that slowness obvious (copying the value a couple of times) or hidden (performing indirect jumps to do computation at the await site, which involves writing large amounts of memory).
That said, I see two fatal inconsistencies between blocking and async functions with this proposal. I think they are much more subtle and hard to catch than problems with result location semantics, so IMO it would be better for the language not to support result location semantics for async calls than to take on these new problems. These two examples are related but subtly different. Fixing one will not fix the other.
cancelawait with side effects
If a function has side effects, this definition of cancelawait behaves very differently for async functions vs blocking functions. With async functions, the side effects will have happened by the time the cancelawait completes. But with this definition, for blocking functions they will not have happened at all. This is especially problematic if the side effect is to free memory, as in this example:
// x is consumed by b.
fn b(x: *Thing) void {
defer Thing.free(x);
// do other stuff
}
fn a() void {
var x: *Thing = Thing.alloc();
// ownership of x is passed to b, it will clean up
var frame = async b(x);
// if b is async, this will clean up x.
// if b is blocking, this will not clean up x.
errdefer cancelawait frame;
// ...
await frame;
}
Return statements with side effects
This proposal can cause undesirable behavior when nested. Consider this async function:
pub fn fetchUrl(allocator: *Allocator, url: []const u8) callconv(.Async) !FetchResult {
const urlInfo = nosuspend parseUrl(url);
return try fetchUrlInternal(allocator, urlInfo);
}
Assume for a moment that fetchUrlInternal
is blocking. According to the semantics above, it cannot run until the function is await
ed, because if the function is cancelawait
ed its side effects will not happen. For consistency, this rule should also hold for async
functions.
But that means that when fetchUrlInternal
is async, the meat of this function cannot begin executing until the await
happens. This means that if a user spawns 5 frames and then awaits each of them, each will not begin fetching its url until the previous one has completely finished. Essentially the async code has been "linearized", forced to run in order by this constraint.
The alternative is to allow async function calls in the return expression to begin executing asynchronously, and have the await
or cancelawait
in the parent be passed on to the child. But this causes a significant semantic difference between blocking and async functions, because side effects in async functions will execute but side effects in blocking functions will not.
The proposal addresses this a bit:
A function can introduce an intentional copy of the result data, if it wishes to run the logic in the return expression before an await result pointer is available.
But this is an extremely subtle difference in code for something so dramatically different in execution. I don't think this is a good idea.
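For illustration, the difference the quote refers to might look like this under the proposed semantics (a hypothetical sketch reusing the fetchUrl example above):
// Variant 1: the return expression waits for the await result pointer, so
// fetchUrlInternal does not start running until the caller awaits.
pub fn fetchUrl(allocator: *Allocator, url: []const u8) callconv(.Async) !FetchResult {
    const urlInfo = nosuspend parseUrl(url);
    return try fetchUrlInternal(allocator, urlInfo);
}
// Variant 2: an "intentional copy" lets the work start before the await,
// at the cost of holding the result in the frame and copying it later.
pub fn fetchUrlEager(allocator: *Allocator, url: []const u8) callconv(.Async) !FetchResult {
    const urlInfo = nosuspend parseUrl(url);
    const result = try fetchUrlInternal(allocator, urlInfo);
    return result;
}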
It's not explicitly stated in the proposal, but cancelawait
must be allowed on completed async functions. Otherwise the example given is buggy:
fn asyncAwaitTypicalUsage(allocator: *Allocator) !void {
var download_frame = async fetchUrl(allocator, "https://example.com/");
errdefer cancelawait download_frame;
var file_frame = async readFile(allocator, "something.txt");
errdefer cancelawait file_frame;
// if this returns error, download_frame is awaited twice
const download_text = try await download_frame;
defer allocator.free(download_text);
// if this returns error, download_frame and file_frame are both awaited twice
const file_text = try await file_frame;
defer allocator.free(file_text);
}
Fixing this is actually the only useful thing cancelawait
does in this example. The calling code already needs to know how to clean up the return value, so that knowledge is not abstracted. And the returned values are slices, which are trivially fast to copy. In fact, this form incurs a significant new performance problem, because the processor now needs to make an indirect jump into the return stubs of fetchUrl
and readFile
which contain the code to copy the slice into the result location, instead of just copying 16 bytes out of the frame. In theory a sufficiently smart compiler could recognize that the stub is known in this case and inline it, but this is more work that has to happen at every async function in the program, and could have a negative impact on build times and debug performance.
I think this use is important, but it can be accomplished more directly. Here's my counterproposal:
Keep cancelawait
, but don't have it run defers or errdefers. For a function that returns T
, cancelawait
returns ?T
. If the function has been await
ed or cancelawait
ed, cancelawait
returns null. Otherwise it returns the return value.
This would allow the above example to be written as follows:
fn asyncAwaitTypicalUsage(allocator: *Allocator) !void {
var download_frame = async fetchUrl(allocator, "https://example.com/");
errdefer if (cancelawait download_frame) |text| allocator.free(text);
var file_frame = async readFile(allocator, "something.txt");
errdefer if (cancelawait file_frame) |text| allocator.free(text);
const download_text = try await download_frame;
defer allocator.free(download_text);
const file_text = try await file_frame;
defer allocator.free(file_text);
}
This is still much less efficient than avoiding defer
/errdefer
/try
and putting the cleanup code at each return statement, because there are now atomic checks that must be made to implement cancelawait
. And the optimizer will never be able to get to that level of efficiency, because it can't prove that download_frame
will not trigger something that will cause file_frame
to be awaited elsewhere and then return an error. But at least the code is a bit cleaner than including bools alongside each frame to prevent double-awaits.
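For comparison, the bool-alongside-each-frame pattern that last sentence refers to might look roughly like this (my sketch, not from the comment):
fn asyncAwaitTypicalUsage(allocator: *Allocator) !void {
    var download_frame = async fetchUrl(allocator, "https://example.com/");
    var download_awaited = false;
    errdefer {
        if (!download_awaited) {
            if (await download_frame) |text| allocator.free(text) else |_| {}
        }
    }
    var file_frame = async readFile(allocator, "something.txt");
    var file_awaited = false;
    errdefer {
        if (!file_awaited) {
            if (await file_frame) |text| allocator.free(text) else |_| {}
        }
    }
    download_awaited = true;
    const download_text = try await download_frame;
    defer allocator.free(download_text);
    file_awaited = true;
    const file_text = try await file_frame;
    defer allocator.free(file_text);
}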
For the last part, as I understand it now, doing var frame = async someAsyncFn() runs someAsyncFn() up until its suspend point if any. If the result location is already available at the beginning of the async fn, does that mean that the execution of someAsyncFn() now begins at its frame's await point (since that's where the result location is specified)?
In the example var frame = async someAsyncFn(), the result location for return is not available yet, not until await happens. However, it would still set up the frame and run until the first suspend, just like status quo. Here's an example that highlights the difference between status quo and this proposal:
This Proposal
fn main() void {
seq('a');
var frame1 = async foo();
seq('c');
var frame2 = async bar();
seq('e');
const x = await frame1;
seq('k');
const y = await frame2;
seq('m');
}
fn foo() i32 {
defer seq('j');
seq('b');
operationThatSuspends();
seq('f');
return util();
}
fn util() i32 {
seq('g');
operationThatSuspends();
seq('i');
return 1234;
}
fn bar() i32 {
defer seq('l');
seq('d');
operationThatSuspends();
seq('h');
return 1234;
}
it would still set up the frame and run until the first suspend, just like status quo
Ah ok, I think that was where my misunderstanding was. My last point of confusion was related to how non-suspending async fns are handled:
If we move the function call of non-suspending functions used with async/await to happen at the await site instead of the async site
Is this change in semantics something applied by compile-time analysis, or through some other observation? If it's defined at compile time, what happens to the result values of functions started with async f() which conditionally suspend at runtime, before they're await'ed? Running until suspend at async would discard the result value, as there's not yet a provided result location. Running at await would serialize the async function, as explained earlier.
Instead of a new keyword, why couldn't a frame just have a cancel
function?
errdefer download_task.cancel();
@frmdstryr Nice idea. Would it make sense to extend this to other frame functionality? suspend probably wouldn't be feasible as a frame method, since it needs to support block execution.
download_task.resume();
download_task.await();
EDIT: removed async since it's a calling convention and is invoked on the function rather than the frame
These aren't methods though -- they're built-in functionality. Writing them as methods is misleading, and breaks the principle of all control flow as keywords.
@EleanorNB Not all control flow is currently keywords: function calls themselves are a form of control flow and can have control flow inside them as well. If I understand correctly, resume currently updates some atomic state and tail-calls into the frame's function, while await updates some atomic state and possibly suspends. Given that neither requires source-level control the way async/suspend do, having them be methods instead of keywords seems pretty fitting. One example of this is Rust, where await is a field-like postfix keyword on Futures/Frames and the resume equivalent is a poll() method on the Future/Frame as well.
Thought: if cancel
runs errdefer
s, and errdefer
s can capture values, then cancel
will also need to take an error to propagate up the function. How would we specify that? We could just do cancel frame, error.Something
, but there's no precedent in the language for bare comma-separated lists... we could make cancel
a builtin rather than a keyword, but that breaks symmetry with the rest of async machinery... hmm.
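The two spellings being weighed might look like this (purely hypothetical; neither form exists today):
// keyword plus a bare comma-separated operand list:
errdefer cancel download_frame, error.Something;
// builtin form, which avoids the new list syntax but breaks keyword symmetry:
errdefer @cancel(download_frame, error.Something);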
Another option to maybe consider: suspend could now return error.Cancelled, and then cancel frame resumes the frame while making the suspend return that error. One would handle (and possibly return) that error after noticing a cancelled suspend, which would then bubble up through the normal expected route of running errdefer and such.
No good -- not all suspend points are marked with suspend
. Then we have to mark every direct async function call and return
statement with an error, or return an error union from every async function -- that's function colouring, all over again.
I was under the assumption that there are only two ways to introduce a suspend point: suspend
and await
.
The former could return the error as noted earlier, and to mimic current semantics would be to ignore the error: suspend { ... } catch unreachable
. This effectively means that the frame cannot handle cancellation at that suspension point.
The latter AFAICT has two choices:
- keep current semantics by ignoring a cancellation error (see above)
- have await return an error union with the frame's return type (along with nosuspend catching another error). You could also ignore the error here via catch unreachable in order to keep current semantics.
In both cases, the marking is at the suspension point rather than at return
or async
invocation.
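A hypothetical sketch of the suspend-site marking being described here (Channel, registerWaiter, unregisterWaiter, and take are all made-up names):
fn waitForData(channel: *Channel) !Data {
    suspend {
        // park this frame; the channel resumes it when data arrives
        channel.registerWaiter(@frame());
    } catch |err| {
        // err is error.Cancelled: undo the registration and let the error
        // bubble up the normal route (errdefer/defer in the callers).
        channel.unregisterWaiter(@frame());
        return err;
    };
    return channel.take();
}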
A blocking async function call is an implicit await
, so it also counts as a suspend point. For example:
fn foo() u32 {
var x: u32 = 4;
callThatMaySuspend(); // x must be saved to the frame, this call is a suspend point
// equivalent to `await async callThatMaySuspend();`
return x;
}
For cancellation to work, any function that may suspend or await (and supports cancellation) needs to return an error union which includes cancelled. This is the "function colouring, all over again" that Eleanor is describing.
Hm, forgot about compiler-inserted awaits. The first bullet point sounds like the way to go there (the compiler adding catch {} to the inserted await's suspend point), which makes await ignore cancellations.
At first glance, this makes sense, as code which expects a result (e.g. using await) isn't written in a way to handle cancellation. You would then only be able to meaningfully cancel frames which are at suspends that explicitly support/handle cancellation (e.g. suspended in an async socket/channel which has more suspend control), while cancel frame on those that don't simply has no effect. Is there a hole I'm missing here though?
Implicit catch {}
or catch unreachable
is a horrible idea. Explicit catch
is not much better.
Since we want to localise any explicitly async
behaviour to the callsite, I do believe it's cancel
that has to specify the error. Since we don't actually use the returned error, I think it's ok not to include it in the function signature.
In line with #5277, this should be consistent if we only allow cancel
on awaitable handles (anyframe->T
, *@Frame(...)
).
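A rough illustration of that restriction (hypothetical; both the cancel syntax and the rule itself are only proposed):
fn compute() i32 {
    suspend {}
    return 42;
}
fn demo() void {
    var frame = async compute();
    const typed: *@Frame(compute) = &frame; // awaitable: carries the return type
    const erased: anyframe->i32 = &frame;   // awaitable: still carries a result type
    const bare: anyframe = &frame;          // resumable only: no result type
    // Under this rule, `cancel typed;` and `cancel erased;` would be allowed,
    // while `cancel bare;` would be a compile error.
    _ = typed; _ = erased; _ = bare;
}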
@EleanorNB why would implicit catch {}
be a bad idea? I feel like running defers/errdefers on cancellation without any explicit returns or scope ending sounds much more error prone.
Discarding all errors from an operation, only if the enclosing function happens to be async, which is nowhere explicitly marked? No thank you.
In my eyes, the cancel
keyword is the explication of scope end. Yes, it's at the caller, which is unfortunate -- however, cancellation is literally an externally-mandated exit; this is the price we pay for having it at all.
@EleanorNB
Discarding all errors from an operation
I think there was a miscommunication on my part. The await would return something like error{Cancelled}!ReturnType instead, where ReturnType could be whatever, like error{Overflow}!T for example (making it error{Cancelled}!(error{Overflow}!T)). I'm not actually sure if you can nest error sets like that, but that was what I was implying. Given that, the catch {} would only apply to the cancel error, meaning it would ignore a cancel frame request and act similarly to a nosuspend on resume (by asserting that there is a runtime value from the awaited frame and that it wasn't cancelled).
In my eyes, the cancel keyword is the explication of scope end.
I was under the assumption that cancel frame would run the defers inside the frame rather than inside the caller. If it did so for the caller, then that sounds like only the current frame can cancel itself, which sounds more limiting than I imagined from the original proposal.
To my knowledge, nesting error sets is impossible. Even if it weren't, in my eyes it should be. That way lies madness.
cancel
does run the defers inside the callee frame, not the caller frame, and I never proposed it should be otherwise. In the very next sentence I expressed my disappointment that it had to be separated from the scope in which it had an effect. However, this is the price we pay for cancelable functions -- cancellation needs to be possible at any suspend point for consistency, and while it may be possible (but very cumbersome) to mark every suspend
and await
, marking every point is flatly impossible. Thus, any function call may be an implicit exit point, and the programmer must be prepared for that. It's at least bearable, since every function call looks like one, but it's unfortunate.
To my knowledge, nesting error sets is impossible
The "is cancelled" state can then be switched to a bit in the frame state instead of a error set provided at await. await
itself can simply not be cancellable (panicking when it observes the bit to be set).
cancellation needs to be possible at any suspend point for consistency,
The issue with this is: how would it behave for operations that aren't cancellable or that wish to perform asynchronous cancellation? Detecting a cancel at the suspend point gives those operations a chance to see and reject a cancellation request (if they cannot support it). A good example of this is completion-based IO via io_uring, where some IO operations on certain file descriptors just cannot be cancelled even when you send an IORING_OP_ASYNC_CANCEL, so you either have to block or heap-allocate.
marking every point is flatly impossible
Only suspends would need to be marked here, not awaits. Under that scope, that's pretty reasonable, considering suspends happen internally in data structures which talk with frames directly and can be abstracted upon.
any function call may be an implicit exit point
I actually think this, in a weird way, reintroduces colored functions, as defers could be executed at different times depending on whether the function is synchronous or async.
await
itself can simply not be cancellable
Then whether a frame is cancelable or not depends on its current suspend point, which otherwise is completely invisible and unpredictable to the caller. What you get then is people saying that for safety, you should never try to cancel a frame. That's a C problem; Zig is better than that.
how would it behave for operations that aren't cancellable
This would typically be known by the programmer, so we would trust them not to attempt this. In such functions, the errdefer
s should clean up the state anyway, and if that has to involve blocking, then so be it. (That might mean cancel
could itself be a suspend point, but I don't think this is necessarily a problem -- we have nosuspend
, after all.)
a chance to see and reject a cancellation request
It's not a request. We don't ask nicely. When we say cancel
, we mean cancel
, not "if you'd be so kind as to cancel".
Only suspends would need to be marked here
An await
or blocking call is still a suspend point. Under your model, if we cancel an awaiting frame, the defer
s in the awaited frame would run, but not in the cancelled frame. (Unless we have some idea of an error set reserved for cancellation, that does not function as an ordinary error set -- because, if a blocking call is not to a coroutine, then your nested error set idea reduces to a single error set, and there's no way to distinguish that from an ordinary returned error.)
defers could be executed at different times
The semantics of defer
don't change -- any exit point runs the defer
s above it, sync or async.
I actually think this, in a weird way, reintroduces colored functions
There is always going to be some semantic difference between synchronous and asynchronous code. That's the whole point. However, the programmer's model doesn't change, and no code needs to be rewritten -- we're still colourblind. Under your proposal, colouring would be a lot worse: asynchronous calls have to have a special second error set, and synchronous calls cannot have that lest it be confused with an ordinary error set.
What you get then is people saying that for safety, you should never try to cancel a frame.
I don't really follow. resume depends on the state of the frame (it is "invisible and unpredictable to the caller") and will panic if it's completed or being resumed by another thread (even in ReleaseFast, it seems). People aren't saying "you should never try to resume a frame". Almost all async keywords/operations excluding suspend imply that you are aware of the state of the frame without any explicit notion in code, so I think this type of cancellation is still valuable.
In such functions, the errdefers should clean up the state anyway, and if that has to involve blocking, then so be it.
This has actually been a pain point in Rust futures as well. It requires implementing cancellation in the destructor of the Future/Frame, but that is only synchronous. People want asynchronous cancellation (e.g. AsyncDrop), but that wouldn't fit well into the ecosystem, so they resort to heap-allocating the async resources that cannot be synchronously cancelled in a non-blocking manner, so that they outlive the async context and can be cancelled in the future.
The alternative of not heap-allocating, i.e. blocking on cancellation, can actually be both an inefficiency and a logic error:
- You monopolize a worker thread in a multi-threaded event loop where other tasks could have been running, while you wait a non-deterministic amount of time for your resource to complete so you can free it.
- If the resource can only be satisfied at the event loop's scheduling points (e.g. from a suspend), then all worker threads could block waiting for the resource to complete without ever letting it do so, producing a deadlock.
It's not a request. We don't ask nicely. When we say cancel, we mean cancel,
Again, not everything can be cancelled. So you end up introducing runtime overhead, as stated above, in order to accommodate a language semantic. It would be great if we don't end up like Rust in that regard, which sacrifices customizability for simplicity without a way to opt out, since it's at the language level.
Under your model, if we cancel an awaiting frame, the defers in the awaited frame would run, but not in the cancelled frame.
I think there has been another misunderstanding. My idea of cancellation doesn't include defers or how to run them any differently. It only introduces cancel frame
and suspend { .. } catch |err| { ... }
. Cancelling an awaiting frame would cause a panic in either the cancelling frame or the awaiting frame.
The latter was what I was suggesting before. Here, await wouldn't introduce a magical new error to the return type. The cancellation state would be handled internally; await inserts an implicit suspend point when the frame result isn't ready. This internal one would just go from suspend { ... } to suspend { ... } catch @panic("await not cancellable").
The former is also an option (that I just thought of), which could be made more forgiving by having cancel frame return an error indicating whether or not it succeeded in cancelling the frame. This moves the decision of "is this a cancellation?" from the suspend point to the effective resume point. I'm not too big a fan of this approach, as it tries to make Zig async/await more readiness-based instead of completion-based, which goes against its original model and introduces a mandatory synchronization overhead to resume points that could otherwise be removed in the future.
The semantics of defer don't change -- any exit point runs the defers above it, sync or async.
The issue here is that suspend points and normal function calls that aren't at the end of the scope and don't use try are now exit points. This makes using defer trickier, as it's no longer explicit where an exit point really is in sync vs. async code. In async, your defer/errdefer could run earlier than it possibly ever could in sync, if a function in the middle suspended and was cancelled.
Under your proposal, colouring would be a lot worse: asynchronous calls have to have a special second error set,
Again, this is not the case. await
would handle the cancelled error/state internally.
Without even looking at the called function, standard coding practice is enough to ensure exactly one suspension is paired with one resumption, and one invocation with one completion -- so, if the programmer has done their job well, they should not encounter language-enforced crashes. However, there is no way of inspecting the internal suspension state of a function, so the invoker can't know whether it's suspended directly or awaiting. Thus, any attempt at cancellation, no matter how careful the programmer, has a possibility of crashing the program. (Even worse, the common pattern of calling a function to register the frame with the event loop is guaranteed to crash.) Call me crazy, but if the programmer has done their due diligence, they shouldn't have to worry about language-enforced crashes.
As you've pointed out though, my model (actually Andrew's model as well in the relevant places) isn't perfect either -- cancellation would then itself be an asynchronous process, which means it would need its own frame, and that frame would itself need to be cancelable, and how the hell would that work? It seems to me that no implementation of cancellation can ever be guaranteed to succeed, which in my eyes contradicts point 11b of the Zen.
In light of this, @andrewrk, I don't believe that cancellation should be implemented at the language level. We may provide a cancel token implementation in the standard library (which is a much better and more flexible solution anyway), but async frames themselves must be awaited to complete. I do believe however that the proposed asynchronous RLS is a worthwhile idea.
We may implement one language-level feature to make userspace cancellation easier: rather than anyframe
, a resumable handle could have type anyframe<-T
-- that is, suspend
has a value, and resume
takes a value of that type to pass to the function, indicating whether to proceed or cancel:
// In the suspending function
const action = suspend {
event_loop.registerContinuationAndCancellation(@frame(), continuation_condition, cancellation_condition);
};
switch (action) {
.go => {},
.stop => return error.functionXCancelled,
}
// In the event loop (some details missing)
if (frame.continuation and @atomicRmw(bool, &frame.suspended, .Xchg, false, .Weak)) {
resume frame.ptr, .go;
frame.* = null;
}
if (frame.cancellation and @atomicRmw(bool, &frame.suspended, .Xchg, false, .Weak)) {
resume frame.ptr, .stop;
frame.* = null;
}
Since @frame()
may be called anywhere within the function, and the resumer needs to know the type before analysing the frame, the suspend type (T
in anyframe<-T
) must be part of the function's signature. I propose we reuse while
loop continuation syntax:
const suspendingFunction = fn (arg: Arg) ReturnType : ContinuationType {
// ...
};
Any function that uses the suspend
keyword must have a suspend type. This is not function colouring, as any function with explicit suspend
is necessarily asynchronous anyway (functions that only await
cannot be keyword-resume
d, so do not need a suspend type). The suspend type may be void
or error!void
(no error set inference), in which case the handle type is anyframe<-void
or anyframe<-error!void
(not anyframe
-- we require strongly typed handles for type checking, which is one drawback), and resume
does not necessarily take a second argument, as in status quo.
This not only permits flexible evented userspace cancellation, but also more specialised continuation conditions: a function waiting for multiple files to become available could receive a handle to the first one that does, and combined with a mechanism to check whether a frame has completed, #5263 could be implemented in userspace in the same manner.
At first blush, this may appear to be hostile to inlining async functions -- however, allowing that would already require semantic changes (#5277) that actually complement this quite nicely: @frame()
would return anyframe<-T
of the syntactically enclosing function's suspend type, regardless of the suspend type of the underlying frame, and there is now a strict delineation between resumable and awaitable handles.
This is, of course, a separate proposal -- I'll write up a proper one later.