`cancelawait` keyword to abort an async function call
I've spent many hours in the past trying to solve this, and never quite tied up all the loose ends, but I think I've done it this time.
Related Proposals:
- add "select" syntax to the language to await the first function in a given set that completes #5263
- a way to check async frame liveness #3164
- ability to annotate functions which allocate resources, with a way to deallocate the returned resources #782
Problem 1: Error Handling & Resource Management
Typical async await usage when multiple async functions are "in-flight", written naively, looks like this:
fn asyncAwaitTypicalUsage(allocator: *Allocator) !void {
var download_frame = async fetchUrl(allocator, "https://example.com/");
var file_frame = async readFile(allocator, "something.txt");
const download_text = try await download_frame; // NO GOOD!!!
defer allocator.free(download_text);
const file_text = try await file_frame;
defer allocator.free(file_text);
}
Spot the problem? If the first try returns an error, the in-flight file_frame becomes invalid memory while the readFile function is still using it. This is nasty undefined behavior, and it's too easy to do by accident.
Problem 2: The Await Result Location
Function calls directly write their return values into the result locations. This is important for pinned memory, and will become more noticeable when these are implemented:
- result location: ability to refer to the return result location before the return statement #2765
- result locations: unwrap optional and error unions so that the payload can be non-copied #2761
- Ability to mark a struct field as "pinned" in memory #3803
- make aggregate types non-copyable by default; provide copyable attribute #3804
However, this breaks when using async and await. It is possible to use the advanced builtin @asyncCall and pass a result location pointer to async, but there is no way to do so with await. The duality is messy, and a function that relies on pinning its return value will have its guarantees broken when it becomes an async function.
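To make the asymmetry concrete, here is a rough status-quo sketch (not from the original post; the stub function and the exact @asyncCall argument form are illustrative assumptions and may vary between Zig versions):
const std = @import("std");
// Hypothetical async function standing in for fetchUrl above.
fn fetchStub(allocator: *std.mem.Allocator, url: []const u8) anyerror![]u8 {
    _ = url;
    suspend {} // pretend we wait on the network; something resumes us later
    return try allocator.dupe(u8, "response body");
}
fn demo(allocator: *std.mem.Allocator) void {
    // A result location can be handed to `async` via @asyncCall: the callee's
    // return value is written straight through `result`, with no extra copy.
    var result: anyerror![]u8 = undefined;
    var frame_buf: [@sizeOf(@Frame(fetchStub))]u8 align(@alignOf(@Frame(fetchStub))) = undefined;
    const frame = @asyncCall(&frame_buf, &result, fetchStub, .{ allocator, "https://example.com/" });
    // ...but `await` has no counterpart: awaiting always copies the return value
    // out of the frame, so there is no way to supply a result pointer here.
    _ = frame;
}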
Solution
I've tried a bunch of other ideas before, but nothing could quite give us good enough semantics. But now I've got something that solves both problems. The key insight was to make obtaining a result location pointer for the return statement of an async function an implicit suspend point. The async function suspends at the return statement and is resumed by the await site, which passes it a result location pointer. The crucial point here is that this also provides a suspension point for cancelawait to activate. If an async function is cancelled, it resumes, but instead of returning a value, it runs the errdefer and defer expressions that are in scope. So async functions will simply have to retain the property that idiomatic code already has: all the cleanup that could possibly need to be done is in scope in a defer at a return statement.
I think this is the best of both worlds between automatically running a function up to the first suspend point, and what e.g. Rust does, which is not running a function until await is called. A function can introduce an intentional copy of the result data, if it wishes to run the logic in the return expression before an await result pointer is available. It also means async function frames can get smaller, because they no longer need to hold the return value.
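As a rough illustration of that cleanup property (a hypothetical sketch under the proposed semantics; readAll and max_response_len are made up, and the same Allocator import as the examples above is assumed):
fn fetchUrl(allocator: *Allocator, url: []const u8) ![]u8 {
    const buf = try allocator.alloc(u8, max_response_len);
    // In scope at the return statement below: if cancelawait fires while this
    // function is suspended there, this errdefer frees the buffer instead of
    // leaking what would have been the return value.
    errdefer allocator.free(buf);
    try readAll(url, buf); // may suspend
    // Under the proposal: suspend here until the awaiter either supplies a
    // result pointer (await) or cancels (cancelawait runs errdefer/defer).
    return buf;
}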
Now this leaves the problem of blocking functions which are used with async/await, and what cancelawait does to them. Proposal #782 is open for that purpose, but it has a lot of flaws. Again, here, the key insight of await working properly with result location pointers was the answer. If we move the function call of non-suspending functions used with async/await to happen at the await site instead of the async site, then cancelawait becomes a no-op: async simply copies the parameters into the frame, and await does the actual function call. Note that function parameters must be copied anyway for all function calls, so this comes at no penalty; in fact it should be better all around, because instead of "undoing" allocated resources we simply never do the extra work in the first place.
Example code:
fn asyncAwaitTypicalUsage(allocator: *Allocator) !void {
var download_frame = async fetchUrl(allocator, "https://example.com/");
errdefer cancelawait download_frame;
var file_frame = async readFile(allocator, "something.txt");
errdefer cancelawait file_frame;
const download_text = try await download_frame;
defer allocator.free(download_text);
const file_text = try await file_frame;
defer allocator.free(file_text);
}
Now, calling an async function looks like any resource allocation that needs to be cleaned up when returning an error. cancelawait works like await in that it is a suspend point; however, it discards the return value, and it atomically sets a flag in the function's frame which is observable from within.
I think cancellation tokens, and propagating whether an async function has been cancelled, can be left out of scope for this proposal. It's possible to build higher-level cancellation abstractions on top of this primitive. For example, https://github.com/ziglang/zig/issues/5263#issuecomment-624880004 could be improved with the availability of cancelawait. But more importantly, cancelawait makes it possible to casually use async/await on arbitrary functions in a maintainable and correct way.
I really don't like the idea of an implicit suspend point. It smells an awful lot like hidden control flow. Perhaps we should require async functions to retrieve their result location explicitly? Wait, no, then that's function colouring. Hmm.
(Also, is there a specific reason that the keyword can't just be cancel
? cancelawait
is a bit unwieldy.)
I really don't like the idea of an implicit suspend point.
I should clarify, there is already a suspend point at a return
statement in an async function. Also the fact that it is at return
makes it explicit I suppose. Anyway, the point is this doesn't add a suspend point, it moves it a little bit earlier so that the return expression will have the await result pointer before being evaluated, and so that the defers have not been executed yet.
Sounds like a nice proposal, with moving execution into await instead of async for non-suspend functions being quite the change. Was left with a few questions after reading it over:
What and how would cancellation look like for normal calls to async functions? (e.g. _ = someAsyncFn()
). Does it introduce an implicit try
, do catch unreachable
, etc.?
Also, is execution deferred until await instead of async for functions that don't suspend, based on compile-time analysis, or is this change a global property? If the latter, does this mean that async no longer runs until the first suspend point? That sounds like it would remove the ability to start async functions concurrently, which is why I feel like I'm misunderstanding it here.
What and how would cancellation look like for normal calls to async functions?
No cancellation is possible for these. The result location and the awaiter resume handle are both available from the very beginning of the call. When it gets to return, no suspend occurs; it writes the return value to the result location, runs the (non-error) defers, and then tail-resumes the awaiter.
Also, is execution deferred until await instead of async for functions that don't suspend, based on compile-time analysis
At the end of the compilation process, every function is assigned a calling convention; async functions get the async calling convention. So the compiler does have to "color" functions internally for code generation purposes, which means this is based on compile-time analysis. (That's the status quo already.)
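A minimal status-quo illustration of that analysis (my sketch, not from the comment):
// The compiler infers the calling convention from whether a function can suspend:
fn mightSuspend() void {
    suspend {} // using suspend makes this function async
}
fn plainBlocking() u32 {
    return 42; // never suspends, so it keeps the normal (blocking) convention
}
// The async calling convention can also be requested explicitly:
fn forcedAsync() callconv(.Async) void {}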
For the last part, as I understand it now, doing var frame = async someAsyncFn() runs someAsyncFn() up until its suspend point if any. If the result location is already available at the beginning of the async fn, does that mean that the execution of someAsyncFn() now begins at its frame's await point (since that's where the result location is specified)?
The reason I find this significant is that, if that is true, it changes the current assumptions about what the async keyword does. It would now just "set up the frame", instead of "set up the frame and run until first suspend".
If the "run" step is now only possible at await
, what does this mean for trying to run other code while an async function is suspended? Originally, after the call to async someAsyncFn()
, the async fn would then be running concurrently. Now that only await
can start running the async fn, there no longer seems to be a way to express concurrency given await
effectively serializes the async procedure.
First, I want to note that result location semantics can be (and may already be) supported for calls to async functions that do not use the async keyword. This gives us the rule: "the async keyword does not support result location semantics". Any call that does not use the async keyword can retain result location semantics, which means that two-coloring is not a problem. I think this rule is fine. It's simple, easy to explain and understand, and easy to see in code. I also think that passing the result location into @asyncCall is a decent solution for cases where the async keyword and result locations are both required. If you're returning a large value from an async function, something is going to be slow. Our choice is whether to make that slowness obvious (copying the value a couple of times) or hidden (performing indirect jumps to do computation at the await site, which involves writing large amounts of memory).
That said, I see two fatal inconsistencies between blocking and async functions with this proposal. I think they are much more subtle and hard to catch than problems with result location semantics, so IMO it would be better for the language not to support result location semantics for async calls than to take on these new problems. These two examples are related but subtly different. Fixing one will not fix the other.
cancelawait with side effects
If a function has side effects, this definition of cancelawait behaves very differently for async functions vs blocking functions. With async functions, the side effects will have happened by the time the cancelawait completes. But with this definition, for blocking functions they will not have happened at all. This is especially problematic if the side effect is to free memory, as in this example:
// x is consumed by b.
fn b(x: *Thing) void {
defer Thing.free(x);
// do other stuff
}
fn a() void {
var x: *Thing = Thing.alloc();
// ownership of x is passed to b, it will clean up
var frame = async b(x);
// if b is async, this will clean up x.
// if b is blocking, this will not clean up x.
errdefer cancelawait frame;
// ...
await frame;
}
Return statements with side effects
This proposal can cause undesirable behavior when nested. Consider this async function:
pub fn fetchUrl(allocator: *Allocator, url: []const u8) callconv(.Async) !FetchResult {
const urlInfo = nosuspend parseUrl(url);
return try fetchUrlInternal(allocator, urlInfo);
}
Assume for a moment that fetchUrlInternal
is blocking. According to the semantics above, it cannot run until the function is await
ed, because if the function is cancelawait
ed its side effects will not happen. For consistency, this rule should also hold for async
functions.
But that means that when fetchUrlInternal
is async, the meat of this function cannot begin executing until the await
happens. This means that if a user spawns 5 frames and then awaits each of them, each will not begin fetching its url until the previous one has completely finished. Essentially the async code has been "linearized", forced to run in order by this constraint.
The alternative is to allow async function calls in the return expression to begin executing asynchronously, and have the await
or cancelawait
in the parent be passed on to the child. But this causes a significant semantic difference between blocking and async functions, because side effects in async functions will execute but side effects in blocking functions will not.
The proposal addresses this a bit:
A function can introduce an intentional copy of the result data, if it wishes to run the logic in the return expression before an await result pointer is available.
But this is an extremely subtle difference in code for something so dramatically different in execution. I don't think this is a good idea.
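For illustration, the difference the quote refers to might look like this under the proposed semantics (a hypothetical sketch reusing the fetchUrl example above):
// Variant 1: the return expression waits for the await result pointer, so
// fetchUrlInternal does not start running until the caller awaits.
pub fn fetchUrl(allocator: *Allocator, url: []const u8) callconv(.Async) !FetchResult {
    const urlInfo = nosuspend parseUrl(url);
    return try fetchUrlInternal(allocator, urlInfo);
}
// Variant 2: an "intentional copy" lets the work start before the await,
// at the cost of holding the result in the frame and copying it later.
pub fn fetchUrlEager(allocator: *Allocator, url: []const u8) callconv(.Async) !FetchResult {
    const urlInfo = nosuspend parseUrl(url);
    const result = try fetchUrlInternal(allocator, urlInfo);
    return result;
}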
It's not explicitly stated in the proposal, but cancelawait
must be allowed on completed async functions. Otherwise the example given is buggy:
fn asyncAwaitTypicalUsage(allocator: *Allocator) !void {
var download_frame = async fetchUrl(allocator, "https://example.com/");
errdefer cancelawait download_frame;
var file_frame = async readFile(allocator, "something.txt");
errdefer cancelawait file_frame;
// if this returns error, download_frame is awaited twice
const download_text = try await download_frame;
defer allocator.free(download_text);
// if this returns error, download_frame and file_frame are both awaited twice
const file_text = try await file_frame;
defer allocator.free(file_text);
}
Fixing this is actually the only useful thing cancelawait
does in this example. The calling code already needs to know how to clean up the return value, so that knowledge is not abstracted. And the returned values are slices, which are trivially fast to copy. In fact, this form incurs a significant new performance problem, because the processor now needs to make an indirect jump into the return stubs of fetchUrl
and readFile
which contain the code to copy the slice into the result location, instead of just copying 16 bytes out of the frame. In theory a sufficiently smart compiler could recognize that the stub is known in this case and inline it, but this is more work that has to happen at every async function in the program, and could have a negative impact on build times and debug performance.
I think this use is important, but it can be accomplished more directly. Here's my counterproposal:
Keep cancelawait
, but don't have it run defers or errdefers. For a function that returns T
, cancelawait
returns ?T
. If the function has been await
ed or cancelawait
ed, cancelawait
returns null. Otherwise it returns the return value.
This would allow the above example to be written as follows:
fn asyncAwaitTypicalUsage(allocator: *Allocator) !void {
var download_frame = async fetchUrl(allocator, "https://example.com/");
errdefer if (cancelawait download_frame) |text| allocator.free(text);
var file_frame = async readFile(allocator, "something.txt");
errdefer if (cancelawait file_frame) |text| allocator.free(text);
const download_text = try await download_frame;
defer allocator.free(download_text);
const file_text = try await file_frame;
defer allocator.free(file_text);
}
This is still much less efficient than avoiding defer
/errdefer
/try
and putting the cleanup code at each return statement, because there are now atomic checks that must be made to implement cancelawait
. And the optimizer will never be able to get to that level of efficiency, because it can't prove that download_frame
will not trigger something that will cause file_frame
to be awaited elsewhere and then return an error. But at least the code is a bit cleaner than including bools alongside each frame to prevent double-awaits.
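For comparison, the bool-alongside-each-frame pattern that last sentence refers to might look roughly like this (my sketch, not from the comment):
fn asyncAwaitTypicalUsage(allocator: *Allocator) !void {
    var download_frame = async fetchUrl(allocator, "https://example.com/");
    var download_awaited = false;
    errdefer {
        if (!download_awaited) {
            if (await download_frame) |text| allocator.free(text) else |_| {}
        }
    }
    var file_frame = async readFile(allocator, "something.txt");
    var file_awaited = false;
    errdefer {
        if (!file_awaited) {
            if (await file_frame) |text| allocator.free(text) else |_| {}
        }
    }
    download_awaited = true;
    const download_text = try await download_frame;
    defer allocator.free(download_text);
    file_awaited = true;
    const file_text = try await file_frame;
    defer allocator.free(file_text);
}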
For the last part, as I understand it now, doing var frame = async someAsyncFn() runs someAsyncFn() up until its suspend point if any. If the result location is already available at the beginning of the async fn, does that mean that the execution of someAsyncFn() now begins at its frame's await point (since that's where the result location is specified)?
In the example var frame = async someAsyncFn(), the result location for return is not available yet, not until await happens. However, it would still set up the frame and run until the first suspend, just like status quo. Here's an example that highlights the difference between status quo and this proposal:
This Proposal
fn main() void {
seq('a');
var frame1 = async foo();
seq('c');
var frame2 = async bar();
seq('e');
const x = await frame1;
seq('k');
const y = await frame2;
seq('m');
}
fn foo() i32 {
defer seq('j');
seq('b');
operationThatSuspends();
seq('f');
return util();
}
fn util() i32 {
seq('g');
operationThatSuspends();
seq('i');
return 1234;
}
fn bar() i32 {
defer seq('l');
seq('d');
operationThatSuspends();
seq('h');
return 1234;
}
it would still set up the frame and run until the first suspend, just like status quo
Ah ok, I think that was where my misunderstanding was. My last point of confusion was related to how non-suspending async fns are handled:
If we move the function call of non-suspending functions used with async/await to happen at the await site instead of the async site
Is this change in semantics something applied by compile-time analysis, or through some other observation? If it's defined at compile time, what happens to the result values of functions started with async f() which conditionally suspend at runtime, before they're await'ed? Running until suspend at async would discard the result value, as there's not yet a provided result location. Running at await would serialize the async function, as explained earlier.
Instead of a new keyword, why couldn't a frame just have a cancel
function?
errdefer download_task.cancel();
@frmdstryr Nice idea. Would it make sense to extend this to other frame functionality? suspend probably wouldn't be feasible as a frame method, since it needs to support block execution.
download_task.resume();
download_task.await();
EDIT: removed async since it's a calling convention and is invoked on the function rather than the frame
These aren't methods though -- they're built-in functionality. Writing them as methods is misleading, and breaks the principle of all control flow as keywords.
@EleanorNB Not all control flow is currently keywords: function calls themselves are a form of control flow and can have control flow inside them as well. If I understand correctly, resume currently updates some atomic state and tail-calls into the frame's function, while await updates some atomic state and possibly suspends. Given that neither requires source-level control the way async/suspend do, having them be methods instead of keywords seems pretty fitting. One example of this is Rust, where await is a field-like postfix keyword on Futures/Frames and the resume equivalent is a poll() method on the Future/Frame as well.
Thought: if cancel
runs errdefer
s, and errdefer
s can capture values, then cancel
will also need to take an error to propagate up the function. How would we specify that? We could just do cancel frame, error.Something
, but there's no precedent in the language for bare comma-separated lists... we could make cancel
a builtin rather than a keyword, but that breaks symmetry with the rest of async machinery... hmm.
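The two spellings being weighed might look like this (purely hypothetical; neither form exists today):
// keyword plus a bare comma-separated operand list:
errdefer cancel download_frame, error.Something;
// builtin form, which avoids the new list syntax but breaks keyword symmetry:
errdefer @cancel(download_frame, error.Something);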
Another option to maybe consider: suspend could now return error.Cancelled, and then cancel frame resumes the frame while making the suspend return that error. One would handle (and possibly return) that error after noticing a cancelled suspend, which would then bubble up through the normal expected route of running errdefer and such.
No good -- not all suspend points are marked with suspend
. Then we have to mark every direct async function call and return
statement with an error, or return an error union from every async function -- that's function colouring, all over again.
I was under the assumption that there are only two ways to introduce a suspend point: suspend
and await
.
The former could return the error as noted earlier, and to mimic current semantics would be to ignore the error: suspend { ... } catch unreachable
. This effectively means that the frame cannot handle cancellation at that suspension point.
The latter AFAICT has two choices:
- keep current semantics by ignoring a cancellation error (see above)
- have await return an error union with the frame's return type (along with nosuspend catching another error). You could also ignore the error here via catch unreachable in order to keep current semantics.
In both cases, the marking is at the suspension point rather than at return
or async
invocation.
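A hypothetical sketch of the suspend-site marking being described here (Channel, registerWaiter, unregisterWaiter, and take are all made-up names):
fn waitForData(channel: *Channel) !Data {
    suspend {
        // park this frame; the channel resumes it when data arrives
        channel.registerWaiter(@frame());
    } catch |err| {
        // err is error.Cancelled: undo the registration and let the error
        // bubble up the normal route (errdefer/defer in the callers).
        channel.unregisterWaiter(@frame());
        return err;
    };
    return channel.take();
}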
A blocking async function call is an implicit await
, so it also counts as a suspend point. For example:
fn foo() u32 {
var x: u32 = 4;
callThatMaySuspend(); // x must be saved to the frame, this call is a suspend point
// equivalent to `await async callThatMaySuspend();`
return x;
}
For cancellation to work, any function that may suspend or await (and supports cancellation) needs to return an error union which includes cancelled. This is the "function colouring, all over again" that Eleanor is describing.
Hm, forgot about compiler-inserted awaits. The first bullet point sounds like the way to go there (the compiler adding catch {} to the inserted await's suspend point), which makes await ignore cancellations.
At first glance, this makes sense, as code which expects a result (e.g. using await) isn't written in a way to handle cancellation. You would then only be able to meaningfully cancel frames which are at suspends that explicitly support/handle cancellation (e.g. suspended in an async socket/channel which has more suspend control), while cancel frame on those that don't simply has no effect. Is there a hole I'm missing here though?
Implicit catch {}
or catch unreachable
is a horrible idea. Explicit catch
is not much better.
Since we want to localise any explicitly async
behaviour to the callsite, I do believe it's cancel
that has to specify the error. Since we don't actually use the returned error, I think it's ok not to include it in the function signature.
In line with #5277, this should be consistent if we only allow cancel
on awaitable handles (anyframe->T
, *@Frame(...)
).
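A rough illustration of that restriction (hypothetical; both the cancel syntax and the rule itself are only proposed):
fn compute() i32 {
    suspend {}
    return 42;
}
fn demo() void {
    var frame = async compute();
    const typed: *@Frame(compute) = &frame; // awaitable: carries the return type
    const erased: anyframe->i32 = &frame;   // awaitable: still carries a result type
    const bare: anyframe = &frame;          // resumable only: no result type
    // Under this rule, `cancel typed;` and `cancel erased;` would be allowed,
    // while `cancel bare;` would be a compile error.
    _ = typed; _ = erased; _ = bare;
}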
@EleanorNB why would implicit catch {}
be a bad idea? I feel like running defers/errdefers on cancellation without any explicit returns or scope ending sounds much more error prone.
Discarding all errors from an operation, only if the enclosing function happens to be async, which is nowhere explicitly marked? No thank you.
In my eyes, the cancel
keyword is the explication of scope end. Yes, it's at the caller, which is unfortunate -- however, cancellation is literally an externally-mandated exit; this is the price we pay for having it at all.
@EleanorNB
Discarding all errors from an operation
I think there was a miscommunication on my part. The await would return something like error{Cancelled}!ReturnType instead, where ReturnType could be whatever, like error{Overflow}!T for example (making it error{Cancelled}!(error{Overflow}!T)). I'm not actually sure if you can nest error sets like that, but that was what I was implying. Given that, the catch {} would only apply to the cancel error, meaning it would ignore a cancel frame request and act similarly to a nosuspend on resume (by asserting that there is a runtime value from the awaited frame and that it wasn't cancelled).
In my eyes, the cancel keyword is the explication of scope end.
I was under the assumption that cancel frame would run the defers inside the frame rather than inside the caller. If it did so for the caller, then that sounds like only the current frame can cancel itself, which sounds more limiting than I imagined from the original proposal.
To my knowledge, nesting error sets is impossible. Even if it weren't, in my eyes it should be. That way lies madness.
cancel
does run the defers inside the callee frame, not the caller frame, and I never proposed it should be otherwise. In the very next sentence I expressed my disappointment that it had to be separated from the scope in which it had an effect. However, this is the price we pay for cancelable functions -- cancellation needs to be possible at any suspend point for consistency, and while it may be possible (but very cumbersome) to mark every suspend
and await
, marking every point is flatly impossible. Thus, any function call may be an implicit exit point, and the programmer must be prepared for that. It's at least bearable, since every function call looks like one, but it's unfortunate.
To my knowledge, nesting error sets is impossible
The "is cancelled" state can then be switched to a bit in the frame state instead of a error set provided at await. await
itself can simply not be cancellable (panicking when it observes the bit to be set).
cancellation needs to be possible at any suspend point for consistency,
The issue with this is: how would it behave for operations that aren't cancellable or that wish to perform asynchronous cancellation? Detecting a cancel at the suspend point gives those operations a chance to see and reject a cancellation request (if they cannot support it). A good example of this is completion-based IO via io_uring, where some IO operations on certain file descriptors just cannot be cancelled even when you send an IORING_OP_ASYNC_CANCEL, so you either have to block or heap-allocate.
marking every point is flatly impossible
Only suspends would need to be marked here, not awaits. Under that scope, that's pretty reasonable, considering suspends happen internally in data structures which talk with frames directly and can be abstracted upon.
any function call may be an implicit exit point
I actually think this, in a weird way, reintroduces colored functions, as defers could be executed at different times depending on whether the function is synchronous or async.
await
itself can simply not be cancellable
Then whether a frame is cancelable or not depends on its current suspend point, which otherwise is completely invisible and unpredictable to the caller. What you get then is people saying that for safety, you should never try to cancel a frame. That's a C problem; Zig is better than that.
how would it behave for operations that aren't cancellable
This would typically be known by the programmer, so we would trust them not to attempt this. In such functions, the errdefer
s should clean up the state anyway, and if that has to involve blocking, then so be it. (That might mean cancel
could itself be a suspend point, but I don't think this is necessarily a problem -- we have nosuspend
, after all.)
a chance to see and reject a cancellation request
It's not a request. We don't ask nicely. When we say cancel
, we mean cancel
, not "if you'd be so kind as to cancel".
Only suspends would need to be marked here
An await
or blocking call is still a suspend point. Under your model, if we cancel an awaiting frame, the defer
s in the awaited frame would run, but not in the cancelled frame. (Unless we have some idea of an error set reserved for cancellation, that does not function as an ordinary error set -- because, if a blocking call is not to a coroutine, then your nested error set idea reduces to a single error set, and there's no way to distinguish that from an ordinary returned error.)
defers could be executed at different times
The semantics of defer
don't change -- any exit point runs the defer
s above it, sync or async.
I actually think this, in a weird way, reintroduces colored functions
There is always going to be some semantic difference between synchronous and asynchronous code. That's the whole point. However, the programmer's model doesn't change, and no code needs to be rewritten -- we're still colourblind. Under your proposal, colouring would be a lot worse: asynchronous calls have to have a special second error set, and synchronous calls cannot have that lest it be confused with an ordinary error set.
What you get then is people saying that for safety, you should never try to cancel a frame.
I don't really follow. resume depends on the state of the frame (it is "invisible and unpredictable to the caller") and will panic if it's completed or being resumed by another thread (even in ReleaseFast, it seems). People aren't saying "you should never try to resume a frame". Almost all async keywords/operations excluding suspend imply that you are aware of the state of the frame without any explicit notion in code, so I think this type of cancellation is still valuable.
In such functions, the errdefers should clean up the state anyway, and if that has to involve blocking, then so be it.
This has actually been a pain point in Rust futures as well. It requires implementing cancellation in the destructor of the Future/Frame, but that is only synchronous. People want asynchronous cancellation (e.g. AsyncDrop), but that wouldn't fit well into the ecosystem, so they resort to heap-allocating the async resources that cannot be synchronously cancelled in a non-blocking manner, so that they outlive the async context and can be cancelled in the future.
The alternative of not heap-allocating, i.e. blocking on cancellation, can actually be both an inefficiency and a logic error:
- You monopolize a worker thread in a multi-threaded event loop where other tasks could have been running, while you wait a non-deterministic amount of time for your resource to complete so you can free it.
- If the resource can only be satisfied at the event loop's scheduling points (e.g. from a suspend), then all worker threads could block waiting for the resource to complete without ever letting it do so, producing a deadlock.
It's not a request. We don't ask nicely. When we say cancel, we mean cancel,
Again, not everything can be cancelled. So you end up introducing runtime overhead, as stated above, in order to accommodate a language semantic. It would be great if we don't end up like Rust in that regard, which sacrifices customizability for simplicity without a way to opt out, since it's at the language level.
Under your model, if we cancel an awaiting frame, the defers in the awaited frame would run, but not in the cancelled frame.
I think there has been another misunderstanding. My idea of cancellation doesn't include defers or how to run them any differently. It only introduces cancel frame
and suspend { .. } catch |err| { ... }
. Cancelling an awaiting frame would cause a panic in either the cancelling frame or the awaiting frame.
The latter was what I was suggesting before. Here, await wouldn't introduce a magical new error to the return type. The cancellation state would be handled internally; await inserts an implicit suspend point when the frame result isn't ready. This internal one would just go from suspend { ... } to suspend { ... } catch @panic("await not cancellable").
The former is also an option (that I just thought of), which could be made more forgiving by having cancel frame return an error indicating whether or not it succeeded in cancelling the frame. This moves the decision of "is this a cancellation?" from the suspend point to the effective resume point. I'm not too big a fan of this approach, as it tries to make Zig async/await more readiness-based instead of completion-based, which goes against its original model and introduces a mandatory synchronization overhead to resume points that could otherwise be removed in the future.
The semantics of defer don't change -- any exit point runs the defers above it, sync or async.
The issue here is that suspend points and normal function calls that aren't at the end of the scope and don't use try are now exit points. This makes using defer trickier, as it's no longer explicit where an exit point really is in sync vs. async code. In async, your defer/errdefer could run earlier than it possibly ever could in sync, if a function in the middle suspended and was cancelled.
Under your proposal, colouring would be a lot worse: asynchronous calls have to have a special second error set,
Again, this is not the case. await
would handle the cancelled error/state internally.
Without even looking at the called function, standard coding practice is enough to ensure exactly one suspension is paired with one resumption, and one invocation with one completion -- so, if the programmer has done their job well, they should not encounter language-enforced crashes. However, there is no way of inspecting the internal suspension state of a function, so the invoker can't know whether it's suspended directly or awaiting. Thus, any attempt at cancellation, no matter how careful the programmer, has a possibility of crashing the program. (Even worse, the common pattern of calling a function to register the frame with the event loop is guaranteed to crash.) Call me crazy, but if the programmer has done their due diligence, they shouldn't have to worry about language-enforced crashes.
As you've pointed out though, my model (actually Andrew's model as well in the relevant places) isn't perfect either -- cancellation would then itself be an asynchronous process, which means it would need its own frame, and that frame would itself need to be cancelable, and how the hell would that work? It seems to me that no implementation of cancellation can ever be guaranteed to succeed, which in my eyes contradicts point 11b of the Zen.
In light of this, @andrewrk, I don't believe that cancellation should be implemented at the language level. We may provide a cancel token implementation in the standard library (which is a much better and more flexible solution anyway), but async frames themselves must be awaited to complete. I do believe however that the proposed asynchronous RLS is a worthwhile idea.
We may implement one language-level feature to make userspace cancellation easier: rather than anyframe
, a resumable handle could have type anyframe<-T
-- that is, suspend
has a value, and resume
takes a value of that type to pass to the function, indicating whether to proceed or cancel:
// In the suspending function
const action = suspend {
event_loop.registerContinuationAndCancellation(@frame(), continuation_condition, cancellation_condition);
};
switch (action) {
.go => {},
.stop => return error.functionXCancelled,
}
// In the event loop (some details missing)
if (frame.continuation and @atomicRmw(bool, &frame.suspended, .Xchg, false, .Weak)) {
resume frame.ptr, .go;
frame.* = null;
}
if (frame.cancellation and @atomicRmw(bool, &frame.suspended, .Xchg, false, .Weak)) {
resume frame.ptr, .stop;
frame.* = null;
}
Since @frame()
may be called anywhere within the function, and the resumer needs to know the type before analysing the frame, the suspend type (T
in anyframe<-T
) must be part of the function's signature. I propose we reuse while
loop continuation syntax:
const suspendingFunction = fn (arg: Arg) ReturnType : ContinuationType {
// ...
};
Any function that uses the suspend
keyword must have a suspend type. This is not function colouring, as any function with explicit suspend
is necessarily asynchronous anyway (functions that only await
cannot be keyword-resume
d, so do not need a suspend type). The suspend type may be void
or error!void
(no error set inference), in which case the handle type is anyframe<-void
or anyframe<-error!void
(not anyframe
-- we require strongly typed handles for type checking, which is one drawback), and resume
does not necessarily take a second argument, as in status quo.
This not only permits flexible evented userspace cancellation, but also more specialised continuation conditions: a function waiting for multiple files to become available could receive a handle to the first one that does, and combined with a mechanism to check whether a frame has completed, #5263 could be implemented in userspace in the same manner.
At first blush, this may appear to be hostile to inlining async functions -- however, allowing that would already require semantic changes (#5277) that actually complement this quite nicely: @frame()
would return anyframe<-T
of the syntactically enclosing function's suspend type, regardless of the suspend type of the underlying frame, and there is now a strict delineation between resumable and awaitable handles.
This is, of course, a separate proposal -- I'll write up a proper one later.