laria icon indicating copy to clipboard operation
laria copied to clipboard

Coroutines, semicoroutines, `async`/`await`, and more!

Open PatchMixolydic opened this issue 3 years ago • 2 comments

Unlike the issue on effects, which is mostly just hypothetical "oh this seems like something I might want someday", some form of coroutines would be useful to me today: Devil's Emergence uses a hacked-up version of the VM to get them. Part of the trouble with adding coroutines to the language is what, exactly, they should look like.

An overview

In case you are unfamiliar (and for the benefit of myself six months down the line), there are ~three types of coroutines you might see in the wild:

async/await

This model allows you to await special values called futures, which will pause the current task until the resources required by the future are available. There are quite a few ways to create a future, though the most common is by calling an async fn. Futures are managed by an async executor, which is in charge of waking up in-progress tasks when they're ready to continue. Here's an example in Rust that prints "hello", sleeps for 4 seconds, then prints "world":

use std::time::Duration;
use tokio::time::sleep;

// `async fn`s in Rust return futures when called
async fn foo() {
    println!("Hello");
    // `tokio::time::sleep` is an `async fn`.
    // Wait until the future created by `sleep(_)` is complete.
    sleep(Duration::from_secs(4)).await;
    println!("world!");
}

Here, sleep(Duration::from_secs(4)) will create a future (let's call it SleepFuture), and awaiting that future will allow any awaits in SleepFuture to propagate to the async executor. In a sense, it's like SleepFuture was inlined (or, if you're familiar with Python's generators, it's like we had run yield from sleep(...)). This may seem useless on its own, but much like with threads, async fns have another trick up their sleeve: spawning.

use tokio::{net::TcpListener, task};

async fn handle_connection(sock: TcpStream) {
    do_something_cool_with_the(sock).await.unwrap();
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let listener = TcpListener::bind(([127, 0, 0, 1], 8000)).await?;

    loop {
        let sock = listener.accept().await?;
        task::spawn(handle_connection(sock));
    }
}

This allows main and all spawned instances of handle_connection to run concurrently. Upon awaiting, the async executor will put the current task to sleep until the resource it requested is ready, then try running other tasks. In other words, async/await is a form of cooperative multitasking, where futures are the processes and awaits are the yield points.

Semicoroutines (generators)

Semicoroutines, as their name implies, are a restricted form of coroutines. They're also known as "generators", though they're useful for more than just generating values. Compared to normal functions, semicoroutines have the power to yield. Here's an example (in Python, since Rust's generator syntax is a work-in-progress):

# Here's a normal function. Something something dragon never yields.
def meiling():
    return 4

# Here is a semicoroutine, identified solely by the use of `yield`.
def my_range(start, end):
    i = 0
    while i < end:
        yield i
        i += 1

yield x essentially returns x to its caller, but with one major difference: the next time the semicoroutine is invoked, it starts from the yield point, rather than at the end of the function:

>>> # `sus_range` is a suspended coroutine; in other words, a future
>>> sus_range = my_range(0, 5)
>>> next(sus_range)
0
>>> next(sus_range)
1

Note how the second time sus_range was invoked using next, the return value changed.

An astute reader might recognize that this is essentially equivalent to async/await. Though yield returns to its caller, a yield can be bubbled up the call stack in the same fashion as await:

# Oversimplified example
def bubble(sus_range):
    for x in sus_range:
        yield sus_range

The actual code for this is quite complex thanks to complex interactions with other language features (exceptions), so Python also provides a simple sugar for this operation, as mentioned in passing in the previous section:

def bubble(sus_range):
    yield from sus_range

Note that yielding from a generator in Python appears to be quite similar to awaiting a future in Rust.

Coroutines

Finally, we arrive at full coroutines. While semicoroutines can only yield to their immediate caller and futures can only yield to an async executor, full coroutines can yield to any other suspended coroutine. Here's an example in a pseudo Rust-like language because, erm, I couldn't find a familiar language that supported full coroutines (aside from assembly)...

co produce()
    // `resume` specifies the arguments to resume the coroutine with, as opposed
    // to the arguments needed to create the coroutine
    resume(
        consumer: impl Coroutine<Resume = (Self, Vec<_>), Output = ()>,
        mut stuff: Vec<_>,
    )
{
    loop {
        fill(&mut stuff);
        // `this` is the current coroutine/future.
        // Resume `consumer` from its last yield point (or the start, if it hasn't run yet).
        consumer(this, stuff);
    }
}

co consume()
    resume(
        producer: impl Coroutine<Resume = (Self, Vec<_>), Output = ()>,
        mut stuff: Vec<_>,
    )
{
    loop {
        for x in stuff.drain(..) {
            do_something_with(x);
        }

        producer(this, stuff);
    }
}

fn main() {
    let producer = produce();
    let consumer = consume();
    let stuff = Vec::new();

    producer(consumer, stuff);
}

This example is somewhat confusing, mostly because I have no idea what coroutine syntax would look like, but hopefully this gives some sort of vague idea of what's going on. Essentially, consume and produce jump between each other, implementing cooperative scheduling without any sort of management.

Coroutines can also be implemented on top of semicoroutines using a trampoline function, which can dispatch yielded values to other coroutines at the yielder's request. Perhaps, then, it can be said that there is only one kind of coroutine, and that the real difference between semicoroutines and coroutines is how mind-boggling it would be to expose at the language level.

Prior art

  • Though they've already been discussed, it might be beneficial to read more about Python's generators and Rust's async fns to get a deeper understanding of how they work.

  • For completeness's sake, here's Rust's current generator syntax as specified in eRFC 2033. At the time of writing, generators must be closures:

fn range(start: i32, end: i32) -> impl Generator<(), Yield = i32, Return = ()> {
    move || {
        let mut i = start;

        while i < end {
            yield i;
            i += 1;
        }
    }
}

Generator closures can also be annotated with static, which implies that they have to be pinned while they are suspended. Both static and regular generators have to be pinned to be resumed, but regular generators can be unpinned and moved while suspended, whereas static generators cannot.

Rust's generators are implemented using the Generator trait.

function my_range(first, last)
    return function ()
        local i = first
        while i < last do
            coroutine.yield(i)
            i = i + 1
        end
    end
end

sus_range = coroutine.wrap(my_range(1, 5))
for x in sus_range do
    print(x)
end

PatchMixolydic avatar May 14 '22 21:05 PatchMixolydic

What should Laria do?

Right now, I feel that semicoroutines would be the best approach for Laria. Of course, that's far from the end of the story:

  • Do semicoroutines need to be declared explicitly, or should it be inferred from the fact that they yield?
  • Speaking of, should we use yield? await? And what syntax should be used for either of those?
  • Should they really be called semicoroutines? Can't I just call them "generators" like everyone else?

Name

As mentioned previously, semicoroutines can do more than generate values. For instance, they can be used to implement cooperatively scheduled tasks. "Semicoroutine", however, is extremely unwieldy. This is probably one reason why "generator" is a much more popular term. Lua just calls their semicoroutines "coroutines". Go calls their green thread primitives "goroutines", but those yield implicitly and are preemptively scheduled (most of the time).

Declaration syntax

Semicoroutines should probably be distinguished in the function declaration to clarify whether the function returns once or many times, but how?

// People are familiar with `async`/`await` already,
// but seeing `yield` in an `async fn` might be surprising for Rustaceans.
async fn foo() {}

// `gen` for generator. Of course, if we're calling them "semicoroutines"...
gen foo() {}
gen fn foo() {}

// This sounds like it's somehow only half a function.
// ("`semi foo`? Like... like half of a `foo`?
//   `semi fn foo`? Like... a partial application??
//   A partial function, as opposed to a total function???")
semi foo() {}
semi fn foo() {}

// This hints that it's a coroutine, but some pedants might object to using `co`
// for a semicoroutine.
co foo() {}
co fn foo() {}

Right now, I'm leaning towards async fn and co.

await and yield

The language would probably benefit from both await and yield. As described in the original post, this is comparable to Python's generators offering yield from and yield. Since await produces a value, I believe Laria should follow in Rust's footsteps and use postfix await syntax (expr.await), which nicely wraps up one mystery.

One question remains: what should yield look like? One thing I neglected to mentioned in the original post was that you can resume a semicoroutine with a given value, which is used as the "return value" of a yield expression. Here's an example in Lua:

sus = coroutine.create(function()
    while true do
        print("resumed with", coroutine.yield(1))
    end
end)

print("`sus` yielded", coroutine.resume(sus, 1))
print("`sus` yielded", coroutine.resume(sus, 2))
print("`sus` yielded", coroutine.resume(sus, 3))
`sus` yielded	true	1
resumed with	2
`sus` yielded	true	1
resumed with	3
`sus` yielded	true	1

So if yield is allowed to produce a value, I believe Laria should also have postfix yield:

async fn foo() -> i32 {
    loop {
        println!("resumed with half of {}", 1.yield.mul(2));
    }
}

fn main() {
    let mut sus = foo();

    for i in 1..=3 {
        println!("`sus` yielded {}", sus.resume(i));
    }
}

Of course, whether Laria should allow this is an open question. If not, then yield expr would probably be better, in the same vein as return expr.

Semicoroutine yields this, returns that

You might have noticed a bit of a problem in the previous example. Let's try writing my_range in Laria.

async fn my_range(start: i32, end: i32) -> i32 {
    let mut i = start;

    while i < end {
        i.yield;
        i += 1;
    }
}

Note that the function yields i32s, but once it's done, it returns ()! That issue is simple enough to fix. One option is to only allow semicoroutines to return (). Another is to specify both the yield type and the return type in the function signature.

async fn my_range(start: i32, end: i32) yield i32 /* -> () */ {
    // ...
}

The problem comes in when resuming the semicoroutine.

Resumption

Now that we have these nice shiny semicoroutines, we're set for life! ... except that we actually have to use them at some point. It's like pure functions all over again!

let mut sus_range = my_range(1, 10);
// what now!!!!!!!

Fortunately, Laria is a scripting language that shunts everything onto the heap for the sake of convenience, so we don't have to worry about things like pinning. Taking a page out of Lua's book, perhaps there could be a resume function on semicoroutines. Let's take the most obvious definition of resume, fn resume(&mut self) -> Option[[Self as Semicoroutine]::Yield]:

while sus_range.resume() matches Some(i) {
    println!("`sus_range` yielded {i}");
}

This poses a problem: what happens when the semicoroutine returns? Of course, if semicoroutines can only return (), this is obvious: semicoroutine.resume() returns None. If semicoroutines are allowed to return other values, then we'd need something like Rust's GeneratorState:

// In the standard library; name far from final.
pub enum AsyncOutput[Y, R] {
    Yield(Y),
    Return(R),
}

// In our example script
while sus_range.resume() matches AsyncOutput::Yield(i) {
    println!("`sus_range` yielded {i}");
}

If someone needs to access the return value of a semicoroutine, they can do that by matching on AsyncOutput::Return.

PatchMixolydic avatar May 15 '22 00:05 PatchMixolydic

Yet more design considerations

async as an effect

You may have heard/noticed that async can be represented as an effect. Since Laria might be getting effects, it might make sense to define async to be a special (lang item backed) effect:

// `async_effect` gets *special treatment* and can have a keyword as a name
#[lang_item = "async_effect"]
effect async;

However, this is weird for a couple reasons. First, the current plan for Laria's effects only allow them to carry one value (for instance, you might have a vfs effect that carries a reference to a virtual filesystem), so it would be strange for async to introduce both yield and await. Secondly, it's rather bizarre for an effect to change the behaviour of calling a function (coroutines will probably be lazily executed in Laria, as in Rust). Given that, I think that async shouldn't be modeled as an effect.

async functions are bikeshed-coloured

Adding async to the language risks exposing Laria to the function colouring problem. Look at this Rust code:

async fn yousoro() -> i32 {
    3
}

fn main() {
    yousoro().await;
}

This produces:

error[E0728]: `await` is only allowed inside `async` functions and blocks
 --> src/main.rs:6:14
  |
5 | fn main() {
  |    ---- this is not `async`
6 |     yousoro().await;
  |              ^^^^^^ only allowed inside `async` functions and blocks

This diagnostic represents the function colouring problem: await can't be used in non-async functions without giving it to something like tokio for processing. Laria has a few options here:

  • Do nothing. Sync and async contexts are distinct. The standard library will provide a couple ways to process async functions in a sync context, including an executor and a block_on function.
  • Allow co fn main (ie. async fn main). Awaiting a coroutine in main is the same as passing it to block_on.
  • Calling a coroutine in a sync context implicitly blocks on it. This has the potential to cause confusion, prevents custom executors from working, and must have some way to deal with both the yielded and returned values.
  • Something like keyword generics. This would let functions be generic over whether or not they were async, allowing them to be called in any context.

No matter what I choose, Laria will still have function colouring once it gets effects. block_on in the standard library is probably sufficient to transform async APIs into sync APIs, though I'm not 100% sure that'd be performant. Right now, I'm between "do nothing" and "allow co fn main".

Executor

Laria will probably have some sort of standard coroutine executor, accessible through a spawn function in the standard library:

use std::task::{join, spawn};

co fn foo() {
    print("hello");
}

co fn bar() {
    print("world");
}

co fn main() {
    let foo = spawn(foo());
    let bar = spawn(bar());
    join(foo, bar).await;
}

This might print:

world
hello

spawn is magical; it essentially passes the given coroutine (or a regular function) up to the VM, which uses it to create a cooperatively-scheduled green thread. This is suitable for my needs (this is more or less the model I've hacked together for Devil's Emergence), but might not be suitable for everyone's. I'm not certain if it's worthwhile, but I'd like to provide some sort of support for custom executors.

PatchMixolydic avatar Aug 13 '22 15:08 PatchMixolydic