async-task Add metadata to tasks

In the course of resolving #31, I realized that users may want to associate other things with tasks, like string names, priorities, debugger information, and otherwise. This PR adds a new generic field to both Task and Runnable, M. By default, M is the unit type, (), in order to keep consistency with the previous operation of this crate. This metadata is stored in the allocation and can be accessed via the metadata() function on both Task and Runnable.

In order to create new tasks with metadata, I've introduced a new Builder structure. This builder expands on the previous spawn_* family of functions by providing a metadata() function that creates the task using a generic piece of metadata. Builder may be useful later on for solving #24 once allocator APIs are stable.

This is a minor version bump; Tasks use () as their default M parameter to ensure that the logic is kept consistent.

Discussion questions:

Since M is essentially stored in an Arc, I've added M: Send + Sync bounds to the impl Send/Sync for both the Task and the Runnable. Is this right?
In order to make the metadata more useful, there may be a way to somehow make the metadata available to the current future. However, I'm not sure if there's a safe way of doing this.

Closes #31

Oct 02 '22 20:10 notgull

At first glance, I feel that generic metadata is a bit overkill/complicated, but I have no strong opinion on this.

Oct 07 '22 12:10 taiki-e

@smol-rs/admins Any thoughts on this?

Oct 07 '22 12:10 taiki-e

At first glance, I feel that generic metadata is a bit overkill/complicated, but I have no strong opinion on this.

I don't think it's overkill but seems to go beyond the scope. #31 is about debugging and makes perfect sense. If users need to associate other metadata, they can create custom types that contain the task along with any other data they need and pass it around. Unless I am missing something?

A lot more useful thing to have would be task-local data, which async-std and tokio provide an API for.

Oct 07 '22 12:10 zeenix

Technically, you're right about metadata. The metadata could be stored in both the scheduler function and alongside the task handle. You could do something like this for the priority queue example I wrote:

let priority = 3usize;
let schedule = move |runnable| {
    QUEUE.lock().unwrap().push((priority, runnable));
};

let (task, runnable) = spawn(..., schedule);

...and then priority is accessible to both the scheduling function context and the task handle context. However, for more complicated cases, it may need to be passed around via an Arc. For instance, if you wanted to change the priority of the task, you'd need an AtomicUsize stored in an Arc. At that point, you now have two heap allocations. It makes more sense to reuse that same heap allocation for the metadata as well, which is what this PR sets out to do.

Overall I would be opposed to task-local variables. We could emulate them by storing a Mutex<HashMap> in the metadata. However, if this crate itself were to do this, I'm not sure it would be possible without using Mutex (which would make us not no_std) or relying on dashmap.

Oct 07 '22 14:10 notgull

Overall I would be opposed to task-local variables. We could emulate them by storing a Mutex<HashMap> in the metadata. However, if this crate itself were to do this, I'm not sure it would be possible without using Mutex (which would make us not no_std) or relying on dashmap.

It doesn't necessarily have to be part of this crate and if it's part of this crate, it can be feature-gated.

Oct 07 '22 14:10 zeenix

It makes more sense to reuse that same heap allocation for the metadata as well, which is what this PR sets out to do.

Thanks for explaining. I guess the only questions are that which you asked in the description:

Since M is essentially stored in an Arc, I've added M: Send + Sync bounds to the impl Send/Sync for both the Task and the Runnable. Is this right?

Seems fine to me since it's not an API break.

In order to make the metadata more useful, there may be a way to somehow make the metadata available to the current future. However, I'm not sure if there's a safe way of doing this.

Indeed this would make the metadata api a lot more appealing.

Oct 07 '22 14:10 zeenix

I've implemented a new strategy. Instead of the future, the Builder takes an FnOnce(&'a M) -> Fut where Fut: 'a. This allows the future to incorporate the reference to the metadata so that it can be within the future. This would allow, i.e, a HashMap to be stored inside in order to contain task-local variables. The future could then take that reference to the HashMap and use it. Since incorporating a lifetime makes a future !'static, this means that the unsafe API must be used.

Please let me know if there is a more idiomatic way of going about this safely.

Oct 07 '22 20:10 notgull

Looks mostly good to me, do we ensure that the Output of the future can't hold onto a &'a M reference?

Oct 07 '22 22:10 fogti

Looks mostly good to me, do we ensure that the Output of the future can't hold onto a &'a M reference?

I think that's assured by this line in spawn_unchecked's documentation:

If future is not 'static, borrowed variables must outlive its Runnable.

Since the borrowed metadata is a borrowed variable, returning it would mean outliving the Runnable.

For the safe cases, the Future is 'static so it can't contain the metadata refrence.

Oct 07 '22 22:10 notgull

@smol-rs/admins Are there any blockers to this being approved for merge? It seems like this was blocked for further discussion.

Nov 06 '22 02:11 notgull

I think the docs around spawn_unchecked's safety could be worded better. For one, "If future’s output is not 'static, borrowed variables must outlive its Runnable." We could replace "future's output" with Fut, since it took me a second read to realize it's not Fut::Output being referred to. Also, we should clarify whether or not captures of the metadata reference "outlive" the runnable; again, after a few reads and some mental processing, I can be pretty sure they do, but clarity is key for unsafe.

Second, "If schedule is not 'static, borrowed variables must outlive the task’s Waker." Again, when I read it again, it's clear it refers to any instances of Waker returned by Runnable::waker, but we should just say that instead. When I first read "the task's Waker" I thought it was referring to the waker for another future .awaiting the Task instance.

Aug 01 '23 00:08 khoover

I think the docs around spawn_unchecked's safety could be worded better. For one, "If future’s output is not 'static, borrowed variables must outlive its Runnable." We could replace "future's output" with Fut, since it took me a second read to realize it's not Fut::Output being referred to. Also, we should clarify whether or not captures of the metadata reference "outlive" the runnable; again, after a few reads and some mental processing, I can be pretty sure they do, but clarity is key for unsafe.

Second, "If schedule is not 'static, borrowed variables must outlive the task’s Waker." Again, when I read it again, it's clear it refers to any instances of Waker returned by Runnable::waker, but we should just say that instead. When I first read "the task's Waker" I thought it was referring to the waker for another future .awaiting the Task instance.

I think this is a good idea! My bandwidth is limited at the moment, do you mind writing a PR?

Aug 01 '23 00:08 notgull

I could, sure, but I actually don't know whether it's fine for Fut's lifetime to be exactly 'a or not (i.e. it captures the metadata reference). I'm pretty sure it is, since the metadata lives for the union of the task and runnable lifetimes, whereas the future itself only lives for the runnable's lifetime, but I can't say for sure.

EDIT: Ok, after reading some more, yeah, guaranteed the Future is dropped before the ref count is decremented on the raw task, and the metadata reference is valid for the lifetime of the raw task. So capturing references to metadata is safe.

Aug 01 '23 00:08 khoover