libs-team
libs-team copied to clipboard
Add LocalWaker support
Proposal
Problem statement
The current implementation of Waker is Send + Sync, which means that it can be freely moved between threads. While this is necessary for work stealing, it also imposes a performance penalty on single threaded and thread per core (TPC) runtimes. This runtime cost is contrary to Rust's zero cost abstraction philosophy. Furthermore, this missing API has led some async runtimes to implement unsound wakers, both by neglect and intentionally.
Motivation, use-cases
Local wakers leverage Rust's compile-time thread safety to avoid unnecessary thread synchronization, offering a compile-time guarantee that a waker has not been moved to another thread. This allows async runtimes to specialize their waker implementations and skip unnecessary atomic operations, thereby improving performance.
It is noteworthy to mention that while this is especially useful for TPC runtimes, work stealing runtimes may also benefit from performing this specialization. So this should not be considered a niche use case.
Solution sketches
Construction
Constructing a local waker is done in the same way as a shared waker. You just use the from_raw function, which takes a RawWaker.
use std::task::LocalWaker;
let waker = unsafe { LocalWaker::from_raw(raw_waker) }
Alternatively, the LocalWake trait may be used analogously to Wake.
use std::task::LocalWake;
thread_local! {
/// A queue containing all woken tasks
static WOKEN_TASKS: RefCell<VecDeque<Task>>;
}
struct Task(Box<dyn Future<Output = ()>>);
impl LocalWake for Task {
fn wake(self: Rc<Self>) {
WOKEN_TASKS.with(|woken_tasks| {
woken_tasks.borrow_mut().push_back(self)
})
}
}
The safety requirements for constructing a LocalWaker from a RawWaker would be the same as a Waker, except for thread safety.
Usage
A local waker can be accessed with the local_waker method on Context and woken just like a regular waker. All contexts will return a valid local_waker, regardless of whether they are specialized for this case or not.
let local_waker: &LocalWaker = cx.local_waker();
local_waker.wake();
ContextBuilder
In order to construct a specialized Context, the ConstextBuilder type must be used. This type may be extended in the future to allow for more functionality for Context.
use std::task::{Context, LocalWaker, Waker, ContextBuilder};
let waker: Waker = /* ... */;
let local_waker: LocalWaker = /* ... */;
let context = ContextBuilder::new()
.waker(&waker)
.local_waker(&local_waker)
.build()
.expect("at least one waker must be set");
Then it can be accessed with the local_waker method on Context.
let local_waker: &LocalWaker = cx.local_waker();
If a LocalWaker is not set on a context, this one would still return a valid LocalWaker, because a local waker can be trivially constructed from the context's waker.
If a runtime does not intend to support thread safe wakers, they should not provide a Waker to ContextBuilder, and it will construct a Context that panics in the call to waker().
Links and related work
- IRLO thread about LocalWaker
- Context made !Send + !Sync with this use case in mind
- Monoio's Waker being unsound
- Glommio's Waker being unsound
- Withoutboat's comments on the waker API v1
- Withoutboat's comments on the waker API v2
What happens now?
This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.
(NOT A CONTRIBUTION)
A local waker can be accessed with the local_waker method on Context and woken just like a regular waker. All contexts will return a valid local_waker, regardless of weather they are specialized for this case or not.
Should clarify what you mean by this: a context without a local waker set will just return its Waker as a LocalWaker, because every Waker is a valid LocalWaker (they are basically just local wakers that implement Send).
It'd be helpful probably to sketch out what the definition of Context would look like in this proposal to support all of these operations:
- Returning the Waker for
.waker()if the waker is set, and panicking if not. - Returning the LocalWaker for
.local_waker()if the local waker is set, and returning theWakeris not.
Should be possible but it may be a bit tricky.
let context = ContextBuilder::new() .waker(&waker) .local_waker(&local_waker) .build() .expect("at least one waker must be set");
Maybe build should just panic if no waker is set, instead of returning a Result? Not clear what anyone would do except call expect on this result.
Yeah, currenty Waker is transparent to RawWaker, so a transmute from a &Waker to a &LocalWaker should probably be fine, and it would require no clones (assuming LocalWaker would too be transparent to RawWaker).
(NOT A CONTRIBUTION)
possibly LocalWaker could also just be defined as struct LocalWaker(Waker) and you could store the waker as a LocalWaker in Context. Would be a bit counterintuitive but would avoid transmutes.
Agreed, please outline the exact function signatures for WakerBuilder and any added or changed functions and fields for Context. I am wondering if a builder is really warranted here. Do we plan on adding even more options to Context?
Do we want to allow constructing a Context with only a LocalWaker (no Waker)? In that case, calling .waker() would panic? Can we somehow make that a monomorphization error instead?
Note I proposed a ContextBuilder type only, that for the moment will have:
local_waker: to set the local waker on contextwaker: to set the thread safe waker on context.build: to build the context.
The whole purpose of Context was to leave room for future extensibility, so I believe a builder is warranted.
Do we want to allow constructing a Context with only a LocalWaker (no Waker)? In that case, calling .waker() would panic? Can we somehow make that a monomorphization error instead?
I doubt it, contexts would be entirely dynamic with respect to their Waker support. Now, it would be possible to add this optional support later without it being a breaking change.
I'm a little confused.
I doubt it, contexts would be entirely dynamic with respect to their
Wakersupport.
You're saying you doubt it would be possible to make this a monomorphization error?
Just to be clear, you're not proposing to make Waker optional here, just leaving it open for the future.
I think something like try_local_waker() -> Option<&LocalWaker> could be useful as well. I'm wondering if maybe local_waker() should just return an option. Then we don't have to worry about transmutes or anything. Let the user fall back to waker() themselves or unwrap to panic if they always expect a local waker to exist.
(NOT A CONTRIBUTION)
You're saying you doubt it would be possible to make this a monomorphization error?
Just to be clear, you're not proposing to make Waker optional here, just leaving it open for the future.
Via context builder, @tvallotton's proposal does make it possible to construct a Context without a waker. Calling waker on that Context would panic.
The alternative is to not provide a way to construct a Context without a waker. If Rust does that, executors that don't support being awoken from other threads will just have to define a Waker that panics when you wakes it. This seems strictly worse to me.
I think something like try_local_waker() -> Option<&LocalWaker> could be useful as well. I'm wondering if maybe local_waker() should just return an option. Then we don't have to worry about transmutes or anything. Let the user fall back to waker() themselves or unwrap to panic if they always expect a local waker to exist.
There's no reason for this. It's always safe to wake the task from the same thread via the waker mechanism.
The point of this API is that reactors that wake from the same thread that they were polled on can use the local_waker mechanism instead of waker, and they will always just work. Since these reactors often store the waker somewhere, it's important that they be able to get the same type (LocalWaker) regardless of if they are actually waking via a thread safe mechanism or not. Reactors written that way can be run on a single threaded executor or a multithreaded executor and will be totally agnostic to it.
I was just thinking that if someone is trying to enhance performance by using local wakers, knowing if you actually got a local waker or if it just fell back to the non-local waker would be useful.
I think it is safe to say that a 100% of the people that ask for a LocalWaker will conform with a Waker given that all wakers are local wakers. Its preferable to have std perform the transmute than the ecosystem.
(NOT A CONTRIBUTION)
I was just thinking that if someone is trying to enhance performance by using local wakers, knowing if you actually got a local waker or if it just fell back to the non-local waker would be useful.
They're not the same parties. The person who wants the enhanced performance is the one who selects the executor, which isn't the same one who writes the reactor and deals with wakers, it's the one who calls spawn.
Speaking of builders, the ability to build off of an existing Context would be useful for the context reactor hook concept, whereby the executor should be able to make an interface available to leaf futures via Context.
I'm imagining something like this to produce the top level context:
let context = ContextBuilder::new()
.top_level_poller(&mut poller)
.waker(&waker)
.local_waker(&local_waker)
.build()
.expect("at least one waker must be set");
And then any sub-future that makes its own context in order to provide its own waker (selectors and such) would inherit from the previous:
let context = ContextBuilder::from(cx)
.waker(&waker)
.local_waker(&local_waker)
.build()
.expect("at least one waker must be set");
Such that an eventual leaf future could do:
let poller: Option<&mut dyn TopLevelPoller> = cx.top_level_poller();
There may be other reasons to want to inherit config or interfaces from Context like this, for example I/O budgeting. Just something to keep in mind when changing the API of Context for LocalWaker.
(NOT A CONTRIBUTION)
@rust-lang/libs-api What is needed to make progress on this proposal? It seems like it has been dropped. You made the breaking change to support it last year (the actually difficult bit), but then didn't implement the API to take advantage of the change.
From what I understand, the libs team needs to review the proposal, but they review once a week and there is a lot of proposals in the queue. Not to say that I don't share your frustration.
Something that may be interesting to consider, is if the thread safe waker could be lazily initialized on the call to waker from a LocalWaker. I can picture a couple of ways of this could be leveraged for some optimizations. I'm thinking of the following pseudocode:
thread_local! {
// storage for wakers sent to other threads
static SHARED_WAKERS: RefCell<HashMap<u32, Waker>> = RefCell::default();
}
fn create_thread_safe_waker(waker: &LocalWaker) -> Waker {
let id = new_id();
store_waker_in_thread_local(SHARED_WAKERS, id, waker);
create_waker_by_id(id)
}
let context = ContextBuilder::new()
.local_waker(local_waker)
.on_waker(create_thread_safe_waker)
.build()
.unwrap();
This way the cost of upgrading a LocalWaker to a Waker doesn't need to be paid upfront.
I think it'd be useful to have a complete overview of the proposed API. (The LocalWaker type, the Context::local_waker method, the ContextBuilder, anything else?)
cc @rust-lang/wg-async
We discussed this in today's libs-api meeting. This seems like the logical next step after https://github.com/rust-lang/rust/pull/95985, but we'd like to be sure that the async wg is aware of this before continuing. Thanks!
I do have concerns about the builder API. I understand that Context is meant to be extensible, but a builder is a heavyweight solution in terms of API surface which should only be used if necessary. If there are only ever going to be 2 ways to construct a Context (with a Waker or with a LocalWaker) then it may be better to just have 2 explicit constructor functions on Context.
I think it is quite possible we will end up moving some of the logic that is currently being handled with thread locals into context. For example, there is the scheduling budget proposal. I can also imagine some runtime agnostic way of accessing reactor utilities using context or something like that. And of course, some kind of reactor hook like the one that was mentioned by @jkarneges (which might work better if it used file descriptors instead of closures so reactors can share a thread, by the way).
It could also expose executor utilities like spawn. Basically, the way I see it, Context could help us slowly make futures less runtime specific.
Concerning the current proposal, we may want to offer both a Waker and LocalWaker for the same Context, which would require three constructors. A possible alternative would be to set these values directly on Context like so:
let mut context = Context::from_local_waker(waker);
context.set_waker(waker);
However, as it was pointed out in the IRLO thread:
If you use &mut self methods, a poll method (which gets context by &mut) could change the context in a way that will escape that method; with RawWaker not capturing lifetimes this is a bit of a footgun: imagine something like FuturesUnordered using this to set the waker to its newly constructed waker not considering the escape aspect. Probably &mut self methods are not a good idea here.
For what it's worth. I pushed this branch containing the changes in this proposal. You can check out the diff here. I am really interested in seeing this move forward, as I'm currently working on a project of my own that would benefit in the stabilization of this API.
I'm not sure if this is the correct place to comment on the API that @tvallotton has sketched up. If there's a better place, let me know.
My main observation is that rather than duplicating the machinery behind Waker to create LocalWaker, there should be a clever way to re-use some of these structures. A LocalWaker is identical to a Waker in every way except for the static thread-safety guarantees.
We can perhaps leverage the fact that a runtime will only ever need 1 waker. Internally, a Context can store an enum of a LocalWaker and a Waker, with the variant depending on runtime choice. Since all Wakers are controvertible to LocalWakers, a LocalWaker would always be available, but a Waker would require the appropriate variant.
enum WakerKind<'w> {
Local(&'w LocalWaker),
Sendable(&'w Waker)
}
impl<'w> WakerKind<'w> {
fn local_waker(&self) -> &'w LocalWaker {
match self {
Self::Local(local) => local,
Self::Sendable(sendable) => /* convert */
}
}
}
At least, I can't imagine a runtime implementing both a Waker and a LocalWaker at the same time -- this feels wrong. If a runtime provides a thread-safe Waker, that Waker should suffice to be a LocalWaker as well.
Additionally, there are some other ways I think the linked changes could be improved. These are relatively minor points by comparison, by still worth mentioning.
One is that Rust affords us the ability to create compile time-safe builders rather than panicking. It should be possible, via trait bounds, to make ContextBuilder::build only callable when the waker has been set. This is a major improvement over other languages because it prevents API usage errors -- and it is incorrect to assume that only async runtimes will construct Context. Besides, who wants to pay a runtime penalty because of an unnecessary panicking branch in ContextBuilder::build? Yes, I know that waker() will necessarily have an inelegant panicking branch, but that's an unfortunate cost which must be paid.
Also, it may be worth considering a blanket impl LocalWake for W where W: Wake. This accords with the idea that every Waker is a LocalWaker. Of course, one should always be cautious in introducing blanket implementations. They stay around forever.
If we had a redo on the API design, we would probably have that Wake: LocalWake + Send + Sync and LocalWake would define the relevant waking methods. However, that ship has probably sailed due to for backwards compatibility concerns (e.g. it's possible to call Waker::wake() directly rather than using method syntax). I don't know of any way to circumvent this problem except by some kind of edition-related upgrade magic.
My main observation is that rather than duplicating the machinery behind Waker to create LocalWaker, there should be a clever way to re-use some of these structures. A LocalWaker is identical to a Waker in every way except for the static thread-safety guarantees.
Could you elaborate a little further on this thought? I'm not sure if you find that there is too much duplication in std or that there would be too much duplication on the applications.
If a runtime provides a thread-safe Waker, that Waker should suffice to be a LocalWaker as well.
This is the case in fact, as boats said earlier:
a context without a local waker set will just return its Waker as a LocalWaker, because every Waker is a valid LocalWaker (they are basically just local wakers that implement Send).
From the runtime's perspective, they will either store a bunch of wakers in their io-reactor, or a bunch of local wakers. They won't ever need to cast a waker to a local waker themselves.
At least, I can't imagine a runtime implementing both a Waker and a LocalWaker at the same time
I can imagine this. For example, tokio may want to add a local waker to access a Cell instead of a Mutex when waking a task. Or a single threaded runtime may want to support both kinds of wakers, even though the runtime itself only works with LocalWaker. I imagine there will be runtimes that can configure their support for Waker, either at the runtime level or at the task level.
One is that Rust affords us the ability to create compile time-safe builders rather than panicking. It should be possible, via trait bounds, to make ContextBuilder::build only callable when the waker has been set. This is a major improvement over other languages because it prevents API usage errors -- and it is incorrect to assume that only async runtimes will construct Context. Besides, who wants to pay a runtime penalty because of an unnecessary panicking branch in ContextBuilder::build?
I think I've seen this sort of thing in the ecosystem be done with deprecation warnings and #[deny(warnings)] to throw an error at compile time. I haven't seen it with trait bounds, so I'm not sure how it would look. Now, when implementing this I made everything const and inline hoping the assert! could be optimized out by the compiler. If this isn't the case, I think I would prefer having a try_build method, that if we really care about not unwinding, we can just unwrap_unchecked().
If we had a redo on the API design, we would probably have that Wake: LocalWake + Send + Sync
I don't think this is all that bad. In fact, I think forcing users to implement LocalWake before Wake is rather annoying if they are not going to use it. And we don't really need the implementation to be able to cast a Waker to a LocalWaker.
On the note of try_build it might be useful to also have try_waker, so a future can inspect if the context supports Waker or if they need to make an alias to the &LocalWaker (probably with assistance from the runtime).
WG-async discussed this in our recent triage meeting, and we plan to discuss it further later this week.
Two ideas for possible API enhancements based on examining @tvallotton 's branch:
- The entire
alloc::taskmodule is currently conditional oncfg(target_has_atomic = "ptr"), but with the addition ofLocalWaker, it has items that are in principle usable without atomics … except thatContextunconditionally offers aWaker. This could be addressed as follows:- Remove that part of the cfg condition from the module.
- Add cfg condition to
Wakesince it needsArc. - Add cfg condition to
Wakersince it can't meaningfully be constructed without atomics. - Add cfg condition to
Context::waker(), so that on non-atomic-ptr platforms,Contexthas onlylocal_waker(). - (In principle,
Wakercould be implemented in a hardware-specific way that isn't strictly involving atomic ptrs as rustc understands them, allowing further relaxing to haveWakerbut notWakeeverywhere, but that's even more niche.)
- Add
impl<W: Wake> From<Arc<W>> for LocalWaker(corresponding to the existing impl forWaker). This will simplify the job of optionally-no_stdcode that wishes to offer eitherWakers orLocalWakers and isn't looking for maximum performance, by not requiring it to meticulously useRceverywhere and explicitly implementLocalWaker; instead it can useArc<impl Wake>to produce either kind of waker. (This would still be possible withoutcoreassistance, but require anunsaferoute throughRawWaker.)
Both of these are additive, so they could be done later, but they'd be easy to do immediately.
(NOT A CONTRIBUTION)
I can imagine this. For example, tokio may want to add a local waker to access a Cell instead of a Mutex when waking a task. Or a single threaded runtime may want to support both kinds of wakers, even though the runtime itself only works with LocalWaker. I imagine there will be runtimes that can configure their support for Waker, either at the runtime level or at the task level.
I can affirm that this is desirable. It's plausible in a single threaded executor to have a waker that has special handling for being awoken from another thread, in case another thread triggers work for a task on that executor. The LocalWaker wouldn't need to include this check, but the Waker would. They could even use the same (thread safe) data type, but with different vtables.
One is that Rust affords us the ability to create compile time-safe builders rather than panicking. It should be possible, via trait bounds, to make ContextBuilder::build only callable when the waker has been set.
You don't need trait bounds for this, you can just make ContextBuilder not have the empty constructor, and instead have two constructors: one which passes a Waker and one which passes a LocalWaker. The Waker one could also set the LocalWaker (instead of doing it in build), and then that would not be an Option even in ContextBuilder. There would never need to be a null check of LocalWaker.
ContextBuilder would still have setters that overwrite these fields; just you're guaranteed LocalWaker is always set.
I think both of @kpreid proposals seem reasonable, if no one has concerns about them I will include them in the branch.
You don't need trait bounds for this, you can just make ContextBuilder not have the empty constructor, and instead have two constructors.
I hadn't thought of this. It seems like an overall improvement and I can't think of any downsides. I'll include it too.