oneshot Optimize waker size for requirements

trafficstars

This is the result of me tinkering around with the idea described in #4, I would like your approval/disapproval on the basic idea before i write documentation etc.

TLDR; It's possible to reduce the size of a ReceiverWaker if it is known that it will only be used in blocking or non-blocking contexts beforehand. This is currently being decided using crate features, but it would be nice if it was possible to specify this for each channel individually.

Design Idea

This PR uses two traits (SyncWaker / AsyncWaker) to expose different methods on Receiver based on what waker is being used.

There are four different wakers, which are equivalent to the four possible feature combinations right now:

Waker	implements `SyncWaker`	implements `AsyncWaker`	Methods on `Receiver<T, W>`	Size
`DummyWaker`	No	No	`try_recv` only	0
`thread::Thread`	Yes	No	All blocking methods	8
`task::Waker`	No	Yes	All non-blocking methods	16
`GenericWaker`	Yes	Yes	All	16

To avoid breaking changes, a DefaultWaker is also exported, which equates to the "most permissive" of the available wakers based on the enabled features and, as the name implies, is used by default.

The main design issue currently is how to adjust the crate API so the used waker can be specified. Currently, channels are being constructed using

fn channel<T>() -> (Sender<T>, Receiver<T>)

Ideally this would be possible:

fn channel<T, W = DefaultWaker>() -> Sender<T, W>, Receiver<T, W>)
where W: Waker

but default values for generic parameters on functions are not allowed.

So the "next best" option would be to either export more constructor functions:

// "Old" API
fn channel<T>() -> (Sender<T, DefaultWaker>, Receiver<T, DefaultWaker>)

// Enable these based on features
fn wakerless_channel<T>() -> (Sender<T, DummyWaker>, Receiver<T, DummyWaker>)
fn strictly_sync_channel<T>() -> (Sender<T, thread::Thread>, Receiver<T, thread::Thread>)
fn strictly_async_channel<T>() -> (Sender<T, task::Waker>, Receiver<T, task::Waker>)

or to move these functions onto the Channel type itself, so the waker type is already clear.

May 09 '24 21:05 simonwuelker

Hi! Thanks for wanting to contribute. This looks cool and all, but I have some concerns.

To avoid breaking changes, a DefaultWaker is also exported, which equates to the "most permissive" of the available wakers based on the enabled features and, as the name implies, is used by default.

Features need to be additive only. I'm not sure if changing the type of DefaultWaker when the feature set changes is going to cause trouble anywhere :thinking: Technically you are replacing a type with another. But on the other hand, it should change from and to the same thing for every possible observer.

Since I wrote the issue I have maybe changed opinion slightly. This change is pretty complex, and does it really save a lot of space? For this to be worth the extra complexity someone must use this in a way where they have tiny messages, and a huge amount of channels. Like with almost any optimization, I think it would be nice if we had any concrete proof this change is justified by real world usage/benchmarks.

May 23 '24 22:05 faern

It's possible that this has small CPU benefits as well due to each waker being simpler than "support all waking methods". However it's really hard to measure. It's hard because you can only perform a single wake event per channel (by design), and allocation and other things related to creating and destroying each channel probably outweigh the waking part by a lot. But yeah, if we can't prove a performance win, then it's probably not worth the complexity anyway, since the same applies for production use, not just benchmarks.

May 24 '24 07:05 faern

Yeah, those are fair concerns (:

I'm also very unsure whether this is worth the effort! (that's part of the reason this is a draft pr) To address some of your comments:

I'm not sure if changing the type of DefaultWaker when the feature set changes is going to cause trouble anywhere

Hmm, unless someone relies on the result of std::any::type_name (or similar) this shouldn't cause issues. The DefaultWaker only changes to expose more methods as more features are added. But you're not wrong, this is kinda brittle. Especially once you start to consider auto traits!

Like with almost any optimization, I think it would be nice if we had any concrete proof this change is justified by real world usage/benchmarks. [...] It's possible that this has small CPU benefits as well due to each waker being simpler than "support all waking methods".

I can't imagine there are any measurable performance wins here - at most you're saving one match, but even that might be optimized away when the functions are inlined... Performance was not the motivation here. It just feels clunky to carry around space for a task::Waker when you know that the channel will only be used in sync contexts. Because this is currently controlled with cargo features, either all your channels support async or none do.

May 25 '24 18:05 simonwuelker

oneshot oneshot copied to clipboard

Optimize waker size for requirements

Design Idea

oneshot
oneshot copied to clipboard