Task wakes too often in release mode when used with `async_std`
This may cause a CPU core to be constantly at 100% after a client connects and sends nothing:
```rust
#[async_std::main]
async fn main() -> async_tungstenite::tungstenite::Result<()> {
    let listener = async_std::net::TcpListener::bind("127.0.0.1:8080").await?;
    let (stream, _) = listener.accept().await?;
    async_tungstenite::accept_async(stream).await?;
    Ok(())
}
```
(the same also happens after the handshake when the client is idle; I'm using only `accept_async` in the example because it's shorter)
It could be an issue with async-std but I haven't yet been able to replicate it without async-tungstenite ~~(both codebases are still a bit hard for me to navigate)~~.
```toml
[dependencies]
async-std = { version = "1.12.0", features = ["attributes"] }
async-tungstenite = { version = "0.26.1", features = ["async-std-runtime"] }
```
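For reproducing, an idle client is enough. Here's a minimal sketch of one (this snippet is my addition, not part of the original report): it connects and sends nothing, which is exactly the trigger described above.

```rust
// Hypothetical companion client: connect to the server above and stay
// idle, so the WebSocket handshake never completes and the server's
// accept future keeps polling. Watch the server's CPU usage in
// release mode while this runs.
use std::net::TcpStream;
use std::{thread, time::Duration};

fn main() -> std::io::Result<()> {
    let _stream = TcpStream::connect("127.0.0.1:8080")?;
    thread::sleep(Duration::from_secs(60));
    Ok(())
}
```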
If you do the same with tokio, does it behave correctly?
I've tested it with async-net (versions 1 and 2), and that worked okay;
will test with tokio now;
ed.: tokio is fine
This, to me, looks a lot like some sort of soft race condition, because it goes away if I add some overhead (or run the thing in debug mode), or do anything that causes `poll` to be called not as soon as the task wakes, and it happens more often if I add this at the very start of `main`:
```rust
async_std::task::spawn(futures_util::future::pending::<()>());
```
Update: this seems to affect anything that uses `async_io::Async::<std::net::TcpStream>::poll_read` directly. It happens on both async-io versions 1 and 2.
I found the reason (in `async_tungstenite::compat`):
```rust
let waker = match kind {
    ContextWaker::Read => task::waker_ref(&self.read_waker_proxy),
    ContextWaker::Write => task::waker_ref(&self.write_waker_proxy),
};
```
and the "fix" (for now I can't figure out much better):
```rust
let waker = match kind {
    ContextWaker::Read => task::waker_ref(&self.read_waker_proxy),
    ContextWaker::Write => task::waker_ref(&self.write_waker_proxy),
}
.clone();
```
why:
- `Waker` returned by `task::waker_ref` changes its vtable on clone
- async-io's `Source`, just like `AtomicWaker`, checks `Waker` equality via `Waker::will_wake`, which compares both the data pointer and the vtable
- async-io's `Source` wakes the older `Waker` on change
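Here's a minimal sketch of the first two points (it only depends on `futures`; the `Proxy` type is a stand-in I made up for the compat module's waker proxies):

```rust
use futures::task::{self, ArcWake};
use std::sync::Arc;

struct Proxy;

impl ArcWake for Proxy {
    fn wake_by_ref(_arc_self: &Arc<Self>) {}
}

fn main() {
    let proxy = Arc::new(Proxy);
    // `waker_ref` builds a Waker around &Arc<Proxy> without cloning the Arc.
    let original = task::waker_ref(&proxy);
    // Cloning goes through `clone_arc_raw`, which takes the address of
    // the vtable constant in whatever codegen unit it landed in.
    let cloned = original.clone();
    // Can print `false` in release builds with futures 0.3.30: the data
    // pointers match but the vtable addresses differ.
    println!("will_wake: {}", original.will_wake(&cloned));
}
```

That would also explain why the `.clone()` workaround helps: the `Waker` handed to async-io is then already the cloned form, and further clones of it presumably keep producing the same vtable address, so `will_wake` matches on subsequent polls.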
as for why async-net works: I have no clue
Okay, so I've added a print:
```rust
let waker = match kind {
    ContextWaker::Read => task::waker_ref(&self.read_waker_proxy),
    ContextWaker::Write => task::waker_ref(&self.write_waker_proxy),
};
eprintln!("{waker:?} {:?}", waker.clone());
```
In debug builds (some parts omitted):
```
... data: 0x563a8f431fa0, vtable: 0x563a8e701368 ... data: 0x563a8f431fa0, vtable: 0x563a8e701368 ...
```
In release builds on Rust 1.78/1.79:
```
... data: 0x563ac7b8cf30, vtable: 0x563ac5b80238 ... data: 0x563ac7b8cf30, vtable: 0x563ac5b80178 ...
```
???????
compiler bug?
Seems like the compiler is placing the same vtable at two different addresses.
ed.: it was indeed a compiler bug: https://github.com/rust-lang/futures-rs/pull/2829#issuecomment-1962863389
There's been a change in how `Waker` works. `std::task::Wake` has adapted to it, but `futures::task::ArcWake` has not. I've checked with both `std::task::Wake` and a patched version of `futures::task::ArcWake`; both work fine. So, I guess, if there's a need to mitigate this without waiting for futures-task to change, switching over to `std::task::Wake` is a possibility, at the cost of some extra atomic operations (`std::task` doesn't provide an equivalent to `waker_ref`, as far as I can see).
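A rough sketch of what the `std::task::Wake` route looks like (the `WakerProxy` name is mine, standing in for the real proxies): you pay an `Arc` clone per `Waker` construction instead of getting a borrowed `WakerRef`.

```rust
use std::sync::Arc;
use std::task::{Wake, Waker};

struct WakerProxy;

impl Wake for WakerProxy {
    fn wake(self: Arc<Self>) {
        // forward to the real read/write waker here
    }
}

fn make_waker(proxy: &Arc<WakerProxy>) -> Waker {
    // Waker::from(Arc<impl Wake>) uses the vtable defined in
    // alloc::task; per the check above, clones of such a Waker compare
    // equal under will_wake(). The Arc clone here is the extra atomic
    // operation mentioned above.
    Waker::from(proxy.clone())
}

fn main() {
    let proxy = Arc::new(WakerProxy);
    let waker = make_waker(&proxy);
    assert!(waker.will_wake(&waker.clone()));
}
```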
I don't have time to look into this in more detail until some time next week, but this sounds like a bug in the code doing the comparison. If multiple codegen units are involved, it's not guaranteed that the vtable is always at the same address for the same object.
That's why e.g. https://doc.rust-lang.org/stable/std/sync/struct.Arc.html#method.ptr_eq exists.
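For illustration (a small example of my own): `Arc::ptr_eq` deliberately compares only the allocation address of a `dyn` pointer, not its vtable metadata.

```rust
use std::sync::Arc;

trait Marker {}
struct Thing;
impl Marker for Thing {}

fn main() {
    let a: Arc<dyn Marker> = Arc::new(Thing);
    let b = Arc::clone(&a);
    // Compares only the allocation address; the vtable part of the fat
    // pointer is deliberately ignored, because vtable addresses are
    // not guaranteed to be unique or stable across codegen units.
    assert!(Arc::ptr_eq(&a, &b));
}
```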
I'll list all the parts of the code I found relevant so far in one place, ~~and with fewer guesses~~
async-tungstenite creating wakers that change their vtable on clone:
https://github.com/sdroege/async-tungstenite/blob/6ff28b380c55b59a6ca96a6d76528754fa7b50ff/src/compat.rs#L128-L129
async-std directly uses async_io::Async:
https://github.com/async-rs/async-std/blob/8947d8e03c4c267fdd3828e07368cefc7d39b002/src/net/tcp/stream.rs#L5
https://github.com/async-rs/async-std/blob/8947d8e03c4c267fdd3828e07368cefc7d39b002/src/net/tcp/stream.rs#L50
https://github.com/async-rs/async-std/blob/8947d8e03c4c267fdd3828e07368cefc7d39b002/src/net/tcp/stream.rs#L308
async-io checking Wakers for equality and waking the old one on change:
https://github.com/smol-rs/async-io/blob/master/src/reactor.rs#L456-L461
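In pseudocode, the pattern at that spot is roughly this (a simplified sketch of my own, not the actual async-io code): a mismatch under `will_wake` wakes the previously stored waker, which re-polls the task immediately, which registers another mismatching waker, and so on in a loop.

```rust
use std::task::Waker;

// Simplified register logic: keep the stored waker if it would wake
// the same task, otherwise wake the old one (so no wakeup is lost)
// and store the new one. When will_wake() spuriously fails on every
// poll, this wakes the task every time, producing the 100% CPU spin.
fn register(slot: &mut Option<Waker>, new: &Waker) {
    match slot {
        Some(old) if old.will_wake(new) => {}
        _ => {
            if let Some(old) = slot.take() {
                old.wake();
            }
            *slot = Some(new.clone());
        }
    }
}
```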
futures-task creates the vtable references for the original and the cloned waker separately (note: no `#[inline(always)]` on `clone_arc_raw`):
https://github.com/rust-lang/futures-rs/blob/master/futures-task/src/waker_ref.rs#L64
https://github.com/rust-lang/futures-rs/blob/master/futures-task/src/waker.rs#L38-L42
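The mechanism, as far as I understand it (a hand-written sketch, not the actual futures-task source): the vtable is a promoted constant behind a `&'static`, and with multiple codegen units, the reference taken inside `clone_arc_raw` and the one taken at the `waker_ref` call site can each end up pointing at their own copy of that constant.

```rust
use std::task::{RawWaker, RawWakerVTable};

fn noop(_: *const ()) {}

// Sketch: a fn returning &'static to a promoted constant. Each
// codegen unit that instantiates this may embed its own copy of the
// constant, so two results are not guaranteed to be address-equal.
fn vtable() -> &'static RawWakerVTable {
    &RawWakerVTable::new(clone_raw, noop, noop, noop)
}

fn clone_raw(data: *const ()) -> RawWaker {
    // If this lands in a different CGU than the original waker's
    // constructor, the clone points at a different vtable address.
    RawWaker::new(data, vtable())
}

fn main() {
    // In a single-CGU build these are equal; across CGUs they may not be.
    println!("{}", std::ptr::eq(vtable(), vtable()));
}
```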
Patch to `alloc::task` in Rust that uses `#[inline(always)]` to avoid the split across CGUs:
https://github.com/rust-lang/rust/pull/121622
async-tungstenite can mitigate this by using `std::task::Wake` instead of `futures_task::ArcWake`, or by cloning the waker; both options involve extra `Arc` clones to work (`&Arc<impl std::task::Wake>` can't be directly converted into an `impl Deref<Target = Waker>` the way `futures_task::waker_ref` does it).
futures-task can mitigate this by putting `#[inline(always)]` on `clone_arc_raw`.
> futures-task can mitigate this by putting `#[inline(always)]` on `clone_arc_raw`.
That seems like a solution that will just fail again some time in the future. This sounds to me like `will_wake()` should actually only check the pointer for equality and not the vtable.
> async-tungstenite can mitigate this by using `std::task::Wake` instead of `futures_task::ArcWake` or cloning the waker
Doing this as a workaround for now seems acceptable. Do you want to provide a PR for that?
Intentionally keeping this open here so there's a reminder to revert the change once that's safe to do again.
https://github.com/rust-lang/futures-rs/commit/b92f4c55a3f8e9ffd2201b7a2c1827343339c92e
Tested with futures 0.3.31 and async-tungstenite =0.26.1: the problem no longer seems to appear, unlike with futures =0.3.30 and async-tungstenite =0.26.1.
Thanks, I'll update the minimum version for futures with the next release and drop the workaround