Task wakes too often in release mode when used with `async_std`
This may cause a CPU core to be constantly at 100% after a client connects and sends nothing:
```rust
#[async_std::main]
async fn main() -> async_tungstenite::tungstenite::Result<()> {
    let listener = async_std::net::TcpListener::bind("127.0.0.1:8080").await?;
    let (stream, _) = listener.accept().await?;
    async_tungstenite::accept_async(stream).await?;
    Ok(())
}
```
(the same also happens after the handshake when the client is idle; I'm using only `accept_async` in the example because it's shorter)
It could be an issue with async-std but I haven't yet been able to replicate it without async-tungstenite ~~(both codebases are still a bit hard for me to navigate)~~.
```toml
[dependencies]
async-std = { version = "1.12.0", features = ["attributes"] }
async-tungstenite = { version = "0.26.1", features = ["async-std-runtime"] }
```
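For reproducing, an idle client is enough. Here's a minimal sketch of one (this snippet is my addition, not part of the original report): it connects and sends nothing, which is exactly the trigger described above.

```rust
// Hypothetical companion client: connect to the server above and stay
// idle, so the WebSocket handshake never completes and the server's
// accept future keeps polling. Watch the server's CPU usage in
// release mode while this runs.
use std::net::TcpStream;
use std::{thread, time::Duration};

fn main() -> std::io::Result<()> {
    let _stream = TcpStream::connect("127.0.0.1:8080")?;
    thread::sleep(Duration::from_secs(60));
    Ok(())
}
```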
If you do the same with tokio, does it behave correctly?
I've tested it with async-net (versions 1 and 2), and that worked okay;
will test with tokio now;
ed.: tokio is fine
This, to me, looks a lot like some sort of soft race condition, because it goes away if I add some overhead (or run the thing in debug mode), or do anything that causes `poll` to be called not as soon as the task wakes, and it happens more often if I add this at the very start of `main`:
```rust
async_std::task::spawn(futures_util::future::pending::<()>());
```
Update: this seems to affect anything that uses `async_io::Async::<std::net::TcpStream>::poll_read` directly. It happens on both async-io versions 1 and 2.
I found the reason (in `async_tungstenite::compat`):
```rust
let waker = match kind {
    ContextWaker::Read => task::waker_ref(&self.read_waker_proxy),
    ContextWaker::Write => task::waker_ref(&self.write_waker_proxy),
};
```
and the "fix" (for now I can't figure out much better):
```rust
let waker = match kind {
    ContextWaker::Read => task::waker_ref(&self.read_waker_proxy),
    ContextWaker::Write => task::waker_ref(&self.write_waker_proxy),
}
.clone();
```
why:
- `Waker` returned by `task::waker_ref` changes its vtable on clone
- async-io's `Source`, just like `AtomicWaker`, checks `Waker` equality via `Waker::will_wake`, which compares both the data pointer and the vtable
- async-io's `Source` wakes the older `Waker` on change
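Here's a minimal sketch of the first two points (it only depends on `futures`; the `Proxy` type is a stand-in I made up for the compat module's waker proxies):

```rust
use futures::task::{self, ArcWake};
use std::sync::Arc;

struct Proxy;

impl ArcWake for Proxy {
    fn wake_by_ref(_arc_self: &Arc<Self>) {}
}

fn main() {
    let proxy = Arc::new(Proxy);
    // `waker_ref` builds a Waker around &Arc<Proxy> without cloning the Arc.
    let original = task::waker_ref(&proxy);
    // Cloning goes through `clone_arc_raw`, which takes the address of
    // the vtable constant in whatever codegen unit it landed in.
    let cloned = original.clone();
    // Can print `false` in release builds with futures 0.3.30: the data
    // pointers match but the vtable addresses differ.
    println!("will_wake: {}", original.will_wake(&cloned));
}
```

That would also explain why the `.clone()` workaround helps: the `Waker` handed to async-io is then already the cloned form, and further clones of it presumably keep producing the same vtable address, so `will_wake` matches on subsequent polls.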
as for why async-net works: I have no clue
Okay, so I've added a print:
```rust
let waker = match kind {
    ContextWaker::Read => task::waker_ref(&self.read_waker_proxy),
    ContextWaker::Write => task::waker_ref(&self.write_waker_proxy),
};
eprintln!("{waker:?} {:?}", waker.clone());
```
In debug builds (some parts omitted):
```
... data: 0x563a8f431fa0, vtable: 0x563a8e701368 ... data: 0x563a8f431fa0, vtable: 0x563a8e701368 ...
```
In release builds on Rust 1.78/1.79:
```
... data: 0x563ac7b8cf30, vtable: 0x563ac5b80238 ... data: 0x563ac7b8cf30, vtable: 0x563ac5b80178 ...
```
???????
compiler bug?
Seems like the compiler is placing the same vtable at two different addresses.
ed.: it was indeed a compiler bug: https://github.com/rust-lang/futures-rs/pull/2829#issuecomment-1962863389
There's been a change in how `Waker` works. `std::task::Wake` has adapted to it, but `futures::task::ArcWake` has not. I've checked with both `std::task::Wake` and a patched version of `futures::task::ArcWake`; both work fine. So, I guess, if there's a need to mitigate this without waiting for futures-task to change, switching over to `std::task::Wake` is a possibility, at the cost of some extra atomic operations (`std::task` doesn't provide an equivalent to `waker_ref`, as far as I can see).
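A rough sketch of what the `std::task::Wake` route looks like (the `WakerProxy` name is mine, standing in for the real proxies): you pay an `Arc` clone per `Waker` construction instead of getting a borrowed `WakerRef`.

```rust
use std::sync::Arc;
use std::task::{Wake, Waker};

struct WakerProxy;

impl Wake for WakerProxy {
    fn wake(self: Arc<Self>) {
        // forward to the real read/write waker here
    }
}

fn make_waker(proxy: &Arc<WakerProxy>) -> Waker {
    // Waker::from(Arc<impl Wake>) uses the vtable defined in
    // alloc::task; per the check above, clones of such a Waker compare
    // equal under will_wake(). The Arc clone here is the extra atomic
    // operation mentioned above.
    Waker::from(proxy.clone())
}

fn main() {
    let proxy = Arc::new(WakerProxy);
    let waker = make_waker(&proxy);
    assert!(waker.will_wake(&waker.clone()));
}
```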
I don't have time to look into this in more detail until some time next week, but this sounds like a bug in the code doing the comparison. If multiple codegen units are involved, it's not guaranteed that the vtable is always at the same address for the same object.
That's why e.g. https://doc.rust-lang.org/stable/std/sync/struct.Arc.html#method.ptr_eq exists.
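For illustration (a small example of my own): `Arc::ptr_eq` deliberately compares only the allocation address of a `dyn` pointer, not its vtable metadata.

```rust
use std::sync::Arc;

trait Marker {}
struct Thing;
impl Marker for Thing {}

fn main() {
    let a: Arc<dyn Marker> = Arc::new(Thing);
    let b = Arc::clone(&a);
    // Compares only the allocation address; the vtable part of the fat
    // pointer is deliberately ignored, because vtable addresses are
    // not guaranteed to be unique or stable across codegen units.
    assert!(Arc::ptr_eq(&a, &b));
}
```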
I'll list all the parts of the code I found relevant so far in one place, ~~and with fewer guesses~~
async-tungstenite creating wakers that change their vtable on clone:
https://github.com/sdroege/async-tungstenite/blob/6ff28b380c55b59a6ca96a6d76528754fa7b50ff/src/compat.rs#L128-L129
async-std directly uses async_io::Async:
https://github.com/async-rs/async-std/blob/8947d8e03c4c267fdd3828e07368cefc7d39b002/src/net/tcp/stream.rs#L5
https://github.com/async-rs/async-std/blob/8947d8e03c4c267fdd3828e07368cefc7d39b002/src/net/tcp/stream.rs#L50
https://github.com/async-rs/async-std/blob/8947d8e03c4c267fdd3828e07368cefc7d39b002/src/net/tcp/stream.rs#L308
async-io checking Wakers for equality and waking the old one on change:
https://github.com/smol-rs/async-io/blob/master/src/reactor.rs#L456-L461
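In pseudocode, the pattern at that spot is roughly this (a simplified sketch of my own, not the actual async-io code): a mismatch under `will_wake` wakes the previously stored waker, which re-polls the task immediately, which registers another mismatching waker, and so on in a loop.

```rust
use std::task::Waker;

// Simplified register logic: keep the stored waker if it would wake
// the same task, otherwise wake the old one (so no wakeup is lost)
// and store the new one. When will_wake() spuriously fails on every
// poll, this wakes the task every time, producing the 100% CPU spin.
fn register(slot: &mut Option<Waker>, new: &Waker) {
    match slot {
        Some(old) if old.will_wake(new) => {}
        _ => {
            if let Some(old) = slot.take() {
                old.wake();
            }
            *slot = Some(new.clone());
        }
    }
}
```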
futures-task creates the vtable references for the original and the cloned waker separately (note: no `#[inline(always)]` on `clone_arc_raw`):
https://github.com/rust-lang/futures-rs/blob/master/futures-task/src/waker_ref.rs#L64
https://github.com/rust-lang/futures-rs/blob/master/futures-task/src/waker.rs#L38-L42
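The mechanism, as far as I understand it (a hand-written sketch, not the actual futures-task source): the vtable is a promoted constant behind a `&'static`, and with multiple codegen units, the reference taken inside `clone_arc_raw` and the one taken at the `waker_ref` call site can each end up pointing at their own copy of that constant.

```rust
use std::task::{RawWaker, RawWakerVTable};

fn noop(_: *const ()) {}

// Sketch: a fn returning &'static to a promoted constant. Each
// codegen unit that instantiates this may embed its own copy of the
// constant, so two results are not guaranteed to be address-equal.
fn vtable() -> &'static RawWakerVTable {
    &RawWakerVTable::new(clone_raw, noop, noop, noop)
}

fn clone_raw(data: *const ()) -> RawWaker {
    // If this lands in a different CGU than the original waker's
    // constructor, the clone points at a different vtable address.
    RawWaker::new(data, vtable())
}

fn main() {
    // In a single-CGU build these are equal; across CGUs they may not be.
    println!("{}", std::ptr::eq(vtable(), vtable()));
}
```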
Patch to `alloc::task` in Rust that uses `#[inline(always)]` to avoid the split across CGUs:
https://github.com/rust-lang/rust/pull/121622
async-tungstenite can mitigate this by using `std::task::Wake` instead of `futures_task::ArcWake`, or by cloning the waker; both options involve extra `Arc` clones to work (`&Arc<impl std::task::Wake>` can't be directly converted into an `impl Deref<Target = Waker>` the way `futures_task::waker_ref` does it).
futures-task can mitigate this by putting `#[inline(always)]` on `clone_arc_raw`.
> futures-task can mitigate this by putting `#[inline(always)]` on `clone_arc_raw`.
That seems like a solution that will just fail again some time in the future. This sounds to me like `will_wake()` should actually only check the pointer for equality and not the vtable.
> async-tungstenite can mitigate this by using `std::task::Wake` instead of `futures_task::ArcWake` or cloning the waker
Doing this as a workaround for now seems acceptable. Do you want to provide a PR for that?
Intentionally keeping this open here so there's a reminder to revert the change once that's safe to do again.
https://github.com/rust-lang/futures-rs/commit/b92f4c55a3f8e9ffd2201b7a2c1827343339c92e
Tested with futures 0.3.31 and async-tungstenite =0.26.1: the problem no longer seems to appear, unlike with futures =0.3.30 and async-tungstenite =0.26.1.
Thanks, I'll update the minimum version for futures with the next release and drop the workaround