kanal
kanal copied to clipboard
Oneshot: Data race detected
Problem
Continuing from #34, I audited oneshot and found a data racing problem. The quickest way to reproduce it is using MIRI.
$ cargo +nightly miri --version
miri 0.1.0 (a6f8aa5 2023-08-11)
$ cargo +nightly miri test --test=oneshot_test -- send_win_panic
Preparing a sysroot for Miri (target: x86_64-unknown-linux-gnu)... done
Finished test [optimized + debuginfo] target(s) in 0.02s
Running tests/oneshot_test.rs (/home/qc/.cache/cargo-build/miri/x86_64-unknown-linux-gnu/debug/deps/oneshot_test-c2884fce5be3f64a)
warning: Miri does not support optimizations. If you have enabled optimizations by selecting a Cargo profile (such as --release) which changes other profile settings such as whether debug assertions and overflow checks are enabled, those settings are still applied.
running 1 test
test asyncs::send_win_panic ... error: Undefined Behavior: Data race detected between (1) Read on thread `asyncs::send_wi` and (2) Write on thread `<unnamed>` at alloc33459. (2) just happened here
--> /home/qc/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:497:1
|
497 | pub unsafe fn drop_in_place<T: ?Sized>(to_drop: *mut T) {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Data race detected between (1) Read on thread `asyncs::send_wi` and (2) Write on thread `<unnamed>` at alloc33459. (2) just happened here
|
help: and (1) occurred earlier here
--> /home/qc/Workspace/NotMe/kanal/src/signal.rs:230:15
|
230 | match &(*this).waker {
| ^^^^^^^^^^^^^^
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE (of the first span):
= note: inside `std::ptr::drop_in_place::<kanal::OneshotSendFuture<std::boxed::Box<usize>>> - shim(Some(kanal::OneshotSendFuture<std::boxed::Box<usize>>))` at /home/qc/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:497:1: 497:56
= note: inside `std::ptr::drop_in_place::<std::boxed::Box<kanal::OneshotSendFuture<std::boxed::Box<usize>>>> - shim(Some(std::boxed::Box<kanal::OneshotSendFuture<std::boxed::Box<usize>>>))` at /home/qc/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:497:1: 497:56
= note: inside `std::ptr::drop_in_place::<std::pin::Pin<std::boxed::Box<kanal::OneshotSendFuture<std::boxed::Box<usize>>>>> - shim(Some(std::pin::Pin<std::boxed::Box<kanal::OneshotSendFuture<std::boxed::Box<usize>>>>))` at /home/qc/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:497:1: 497:56
note: inside closure
--> tests/oneshot_test.rs:298:13
|
298 | });
| ^
note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
error: aborting due to previous error; 1 warning emitted
error: test failed, to rerun pass `--test oneshot_test`
Caused by:
process didn't exit successfully: `/home/qc/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner /home/qc/.cache/cargo-build/miri/x86_64-unknown-linux-gnu/debug/deps/oneshot_test-c2884fce5be3f64a send_win_panic` (exit status: 1)
How to trigger it
- The sender wins the race and goes waiting.
- The receiver loses the race and tries to finish it. Now it has already set the ptr to
FINISHED
but it hasn't actually finished receiving yet. - The sender drops. It finds that the ptr has already been set to
FINISHED
, so it drops theOneshotInternal
, which contains the waker. - The receiver then finishes receiving and invokes
wake
. BANG!
Possible solution
You can add a state before FINISHED
indicating that the data is transferring. An existing example is AtomicWaker
, which has a REGISTERING
state indicating that the waker is being replaced.
This might also relate to #29.
Tnx @QuarticCat I saw it a few months ago, I didn't get the time to fix it yet, it's the only bug that is stopping Kanal from the 0.1 release. I will remove one-shot channel if I don't find a sound solution for it in an acceptable time.
@fereidani Just wanted to bump this issue and also perhaps see if you had a concrete deadline in mind for the solution here. I'm using kanal in an upcoming project and it looks awesome, and the 0.1 release would be incredible.
@Qwuke I'm going to remove oneshot from kanal and make it separate crate, I'm a bit busy with work now, oneshot needs excessive refactor and testing to be stabilize. I'll revisit as soon as I got free time.