thread_local-rs icon indicating copy to clipboard operation
thread_local-rs copied to clipboard

A little soundness hole around drop order of `thread_local!(static …)`

Open steffahn opened this issue 10 months ago • 5 comments

Here's the code to reproduce:

use std::{cell::{Cell, RefCell}, thread};

use thread_local::ThreadLocal;

fn main() {
    static FOO: ThreadLocal<Cell<i32>> = ThreadLocal::new();

    // thread 1
    thread::spawn(|| {
        thread_local!(
            static ACCESSOR: RefCell<AccessFooFromNewThreadOnDrop> =
                RefCell::new(AccessFooFromNewThreadOnDrop(None));
        );
        struct AccessFooFromNewThreadOnDrop(Option<&'static Cell<i32>>);

        // FOO.get_or_default(); // uncomment this if needed, for opposite initialization order
                                 // of the thread-locals `ACCESSOR` and “whatever ThreadLocal
                                 // uses internally” [I suppose `THREAD_GUARD: ThreadGuard`]
                                 // (drop order seems different on miri vs Linux)

        ACCESSOR.with(|a| { // initialize ACCESSOR in thread 1, will be dropped at end of scope below
            a.borrow_mut().0 = Some(FOO.get_or_default()); // store a borrow of thread-local FOO value
        });

        // destructor from `thread_local` library [I suppose `THREAD_GUARD: ThreadGuard`] runs first,
        // making FOO's value in thread 1, referenced in ACCESSOR, available for re-use in new threads

        impl Drop for AccessFooFromNewThreadOnDrop {
            fn drop(&mut self) {
                // thread 2
                let t = thread::spawn(|| {
                    // succeeds, re-using Cell<i32> value from thread 1
                    let r1: &Cell<i32> = FOO.get().unwrap();
                    r1.set(42); // write-write data race, UB
                });
                let r2: &Cell<i32> = self.0.unwrap();
                r2.set(1337); // write-write data race, UB

                t.join().unwrap();
            }
        }
    })
    .join()
    .unwrap();
}

(run on rustexplorer.com [where you'll just see it compiles&runs and doesn't panic; I haven’t made the data race particularly “observable” at run-time, to keep the code more clean])

With the additional FOO.get_or_default(); line uncommented as indicated, for panic-free execution on miri, this gets reported UB as expected:

error: Undefined Behavior: Data race detected between (1) non-atomic write on thread `unnamed-1` and (2) retag write of type `i32` on thread `unnamed-2` at alloc3418. (2) just happened here
   --> /home/frank/.rustup/toolchains/nightly-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/cell.rs:482:31
    |
482 |         mem::replace(unsafe { &mut *self.value.get() }, val)
    |                               ^^^^^^^^^^^^^^^^^^^^^^ Data race detected between (1) non-atomic write on thread `unnamed-1` and (2) retag write of type `i32` on thread `unnamed-2` at alloc3418. (2) just happened here
    |
help: and (1) occurred earlier here
   --> src/main.rs:37:17
    |
37  |                 r2.set(1337); // write-write data race, UB
    |                 ^^^^^^^^^^^^
    = help: retags occur on all (re)borrows and as well as when references are copied or moved
    = help: retags permit optimizations that insert speculative reads or writes
    = help: therefore from the perspective of data races, a retag has the same implications as a read or write
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
    = note: BACKTRACE (of the first span) on thread `unnamed-2`:
    = note: inside `std::cell::Cell::<i32>::replace` at /home/frank/.rustup/toolchains/nightly-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/cell.rs:482:31: 482:53
    = note: inside `std::cell::Cell::<i32>::set` at /home/frank/.rustup/toolchains/nightly-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/cell.rs:411:9: 411:26
note: inside closure
   --> src/main.rs:34:21
    |
34  |                     r1.set(42); // write-write data race, UB
    |                     ^^^^^^^^^^

The issue here is the reasoning behind the T: Send ==> ThreadLocal<T>: Sync logic not being 100% watertight. As stated in the docs:

Note that since thread IDs are recycled when a thread exits, it is possible for one thread to retrieve the object of another thread. Since this can only occur after a thread has exited this does not lead to any race conditions.

but the “can only occur after a thread has exited” part is enforced via a thread-local (the THREAD_GUARD: ThreadGuard one if I'm not mistaken). After that gets dropped, destructors of other thread_local!(static …); items can still run. Because ThreadLocal also doesn’t come with a .with(|x| …)-style API, borrows of 'static lifetime can be created and thus a borrow of a ThreadLocal’s contents can be made available to such destructors after the value has already been marked as re-usable by other threads via the ThreadGuard dropped beforehand.

I can’t really think of any particularly satisfying alternative workarounds in implementation, i.e. without significant API-change: Technically, a with(|x| …)-style API should probably “fix” this. The get()-style API can also be kept in addition, as long as the contained type is Sync (because unlike std::thread_local!, this ThreadLocal does not suffer from issues of the contents being dropped too early, when accessed during drops of thread locals).

steffahn avatar Apr 18 '24 20:04 steffahn

For putting this into perspective; going as far as spawning a thread in the destructor of a thread-local is not necessary for unsoundness. The following simpler code too reliably reports UB on current miri (the thread::sleep is necessary to make miri execute this in a way that it finds the UB, but it should be a possible data race even without it):

use std::{cell::Cell, thread, time::Duration};
use thread_local::ThreadLocal;

static FOO: ThreadLocal<Cell<i32>> = ThreadLocal::new();

struct WriteOnDrop(&'static Cell<i32>);
thread_local!(static W: Cell<WriteOnDrop> = unreachable!());
impl Drop for WriteOnDrop {
    fn drop(&mut self) {
        self.0.set(0);
    }
}

fn main() {
    let _ = thread::spawn(|| W.set(WriteOnDrop(FOO.get_or_default())));
    thread::sleep(Duration::from_millis(100));
    FOO.get().unwrap().set(0);
}

steffahn avatar Apr 18 '24 21:04 steffahn

Hmm, I would have expected the lock on THREAD_ID_MANAGER to act as a synchronization point which should avoid races. Specifically:

  • When ThreadGuard is dropped, it locks THREAD_ID_MANAGER and releases the thread id.
  • When thread_id::get is called on the main thread for the first time, it will lock THREAD_ID_MANAGER and get the same thread id that the child thread used.

Amanieu avatar Apr 25 '24 22:04 Amanieu

Ah I see the problem now. The issue is that the lifetime returned by get() can outlive the thread ID.

You're right, I don't really see any alternative other than a .with-style API. This is unfortunate since it would require a new major release considering the API change.

Amanieu avatar Apr 25 '24 22:04 Amanieu

The get()-style API can also be kept in addition, as long as the contained type is Sync

I would like to put emphasis on this, as get() is necessary for an API I have designed using ThreadLocal, and I use T: Sync+Send.

To alleviate the need for a breaking change, I would suggest introducing with(), deprecating get(), and offering alternatives for get() for when T: Send+Sync. I'm not sure on a good naming scheme for the alternatives, however

conradludgate avatar Apr 29 '24 11:04 conradludgate

I think get_sync with a T: Sync bound is sufficiently explicit. And we can deprecate the existing get API. That neatly avoids the need for a major release.

Amanieu avatar Apr 29 '24 12:04 Amanieu