macroquad icon indicating copy to clipboard operation
macroquad copied to clipboard

`fn get_context() -> &'static mut` is unsound

Open parasyte opened this issue 3 years ago • 5 comments

This issue was brought to my attention in https://www.reddit.com/r/rust/comments/rcb8wg/are_static_mut_really_that_bad/ where a commenter (now deleted by the author) used Macroquad's usage of static mut as an example of how it can be used without worry.

https://github.com/not-fl3/macroquad/blob/33a8e4a3eca7c323ab09042c35a6b056bb74b773/src/lib.rs#L414-L416

After looking over the code for a few minutes, it became apparent that the use of static mut here is most certainly unsound, in at least this case: https://github.com/not-fl3/macroquad/blob/33a8e4a3eca7c323ab09042c35a6b056bb74b773/src/lib.rs#L503-L533

The issue is that touch_event acquires a unique borrow of globally accessible state, holding the reference across a function call boundary that itself also acquires the same unique borrow. I haven't been able to write a test that calls the touch_event method, so I cannot provide proof of unsoundness with a test. Instead, I wrote a minimal test case based on the structure of code that illustrates the problem: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c94caef0ca7aa1aa57c26ed81326a521

Miri output for the minimized test case:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.87s
     Running `/playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri target/miri/x86_64-unknown-linux-gnu/debug/playground`
error: Undefined Behavior: no item granting read access to tag <2025> at alloc1+0x8 found in borrow stack.
  --> src/main.rs:27:5
   |
27 |     context.0 += 1;
   |     ^^^^^^^^^^^^^^ no item granting read access to tag <2025> at alloc1+0x8 found in borrow stack.
   |
   = help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
   = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
           
   = note: inside `touch_event` at src/main.rs:27:5
note: inside `main` at src/main.rs:34:5
  --> src/main.rs:34:5
   |
34 |     touch_event();
   |     ^^^^^^^^^^^^^
   = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
   = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:123:18
   = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:145:18
   = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
   = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:406:40
   = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:370:19
   = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:133:14
   = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:128:48
   = note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:406:40
   = note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:370:19
   = note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:133:14
   = note: inside `std::rt::lang_start_internal` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:128:20
   = note: inside `std::rt::lang_start::<()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:144:17

error: aborting due to previous error

I had initially set out to send a PR to fix this, but as I dug into it more issues emerged. To start, I made get_context() unsafe, because the caller is responsible for guaranteeing that it will not hold more than one reference to the returned &'static mut.

Secondly, the touch_event method was updated to explicitly drop context and then reacquire it after the self.mouse_* calls. This worked well enough.

Third, as I was spreading unsafe calls through the rest of the codebase, I came upon this: https://github.com/not-fl3/macroquad/blob/33a8e4a3eca7c323ab09042c35a6b056bb74b773/src/experimental/coroutines.rs#L89-L107

Which, for the life of me, I cannot find a way to make sound with an explicit drop. At least not without a lot of additional effort. My hypothesis is that resume(&mut f.future) may end up calling Coroutine::poll() in a reentrant manner. This would have the same effect as the issue seen in touch_event; the unique borrow is held across a function call boundary which may also acquire the same unique borrow.

I could be wrong and maybe this condition is not possible. It deserves some extra attention by people who know this code better than I do.

Using the implicit drop behavior of the compiler does work for this case, as far as I can tell (reacquire the unique borrow inside the if body and reacquire again afterward). But I am honestly much less confident of this approach. And logically I still can't believe that reentrancy here would be sound.


Moving on, I found this: https://github.com/not-fl3/macroquad/blob/33a8e4a3eca7c323ab09042c35a6b056bb74b773/src/ui.rs#L38-L40

Sorry, but I don't know what to do in this case. root_ui needs to be made unsafe as long as get_context() returns &'static mut.


That's where I stopped. At this point, I believe get_context() should instead return a shared borrow, and any mutability needs should rely on interior mutability with UnsafeCell or other primitives built on it.

parasyte avatar Dec 09 '21 19:12 parasyte

yes, it is unsound :(

there are two ways this unsoundess can actually create bugs - multithreaded access and aliasing optimizations.

On wasm we do not really have threads, so macroquad is designed to be used from one thread. There should be some runtime checks to protect from using the API from multiple threads tho.

And for aliasing - I hope I'll refactor all this before aliasing optimization will be too aggressive and will actually create some bugs.

not-fl3 avatar Dec 10 '21 13:12 not-fl3

I was going to file an issue but saw this issue was already filed. :smile:

I'm using macroquad for a personal project (thanks for a great project by the way!) and was looking at the source. I saw the use of the static mut and handing mutable references out which seemed like UB.

Like @parasyte, I also wrote a sample program that simulates the raw_miniquad example program: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c83dce8cb9ca45d1b1d2daab0ea2615b

https://github.com/not-fl3/macroquad/blob/1dcc721dfec8817584b191e7a0c5942c6b6fe029/examples/raw_miniquad.rs#L27-L34

To avoid UB, perhaps we could use an UnsafeCell instead? We would work with mutable pointers instead of mutable references, which allow aliasing.

tmfink avatar Jan 08 '22 04:01 tmfink

get_internal_gl is unsafe for that exact reason - it is really easy to create an aliasing pointer with incorrect usage

not-fl3 avatar Jan 08 '22 05:01 not-fl3

I think I have a way forward. We can use the RacyCell idiom described in a proposal to deprecate static mut.

Playground link https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c22ec400b8fbdf72a0ef5808106b818b

We need to use pointers instead of mutable references. This will require more use of unsafe keyword, but that's probably better (reminder to uphold the invariants).

I'll try to put a PR together.

tmfink avatar Jan 09 '22 03:01 tmfink

Per my comment in #368, it would likely take a lot of work to fix the underlying soundness issues in macroquad.

There are also other static mut variables that likely introduce similar issues:

$ rg '^static mut'
src/lib.rs
404:static mut CONTEXT: Option<Context> = None;
418:static mut MAIN_FUTURE: Option<Pin<Box<dyn Future<Output = ()>>>> = None;

src/experimental/scene.rs
612:static mut SCENE: Option<Scene> = None;

src/experimental/collections/storage.rs
26:static mut STORAGE: Option<HashMap<TypeId, Box<dyn Any>>> = None;

src/telemetry.rs
5:static mut PROFILER: Option<Profiler> = None;

Fixing all of these issues may even require ch

tmfink avatar Jan 14 '22 07:01 tmfink