bevy Take non-`Send` resources out of `World`

This PR builds on top of #9202 (I separated what could be separated from this PR into that one).

Objective

The aim of this PR is to be the "final" fix to our messy problems involving thread-local data (see https://github.com/bevyengine/bevy/discussions/6552 for context).

Bevy only deals with a handful of thread-local resources (that mostly come from low-level OS APIs or libs that talk to those), but that handful has resulted in a ton of internal complexity (in bevy_ecs, bevy_tasks, and bevy_app). The way we currently store them makes it either difficult or impossible to implement some nice-to-have features (e.g. generalized multi-world apps, framerate-independent input handling, etc.).

Solution

tl;dr Separate non-Send data from the ECS entirely. It will remain indirectly accessible through an event loop running on the main thread. Then, move app updates into a separate thread (on permitting targets) so they don't block the event loop.

This setup should be familiar to any audio plugin developers (or async runtime developers). If a system on thread B needs to do something with data that can't leave thread A, it can just send a command to A's event loop and wait for A to run it (and send any result back).

Some unique benefits of separating Send and !Send data like this are:

Res and ResMut can be removed in favor of Ref and Mut since they're redundant without NonSend and NonSendMut.
World becomes unambiguously Send, so a world can be dropped on any thread.
The multi-threaded executor is able to run any system on any thread (including exclusive systems).
The multi-threaded executor is able to run any system without spawning a task for it (especially exclusive systems).
Any system from any world on any thread can remotely access the local resources in the main thread.
Input events could be timestamped (with Instant::now()) with virtually no delay/aliasing (the event loop isn't blocked).

Changelog

TODO: finish this section

NonSend and NonSendMut have been removed.
Moved storage of non-Send resources to an external ThreadLocals container. It's normally owned by the App.

Migration Guide

If you were doing this before:

fn foo(x: NonSend<X>) {
    /* ... */
}

you're doing this now:

fn foo(mut thread: ThreadLocal) {
    // This closure will run on the thread that owns the TLS.
    // `run` blocks until the closure is completed.
    thread.run(|tls| {
        let x = tls.resource::<X>();
        /* ... */
    });
}

For better or worse, the handling that used to be buried inside the scheduler is now the responsibility of whoever writes an App runner. It's not too much boilerplate though IMO. Here's an example of a basic runner that runs the app once.

fn run_once(mut app: App) {
    let (mut sub_apps, tls, send, recv, _runner) = app.into_parts();
          
    // Move sub-apps to another thread.
    let thread = std::thread::spawn(move || {
        let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
            sub_apps.update();
            send.send(AppEvent::Exit(sub_apps)).unwrap();
        }));
        
        if let Some(payload) = result.err() {
            send.send(AppEvent::Error(payload)).unwrap();
        }
    });
    
    // Run an event loop in this thread.
    loop {
        let event = recv.recv().unwrap();
        match event {
            AppEvent::RunTask(task) => {
                task();
            }
            AppEvent::Exit(_) => {
                thread.join().unwrap();
                break;
            }
            AppEvent::Error(payload) => {
                std::panic::resume_unwind(payload);
            }
        }
    }
}

Spawning a thread like this is basically mandatory unless you're targeting an environment (i.e. wasm32) or are enabling a bevy compiler flag that forces completely single-threaded system execution. Otherwise, not doing so will risk running into a deadlock.

~~Unfortunately, the channel between the main and app threads can't always be a std::sync::mpsc::channel. The winit runner spawns an event loop that can only be woken up by winit events, so we have to use their special EventLoopProxy sender. That's the reason for the ThreadLocalTaskSender trait.~~ I still had to box the channel as a trait object because bevy_ecs and bevy_app are separated.

Jul 12 '23 00:07 maniwani

I think we should swap this to the 0.13 milestone now. We're drawing close and this is still in draft.

Sep 19 '23 23:09 alice-i-cecile

Let me see if I can have this ready for review this weekend. I've already rebased (just haven't pushed) and only the winit runner is unfinished (winit runner code is the bulk of this PR).

Sep 19 '23 23:09 maniwani

I will move this out of draft soon. There is at least one thing to debug still. (The game examples crash after a couple seconds.)

Decided to push this to the next milestone after all since it'll need some thorough testing. It'll also be easier to review after #8745 and #9202 are merged. (I'd still like to see this merged as soon as possible since it's blocking other important things.)

Sep 24 '23 02:09 maniwani

Your PR increases Bevy Minimum Supported Rust Version. Please update the rust-version field in the root Cargo.toml file.

Oct 08 '23 20:10 github-actions[bot]

Your PR increases Bevy Minimum Supported Rust Version. Please update the rust-version field in the root Cargo.toml file.

Oct 08 '23 20:10 github-actions[bot]

Leaving a note to myself. #9826 made it possible for the render thread to spawn before the runner is called, so we can't add the ThreadLocalChannel resource to the RenderApp in winit_runner. We need ThreadLocalChannel to implement Clone so the main world can add it to the render world before/during extraction.

Oct 19 '23 17:10 maniwani

Could spawning a thread wrapping the EventLoopProxy within the winit plugin, and then sending messages through a normal channel from there work?

*edited for tone - after initial posting I re-read it and it felt a bit aggressive/patronizing. This is an edit to clarify this as a genuine question.

Oct 22 '23 18:10 lee-orr

Could spawning a thread wrapping the EventLoopProxy within the winit plugin, and then sending messages through a normal channel from there work?

Are you suggesting something like this?

// create a standard rust channel
let (send, recv) = std::sync::mpsc::channel();

// create a winit proxy
let proxy = event_loop.create_proxy();

// move it into a new thread (so there would be thread "main", thread "app", and now this)
std::thread::Builder::new()
    .name("winit-proxy-listener".into())
    .spawn(move ||
        while let Ok(event) = recv.recv() {
            // got something from the app, forward it to the event loop
            proxy.send(event).unwrap();
        }
    )
    .unwrap();

And were you specifically addressing my last comment or what I wrote in the description about the ThreadLocalTaskSender? (Or both?) It does seem like an option to address both problems at once, so I do like your idea. It feels a little overkill to create a thread just to forward messages though. I wonder if the overhead from just having threads is negligible.

Oct 22 '23 19:10 maniwani

Also, side note, ThreadLocalChannel should implement Clone anyway. Once this PR is merged, I want to reorder the app construction process so that building plugins is deferred[^1] and only happens after the runner starts, so the channel will already exist when the RenderApp is created. We'd clone it and add it to the render world then.

[^1]: The only thing a plugin would be able to do immediately on import is change the runner.

Oct 22 '23 19:10 maniwani

That was roughly what I was thinking, yes.

It was a reply to both, plus a question since I'm dealing with some thread-related things in the context of hot reload and was curious if there was a reason it wasn't possible...

Although given that concern - maybe it could be a specialized thing that happens in the app thread as a task? The main thought is wrapping it up somehow to allow the communication into the world to rely purely on standard channels.

Oct 22 '23 19:10 lee-orr

Also, side note, ThreadLocalChannel should implement Clone anyway. Once this PR is merged, I want to reorder the app construction process so that building plugins is deferred1 and only happens after the runner starts, so the channel will already exist when the RenderApp is created. We'd clone it and add it to the render world then.

This actually brings up a question I've been mulling over - whether it's worth having two distinct plugin types: "Frame" plugins (that require non-send/sync elements and can't be disposed and re-created easily, like winit) and normal plugins (where either it's fully send/sync, or the elements that are non-send/sync are just optimizations/caching/etc and can be thrown away and re-created as needed - for example having a thread-local RNG instance)

I'm coming at this from the perspective of working on hot-reload, where having a fully separate "frame" would allow for much easier reload of the internals.

Oct 22 '23 19:10 lee-orr

The main thought is wrapping it up somehow to allow the communication into the world to rely purely on standard channels.

Yeah, I think I'll try incorporating this idea. I am pretty unsatisfied with the ThreadLocalTaskSender trait, especially since it's just a byproduct of trying to interface with winit. It makes sense for WinitPlugin to own that complexity.

Oct 22 '23 19:10 maniwani

Is there an existing discussion related to plugin initialization and stuff? Somewhere where the "frame" idea either was discussed or could be raised effectively? (I don't want to pollute this PR with that)

Oct 22 '23 19:10 lee-orr

This actually brings up a question I've been mulling over - whether it's worth having two distinct plugin types: "Frame" plugins (that require non-send/sync elements and can't be disposed and re-created easily, like winit) and normal plugins (where either it's fully send/sync, or the elements that are non-send/sync are just optimizations/caching/etc and can be thrown away and re-created as needed - for example having a thread-local RNG instance).

I don't think we need this. The only thing in a plugin that "can't wait" is setting the runner function. The WinitPlugin only creates the EventLoop during its build method because the RenderPlugin needs a window to initialize the renderer on some targets.

If we defer plugins being built, then the EventLoop can just be created by the runner function. It won't need to be stored in TLS.

Is there an existing discussion related to plugin initialization and stuff? Somewhere where the "frame" idea either was discussed or could be raised effectively? (I don't want to pollute this PR with that)

I don't know of anything recent (edit: assuming you meant GitHub discussion), so feel free to start one.

(edit 2) I do have some ideas (from flecs) about using entities (specifically a parent-child hierarchy) to automate a clean removal process for plugins, but wasn't gonna get into that until we implement relationships.

Oct 22 '23 19:10 maniwani

Thanks for the quick reply! I agree that it's not needed for this PR to work well - I think it's something that might be useful for other elements (like the editor, embedding in non-bevy apps, and hot reload). I'll create a new discussion for that, but regardless am excited for this PR!

Oct 22 '23 19:10 lee-orr

If we defer plugins being built

For context, I really want to do this anyways to address #1255.

Oct 22 '23 23:10 alice-i-cecile