
Consolidate together Bevy's TaskPools

Open james7132 opened this issue 1 year ago • 13 comments

Objective

Fixes #1907. Spiritual successor to #4740.

Bevy currently creates a 50%/25%/25% split between the Compute, AsyncCompute, and IO task pools, meaning that any given operation can only be scheduled onto that subset of threads. This is suboptimal when an app is not using any IO or async compute (or vice versa): available parallelism goes underutilized because the split does not reflect the actual workload. This PR aims to fix that underutilization.
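For intuition, the fixed split described above can be sketched as a simple division of the available threads. (`split_threads` is an illustrative name, not a Bevy API; the percentages come from the paragraph above.)

```rust
// Hypothetical sketch of the fixed 50%/25%/25% thread split.
fn split_threads(total: usize) -> (usize, usize, usize) {
    let compute = total / 2;                  // 50% -> ComputeTaskPool
    let io = total / 4;                       // 25% -> IoTaskPool
    let async_compute = total - compute - io; // remainder -> AsyncComputeTaskPool
    (compute, io, async_compute)
}

fn main() {
    // On an 8-thread machine, ECS systems can only ever use 4 threads,
    // even if the IO and AsyncCompute pools are sitting idle.
    let (compute, io, async_compute) = split_threads(8);
    println!("{compute} {io} {async_compute}"); // prints "4 2 2"
}
```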

Solution

  • Do away with the IO and AsyncCompute task pools and allocate all of the threads to the ComputeTaskPool.
  • Move all of the non-blocking IO tasks to the Compute task pool.
  • Add TaskPool::spawn_blocking as an alternative for blocking IO operations and any task that would previously be spawned on the AsyncCompute task pool. This will be backed by blocking's dynamically scaled thread pool, which will spin up and down threads depending on the need.
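For intuition, the shape of a `spawn_blocking`-style API can be sketched with std-only primitives. This is an illustrative stand-in, not Bevy's or the `blocking` crate's actual implementation (which reuses threads from a dynamically scaled pool rather than spawning a fresh one per call):

```rust
use std::sync::mpsc;
use std::thread;

// Minimal sketch: run a closure off the async executor's threads and
// hand the result back through a channel.
fn spawn_blocking<T, F>(f: F) -> mpsc::Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The real design would pull a thread from `blocking`'s pool here.
        let _ = tx.send(f());
    });
    rx
}

fn main() {
    let rx = spawn_blocking(|| 2 + 2);
    println!("{}", rx.recv().unwrap()); // prints "4"
}
```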

This allows ECS systems and parallel iterators to be scheduled onto all CPU cores instead of artificially constraining them to half of the logical cores available, as well as for typical async IO tasks to interweave onto any thread.

The use of spawn_blocking to perform CPU-bound operations will have to rely on the OS's preemptive scheduler to properly schedule the threads when the main task pool's threads are sitting idle. This comes with potential context-switching costs, but is generally preferable to choking out the entire app because a task does not cooperatively yield, or to artificially limiting available parallelism.

Note: We're already using blocking through async-fs for loading assets from disk. This shift primarily moves all of the other blocking IO tasks, and the async compute tasks into blocking's thread pool.


Changelog

  • Added: TaskPool::spawn_blocking
  • Added: TaskPool::spawn_blocking_async
  • Removed: IoTaskPool
  • Removed: AsyncComputeTaskPool
  • Changed: ComputeTaskPool by default now spawns a thread for every available logical CPU core.

Migration Guide

IoTaskPool and AsyncComputeTaskPool have been removed and merged with ComputeTaskPool. Replace uses of IoTaskPool::get and AsyncComputeTaskPool::get with ComputeTaskPool::get.

If you were spawning futures that rely on non-blocking IO (i.e. the task spends most of its time yielding via await), just spawn the task onto the ComputeTaskPool:

// in 0.13
IoTaskPool::get().spawn(async move {
     while let Ok(item) = stream.next().await {
         // process the item here
     }
});
// in 0.14
ComputeTaskPool::get().spawn(async move {
     while let Ok(item) = stream.next().await {
         // process the item here
     }
});

If you were spawning futures that rely on blocking IO (e.g. reading a file synchronously via std::fs), use ComputeTaskPool::spawn_blocking instead. This will spawn and/or reuse one of the dynamically scaled threads explicitly made for blocking operations.

// in 0.13
IoTaskPool::get().spawn(async move {
     // Note: std::fs::read_to_string is the ergonomic form of
     // File::open(...) + io::Read::read_to_string(&mut buf).
     let contents = std::fs::read_to_string(&path).unwrap();
     // process the file contents here
});
// in 0.14
ComputeTaskPool::get().spawn_blocking(move || {
     let contents = std::fs::read_to_string(&path).unwrap();
     // process the file contents here
});

If you were spawning futures for async compute, use ComputeTaskPool::spawn_blocking instead. This will spawn and/or reuse one of the dynamically scaled threads explicitly made for blocking operations.

// in 0.13
AsyncComputeTaskPool::get().spawn(async move {
     solve_traveling_salesman();
});
// in 0.14
ComputeTaskPool::get().spawn_blocking(move || {
     solve_traveling_salesman();
});

If you were spawning futures for long running tasks that used both blocking and non-blocking work, use ComputeTaskPool::spawn_blocking_async instead.

// in 0.13
AsyncComputeTaskPool::get().spawn(async move {
     // futures-lite's AsyncReadExt::read_to_string reads into a buffer.
     let mut contents = String::new();
     async_fs::File::open("assets/traveling_salesman.json")
         .await
         .unwrap()
         .read_to_string(&mut contents)
         .await
         .unwrap();
     solve_traveling_salesman(contents);
});
// in 0.14
ComputeTaskPool::get().spawn_blocking_async(async move {
     let mut contents = String::new();
     async_fs::File::open("assets/traveling_salesman.json")
         .await
         .unwrap()
         .read_to_string(&mut contents)
         .await
         .unwrap();
     solve_traveling_salesman(contents);
});

james7132 avatar Feb 24 '24 13:02 james7132

There should probably be a spawn_blocking_async method too for convenience.

hymm avatar Feb 24 '24 20:02 hymm

Is there an impending fix for the wasm build failure?

Would you consider renaming ComputeTaskPool to just TaskPool or ThreadPool or such?

AxiomaticSemantics avatar Feb 25 '24 20:02 AxiomaticSemantics

Is there an impending fix for the wasm build failure?

Fixed it with spawn_blocking_async, which should defer back to normal spawn on wasm targets now.

Would you consider renaming ComputeTaskPool to just TaskPool or ThreadPool or such?

I don't want to push that kind of breakage in this PR. The focus here is to get the functionality in place.

james7132 avatar Feb 26 '24 00:02 james7132

Tried to provide a more comprehensive changelog and migration guide.

james7132 avatar Feb 27 '24 00:02 james7132

Just noting that I have previously argued for starting with the ‘allow use of all hardware threads/cores’ rather than artificially limiting them and leaving lots of performance on the table and deal with contention problems in other ways. So, I support this.

superdump avatar Feb 27 '24 08:02 superdump

Note that this seems to have a significant performance regression as is during app startup. It seems that this change blocks rendering until the entire scene is loaded. Discussion: https://discord.com/channels/691052431525675048/749335865876021248/1211935807099895888

james7132 avatar Feb 27 '24 08:02 james7132

I'm on board for this general idea provided we have the numbers to back up the move in the scenarios we care about. I'd like to see comparisons for things like "heavy scene startup", "loading big scene assets while also doing heavy / highly parallel compute tasks in systems", etc.

cart avatar Feb 27 '24 21:02 cart

Posted these traces in discord a few weeks ago, posting them here so they don't get buried.

This was loading a ~1-2GB gltf scene (I don't remember which specific one, it was either bistro with compressed textures or the synty fantasy castle scene, but both traces are for the same scene and iirc bevy main was at the time the same PR that this was branched off of).

With this PR: (trace screenshot)

Bevy main: (trace screenshot)

This PR takes significantly longer to load all the assets and get to a playable state (~7s vs ~4.7s), and doesn't let asset loading happen async with the game running.

Had to zip the traces together because github only accepts a few different file types traces.zip

Elabajaba avatar Mar 14 '24 00:03 Elabajaba

@Elabajaba with #12988 merged, it should now show the spans for IO tasks like loading/preprocessing assets. Could you try collecting another trace? If you don't have time, I'll go ahead and check against the Bistro load that I tried before.

james7132 avatar Apr 17 '24 04:04 james7132

Bistro with compressed textures: (trace screenshot)

trace: taskpoolconsolidation-IOspans-bistro.zip

Elabajaba avatar Apr 18 '24 04:04 Elabajaba

The gltf loader is still using a scope to spawn some tasks. This is necessary since it's borrowing data for some of the tasks. We probably need a scope_blocking that spawns tasks on the blocking threads for this use case. Another possibility is to somehow throttle the number of tasks the asset loader is allowed to spawn at once.

edit: or maybe a Scope::spawn_blocking
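The borrowing constraint mentioned above is what forces the use of a scope: scoped tasks may borrow local data, while a detached spawn requires 'static data. A sketch using std::thread::scope as a stand-in for Bevy's TaskPool::scope:

```rust
use std::thread;

// Illustrative only: scoped threads can borrow `meshes` and `sum` from the
// caller's stack, which a detached thread::spawn (or TaskPool::spawn) cannot.
fn sum_in_scope(meshes: &[u32]) -> u32 {
    let mut sum = 0;
    thread::scope(|s| {
        s.spawn(|| {
            // borrows `meshes` for the duration of the scope, no move needed
            sum = meshes.iter().sum();
        });
    }); // all scoped threads are joined here
    sum
}

fn main() {
    println!("{}", sum_in_scope(&[1, 2, 3])); // prints "6"
}
```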

hymm avatar Apr 18 '24 17:04 hymm

Some care needs to be taken to ensure that blocking tasks or scopes are not used from the main task pool, or it'll be trivial to block as seen here.

james7132 avatar Apr 19 '24 01:04 james7132

@james7132 I don't think this makes sense for the 0.14 milestone in its current state. Removing, but feel free to argue with me if you disagree :)

alice-i-cecile avatar May 19 '24 00:05 alice-i-cecile