Consolidate together Bevy's TaskPools
Objective
Fixes #1907. Spiritual successor to #4740.
Bevy currently creates a 50%/25%/25% split between the Compute, AsyncCompute, and IO task pools, meaning that any given operation can only be scheduled onto its subset of threads. This is suboptimal when an app does little or no IO or async compute (or, conversely, is dominated by them): available parallelism is underutilized because the split does not reflect the actual workload. This PR aims to fix that underutilization.
Solution
- Do away with the IO and AsyncCompute task pools and allocate all of the threads to the ComputeTaskPool.
- Move all of the non-blocking IO tasks to the Compute task pool.
- Add TaskPool::spawn_blocking as an alternative for blocking IO operations and any task that would previously have been spawned on the AsyncCompute task pool. This is backed by blocking's dynamically scaled thread pool, which spins threads up and down as needed.
This allows ECS systems and parallel iterators to be scheduled onto all CPU cores instead of artificially constraining them to half of the logical cores available, as well as for typical async IO tasks to interweave onto any thread.
The use of spawn_blocking to perform CPU-bound operations will have to rely on the OS's preemptive scheduler to properly schedule the threads when the main task pool's threads are sitting idle. This comes with potential context-switching costs, but is generally preferable to choking out the entire app due to the task not cooperatively yielding available threads, or to artificially limiting available parallelism.
Note: We're already using blocking through async-fs for loading assets from disk. This shift primarily moves all of the other blocking IO tasks, and the async compute tasks, into blocking's thread pool.
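As a rough illustration of what this enables, here is a minimal sketch of deferring a blocking operation to blocking's dynamically scaled pool via blocking::unblock; the helper function and file path below are illustrative, not code from this PR.
use blocking::unblock;

// The closure runs on one of `blocking`'s dedicated threads, so the async
// executor's threads stay free for non-blocking work; awaiting the returned
// task yields the closure's result.
async fn read_config() -> std::io::Result<String> {
    unblock(|| std::fs::read_to_string("assets/config.ron")).await
}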
Changelog
Added: TaskPool::spawn_blocking
Added: TaskPool::spawn_blocking_async
Removed: IoTaskPool
Removed: AsyncComputeTaskPool
Changed: ComputeTaskPool by default now spawns a thread for every available logical CPU core.
Migration Guide
IoTaskPool and AsyncComputeTaskPool have been removed and merged with ComputeTaskPool. Replace uses of IoTaskPool::get and AsyncComputeTaskPool::get with ComputeTaskPool::get.
If you were spawning futures that rely on non-blocking IO (i.e. the task spends most of its time yielding via await), just spawn the task onto the ComputeTaskPool:
// in 0.13
IoTaskPool::get().spawn(async move {
    while let Some(item) = stream.next().await {
        // process the item here
    }
});
// in 0.14
ComputeTaskPool::get().spawn(async move {
    while let Some(item) = stream.next().await {
        // process the item here
    }
});
If you were spawning futures that rely on blocking IO (e.g. std::fs::read_to_string), use ComputeTaskPool::spawn_blocking instead. This will spawn and/or reuse one of the dynamically scaled threads explicitly made for blocking operations.
// in 0.13
IoTaskPool::get().spawn(async move {
    let contents = std::fs::read_to_string(path).unwrap();
    // process the file contents here
});
// in 0.14
ComputeTaskPool::get().spawn_blocking(move || {
    let contents = std::fs::read_to_string(path).unwrap();
    // process the file contents here
});
If you were spawning futures for async compute, use ComputeTaskPool::spawn_blocking instead. This will spawn and/or reuse one of the dynamically scaled threads explicitly made for blocking operations.
// in 0.13
AsyncComputeTaskPool::get().spawn(async move {
    solve_traveling_salesman();
});
// in 0.14
ComputeTaskPool::get().spawn_blocking(move || {
    solve_traveling_salesman();
});
If you were spawning futures for long-running tasks that use both blocking and non-blocking work, use ComputeTaskPool::spawn_blocking_async instead.
// in 0.13
AsyncComputeTaskPool::get().spawn(async move {
    // `read_to_string` here comes from futures_lite::AsyncReadExt
    let mut contents = String::new();
    async_fs::File::open("assets/traveling_salesman.json")
        .await
        .unwrap()
        .read_to_string(&mut contents)
        .await
        .unwrap();
    solve_traveling_salesman(contents);
});
// in 0.14
ComputeTaskPool::get().spawn_blocking_async(async move {
    let mut contents = String::new();
    async_fs::File::open("assets/traveling_salesman.json")
        .await
        .unwrap()
        .read_to_string(&mut contents)
        .await
        .unwrap();
    solve_traveling_salesman(contents);
});
There should probably be a spawn_blocking_async method too for convenience.
Is there an impending fix for the wasm build failure?
Would you consider renaming ComputeTaskPool to just TaskPool or ThreadPool or such?
Is there an impending fix for the wasm build failure?
Fixed it with spawn_blocking_async, which should defer back to normal spawn on wasm targets now.
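For illustration only, such a fallback could be structured as a cfg switch; this is a hedged sketch with a free-standing function, not the PR's actual code.
use std::future::Future;

// On native targets, drive the long-running future to completion on one of
// `blocking`'s threads; on wasm there are no extra threads, so the future
// would instead be handed to the normal task-pool `spawn`.
#[cfg(not(target_arch = "wasm32"))]
fn spawn_blocking_async<T: Send + 'static>(
    fut: impl Future<Output = T> + Send + 'static,
) -> impl Future<Output = T> {
    blocking::unblock(move || futures_lite::future::block_on(fut))
}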
Would you consider renaming ComputeTaskPool to just TaskPool or ThreadPool or such?
I don't want to push that kind of breakage in this PR. The focus here is to get the functionality in place.
Tried to provide a more comprehensive changelog and migration guide.
Just noting that I have previously argued for starting with the ‘allow use of all hardware threads/cores’ rather than artificially limiting them and leaving lots of performance on the table and deal with contention problems in other ways. So, I support this.
Note that this seems to have a significant performance regression as is during app startup. It seems that this change blocks rendering until the entire scene is loaded. Discussion: https://discord.com/channels/691052431525675048/749335865876021248/1211935807099895888
I'm on board for this general idea provided we have the numbers to back up the move in the scenarios we care about. I'd like to see comparisons for things like "heavy scene startup", "loading big scene assets while also doing heavy / highly parallel compute tasks in systems", etc.
Posted these traces in discord a few weeks ago, posting them here so they don't get buried.
This was loading a ~1-2GB gltf scene (I don't remember which specific one, it was either bistro with compressed textures or the synty fantasy castle scene, but both traces are for the same scene and iirc bevy main was at the time the same PR that this was branched off of).
With this PR
Bevy main
This PR takes significantly longer to load all the assets and get to a playable state (~7s vs ~4.7s), and doesn't let asset loading happen async with the game running.
Had to zip the traces together because GitHub only accepts a few different file types: traces.zip
@Elabajaba with #12988 merged, it should now show the spans for IO tasks like loading/preprocessing assets. Could you try collecting another trace? If you don't have time, I'll go ahead and check against the Bistro load that I tried before.
The gltf loader is still using a scope to spawn some tasks. This is necessary since it's borrowing data for some of the tasks. We probably need a scope_blocking that spawns tasks on the blocking threads for this use case. Another possibility is to somehow throttle the number of tasks the asset loader is allowed to spawn at once.
edit: or maybe a Scope::spawn_blocking
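To make the constraint concrete, here is a sketch of the scoped-task pattern in question, where tasks borrow data and the scope waits for them before returning; the function and payload are made up for illustration, and scope_blocking / Scope::spawn_blocking remain hypothetical.
use bevy_tasks::ComputeTaskPool;

// Tasks spawned in a scope may borrow `meshes` because `scope` guarantees
// they complete before it returns; a hypothetical `scope_blocking` (or
// `Scope::spawn_blocking`) would keep this borrowing while running the work
// on the blocking threads instead of the main task pool.
fn mesh_sizes(meshes: &[Vec<u8>]) -> Vec<usize> {
    ComputeTaskPool::get().scope(|scope| {
        for mesh in meshes {
            scope.spawn(async move { mesh.len() });
        }
    })
}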
Some care needs to be taken to ensure that blocking tasks or scopes are not used from the main task pool, or it'll be trivial to block as seen here.
@james7132 I don't think this makes sense for the 0.14 milestone in its current state. Removing, but feel free to argue with me if you disagree :)