
Split Rayon thread pool; maybe work on background tasks in general

Open kpreid opened this issue 3 months ago • 1 comment

Certain parts of All is Cubes currently use rayon’s global thread pool for parallel computation. However, it is possible for the thread pool to be occupied by a lengthy computation (such as the parallelism I just added to importing in b0c2c42bacbd5ea937833f58b9c2798d3c15913b), and if this happens, it will delay anything else (such as per-frame rendering work) done in parallel. This can be mitigated to an extent with rayon::yield_now() in the long tasks, but that is not guaranteed to get the desired results.

To address this, we should have a separate thread pool for all per-step or per-frame parallel work, so that clients of that thread pool can agree on a principle like “nothing longer than a millisecond under normal conditions” and cooperate. This has some design considerations:

  • This thread pool will either need to be passed around, live in a static we define, or be install()ed at every call site. It would be natural to pass it around as part of the Executor trait, but if we do that, everything that currently uses the global thread pool under the auto-threads feature will need to take an Executor parameter, and will no longer be actually “auto”. Also, the trait will need to have conditionally present methods, but that should not be an additivity problem provided they have default bodies (returning None, presumably).

  • What should the sizes of the thread pools be? Anything less than available_parallelism limits throughput, but having 2 pools of that size potentially causes a lot of contention.

    Ideally the “long” tasks would be softly de-prioritized relative to the “short” tasks, but std doesn’t provide a way to control thread priority, and rayon doesn’t let us influence the scheduling of tasks, nor does it have any intrinsic means of pausing an already-running task. We could inject dummy tasks that do nothing, but that is not a quick-responding mechanism.
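
    The additivity point in the first bullet can be sketched as follows. This is a hypothetical shape, not the actual trait: a new accessor with a default body returning None compiles for every existing implementor, so adding it is non-breaking.

    ```rust
    // Hypothetical stand-in for a pool handle (e.g. rayon::ThreadPool).
    struct ThreadPool;

    trait Executor {
        // ...existing methods such as spawn_background()...

        /// Hypothetical accessor for the short-task pool. Because it has a
        /// default body, implementors written before it existed need no
        /// changes, so the addition is purely additive. In the real trait it
        /// would presumably also be gated on the auto-threads feature.
        fn short_task_pool(&self) -> Option<&ThreadPool> {
            None
        }
    }

    // An implementor that predates the method still compiles unchanged:
    struct PlainExecutor;
    impl Executor for PlainExecutor {}
    ```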

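To make the proposal concrete, here is a minimal std-only sketch of a dedicated short-task pool; all names are hypothetical, and in practice this would more likely be a second rayon::ThreadPool rather than hand-rolled threads:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical dedicated pool for per-step/per-frame work. Long imports
// never enter this pool, so frame work is not delayed behind them.
struct ShortTaskPool {
    tx: mpsc::Sender<Box<dyn FnOnce() + Send>>,
}

impl ShortTaskPool {
    fn new(threads: usize) -> Self {
        let (tx, rx) = mpsc::channel::<Box<dyn FnOnce() + Send>>();
        // Workers share one receiver; the mutex is held only while picking
        // up a job, not while running it.
        let rx = Arc::new(Mutex::new(rx));
        for _ in 0..threads {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                let job = match rx.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break, // all senders dropped: shut down
                };
                job();
            });
        }
        Self { tx }
    }

    fn spawn(&self, job: impl FnOnce() + Send + 'static) {
        self.tx.send(Box::new(job)).expect("pool shut down");
    }
}
```

Clients of such a pool would then be able to agree on the “nothing longer than a millisecond under normal conditions” budget for anything passed to spawn(), while long imports stay elsewhere.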

We should also consider the bigger picture of how the existing functions of Executor work with this, and how callers (not implementors) are expected to use it. Right now, we:

  1. use Rayon for synchronous, short tasks during rendering and simulation
  2. use Executor::spawn_background() for asynchronous cooperative tasks that are expected to yield regularly and may also suspend (MeshJobQueue, in particular)
  3. newly use Rayon for synchronous, long tasks during import

This isn't a very coherent architecture. It looks to me like 2 and 3 might be able to use the same thread pool, since e.g. the mesh job queue doesn't specifically need to be reliably consumed, provided we can reframe spawn_background() from "spawn some actor-ish tasks that listen for work" to "spawn individual jobs into Rayon or something else", so that we're not doing any blocking inside the Rayon thread pool. But that might itself be awkward.
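
One way to picture that reframing (all names hypothetical, with std::thread::spawn standing in for a pool's spawn operation): the actor style parks a worker on a blocking channel receive for the lifetime of the queue, while the per-job style hands the pool only runnable work, so nothing inside the pool ever blocks.

```rust
use std::sync::mpsc;
use std::thread;

type Job = Box<dyn FnOnce() + Send>;

// Actor style (roughly the current spawn_background() shape as described):
// one long-lived task owns the receiving end and blocks on it. Run inside a
// Rayon worker, this occupies that worker even while the queue is idle.
fn actor_style(rx: mpsc::Receiver<Job>) {
    for job in rx {
        job();
    }
}

// Per-job style (the proposed reframing): each job is submitted
// individually, so the pool only ever holds work that is ready to run.
// thread::spawn stands in for a hypothetical pool spawn method.
fn per_job_style(jobs: Vec<Job>) -> Vec<thread::JoinHandle<()>> {
    jobs.into_iter().map(|job| thread::spawn(job)).collect()
}
```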

kpreid commented Sep 10 '25 22:09

Since bevy_tasks is in our dependency graph, we should consider using its compute task pool functionality, but I have not looked at how well it fits in, such as how it behaves on wasm targets.

kpreid commented Sep 17 '25 00:09