bevy
bevy copied to clipboard
refactor: Extract parallel queue abstraction
Objective
There's a repeating pattern of ThreadLocal<Cell<Vec<T>>> which is very useful for low overhead, low contention multithreaded queues that have cropped up in a few places in the engine. This pattern is surprisingly useful when building deferred mutation across multiple threads, as noted by it's use in ParallelCommands.
However, ThreadLocal<Cell<Vec<T>>> is not only a mouthful, it's also hard to ensure the thread-local queue is replaced after it's been temporarily removed from the Cell.
Solution
Wrap the pattern into bevy_utils::Parallel<T> which codifies the entire pattern and ensures the user follows the contract. Instead of fetching indivdual cells, removing the value, mutating it, and replacing it, Parallel::get returns a ParRef<'a, T> which contains the temporarily removed value and a reference back to the cell, and will write the mutated value back to the cell upon being dropped.
I would like to use this to simplify the remaining part of #4899 that has not been adopted/merged.
Changelog
TODO
#6817 similarly abstracts the deferred mutation pattern, but seems to do so in a fully compatible way.
#6817 similarly abstracts the deferred mutation pattern, but seems to do so in a fully compatible way.
The main difference here is that this is a standalone type. It's used in check_visibility for internal parallelism, and I want to use it inside a component.
Note: Parallel::get can still be called multiple times, producing multiple aliased
ParRefs and still produce improper results, which may be a bit of a footgun.
This could be avoided by replacing the get method with a scope method that takes a impl FnOnce(&mut T) similar to how the existing ParallelCommands impl works. I personally think this would be worth the extra boilerplate, especially considering how it also lets us avoid having yet another odd smart pointer with subpar ergonomics on top of removing this footgun.
This could be avoided by replacing the
getmethod with ascopemethod that takes aimpl FnOnce(&mut T)similar to how the existing ParallelCommands impl works. I personally think this would be worth the extra boilerplate, especially considering how it also lets us avoid having yet another odd smart pointer with subpar ergonomics on top of removing this footgun.
This makes it difficult to work with several of these together, with a triple nested scope in that case. We can have both, but only having the scope is probably not 100% viable.
Fair enough, didn't consider such a case.
Moving to 0.12 as there are still open conversations.
main thing for me to approve is just to see a bench that check_visibility still performs the same.
Comparing many_cubes on main vs this PR, it does seem like we're not losing anything. There shouldn't be a performance gain of any kind from this PR, so I'm chalking up the difference to a change in execution environment between the runs. Either way, the difference isn't too big to drive suspicion here IMO. (yellow is this PR, red is main)
Ran it myself just to check, since the gap seems sort of large. The 2 runs ended up almost identical for me.