bevy icon indicating copy to clipboard operation
bevy copied to clipboard

refactor: Extract parallel queue abstraction

Open james7132 opened this issue 2 years ago • 7 comments

Objective

There's a repeating pattern of ThreadLocal<Cell<Vec<T>>> which is very useful for low overhead, low contention multithreaded queues that have cropped up in a few places in the engine. This pattern is surprisingly useful when building deferred mutation across multiple threads, as noted by it's use in ParallelCommands.

However, ThreadLocal<Cell<Vec<T>>> is not only a mouthful, it's also hard to ensure the thread-local queue is replaced after it's been temporarily removed from the Cell.

Solution

Wrap the pattern into bevy_utils::Parallel<T> which codifies the entire pattern and ensures the user follows the contract. Instead of fetching indivdual cells, removing the value, mutating it, and replacing it, Parallel::get returns a ParRef<'a, T> which contains the temporarily removed value and a reference back to the cell, and will write the mutated value back to the cell upon being dropped.

I would like to use this to simplify the remaining part of #4899 that has not been adopted/merged.


Changelog

TODO

james7132 avatar Jan 24 '23 01:01 james7132

#6817 similarly abstracts the deferred mutation pattern, but seems to do so in a fully compatible way.

alice-i-cecile avatar Jan 24 '23 14:01 alice-i-cecile

#6817 similarly abstracts the deferred mutation pattern, but seems to do so in a fully compatible way.

The main difference here is that this is a standalone type. It's used in check_visibility for internal parallelism, and I want to use it inside a component.

james7132 avatar Jan 24 '23 17:01 james7132

Note: Parallel::get can still be called multiple times, producing multiple aliased ParRefs and still produce improper results, which may be a bit of a footgun.

This could be avoided by replacing the get method with a scope method that takes a impl FnOnce(&mut T) similar to how the existing ParallelCommands impl works. I personally think this would be worth the extra boilerplate, especially considering how it also lets us avoid having yet another odd smart pointer with subpar ergonomics on top of removing this footgun.

TheRawMeatball avatar Jan 24 '23 18:01 TheRawMeatball

This could be avoided by replacing the get method with a scope method that takes a impl FnOnce(&mut T) similar to how the existing ParallelCommands impl works. I personally think this would be worth the extra boilerplate, especially considering how it also lets us avoid having yet another odd smart pointer with subpar ergonomics on top of removing this footgun.

This makes it difficult to work with several of these together, with a triple nested scope in that case. We can have both, but only having the scope is probably not 100% viable.

james7132 avatar Jan 24 '23 18:01 james7132

Fair enough, didn't consider such a case.

TheRawMeatball avatar Jan 24 '23 18:01 TheRawMeatball

Moving to 0.12 as there are still open conversations.

cart avatar Jul 07 '23 23:07 cart

main thing for me to approve is just to see a bench that check_visibility still performs the same.

Comparing many_cubes on main vs this PR, it does seem like we're not losing anything. There shouldn't be a performance gain of any kind from this PR, so I'm chalking up the difference to a change in execution environment between the runs. Either way, the difference isn't too big to drive suspicion here IMO. (yellow is this PR, red is main)

image

james7132 avatar Feb 12 '24 23:02 james7132

Ran it myself just to check, since the gap seems sort of large. The 2 runs ended up almost identical for me. image

hymm avatar Feb 14 '24 04:02 hymm