Objective

There's a repeating pattern of ThreadLocal<Cell<Vec<T>>> which is very useful for low overhead, low contention multithreaded queues that have cropped up in a few places in the engine. This pattern is surprisingly useful when building deferred mutation across multiple threads, as noted by it's use in ParallelCommands.

However, ThreadLocal<Cell<Vec<T>>> is not only a mouthful, it's also hard to ensure the thread-local queue is replaced after it's been temporarily removed from the Cell.

Solution

Wrap the pattern into bevy_utils::Parallel<T> which codifies the entire pattern and ensures the user follows the contract. Instead of fetching indivdual cells, removing the value, mutating it, and replacing it, Parallel::get returns a ParRef<'a, T> which contains the temporarily removed value and a reference back to the cell, and will write the mutated value back to the cell upon being dropped.

I would like to use this to simplify the remaining part of #4899 that has not been adopted/merged.

Changelog

TODO

Jan 24 '23 01:01 james7132

#6817 similarly abstracts the deferred mutation pattern, but seems to do so in a fully compatible way.

Jan 24 '23 14:01 alice-i-cecile

#6817 similarly abstracts the deferred mutation pattern, but seems to do so in a fully compatible way.

The main difference here is that this is a standalone type. It's used in check_visibility for internal parallelism, and I want to use it inside a component.

Jan 24 '23 17:01 james7132

Note: Parallel::get can still be called multiple times, producing multiple aliased ParRefs and still produce improper results, which may be a bit of a footgun.

This could be avoided by replacing the get method with a scope method that takes a impl FnOnce(&mut T) similar to how the existing ParallelCommands impl works. I personally think this would be worth the extra boilerplate, especially considering how it also lets us avoid having yet another odd smart pointer with subpar ergonomics on top of removing this footgun.

Jan 24 '23 18:01 TheRawMeatball

This could be avoided by replacing the get method with a scope method that takes a impl FnOnce(&mut T) similar to how the existing ParallelCommands impl works. I personally think this would be worth the extra boilerplate, especially considering how it also lets us avoid having yet another odd smart pointer with subpar ergonomics on top of removing this footgun.

This makes it difficult to work with several of these together, with a triple nested scope in that case. We can have both, but only having the scope is probably not 100% viable.

Jan 24 '23 18:01 james7132

Fair enough, didn't consider such a case.

Jan 24 '23 18:01 TheRawMeatball

Moving to 0.12 as there are still open conversations.

Jul 07 '23 23:07 cart

main thing for me to approve is just to see a bench that check_visibility still performs the same.

Comparing many_cubes on main vs this PR, it does seem like we're not losing anything. There shouldn't be a performance gain of any kind from this PR, so I'm chalking up the difference to a change in execution environment between the runs. Either way, the difference isn't too big to drive suspicion here IMO. (yellow is this PR, red is main)