Contiguous access
Objective
Enables accessing slices from tables directly via Queries.
Fixes: #21861
Solution
One new trait:
ContiguousQueryData allows fetching all values from a table at once (the implementation for &T returns a slice of the components in the set table; the implementation for &mut T returns a mutable slice of the components in the set table together with a struct whose methods set update ticks, to match the fetch implementation).
A method as_contiguous_iter on QueryIter makes it possible to iterate using this trait.
The QueryData macro was updated to support contiguous items when the contiguous(target) attribute is added (the target can be all, mutable, or immutable; refer to the custom_query_param example).
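To make the shape of the `&mut T` case more concrete, here is a minimal, self-contained toy model. It is not Bevy's implementation: `Column`, `ContiguousTicks`, and `fetch_contiguous_mut` are hypothetical names invented for this sketch, and ticks are modeled as plain `u32` values. It only illustrates the idea of handing out a whole column as a mutable slice together with a helper that stamps the change tick of every row.

```rust
/// Toy stand-in for one table column: component values plus one
/// "changed" tick per row. Hypothetical, not a Bevy type.
struct Column<T> {
    values: Vec<T>,
    changed_ticks: Vec<u32>,
}

/// Toy helper handed out next to a mutable slice so the caller can
/// record that the rows were modified. Hypothetical, not a Bevy type.
struct ContiguousTicks<'a> {
    changed_ticks: &'a mut [u32],
    this_run: u32,
}

impl ContiguousTicks<'_> {
    /// Stamps every row covered by the slice with the current tick.
    fn mark_all_as_updated(&mut self) {
        self.changed_ticks.fill(self.this_run);
    }
}

impl<T> Column<T> {
    /// Returns all values of the column at once, plus the tick helper.
    fn fetch_contiguous_mut(&mut self, this_run: u32) -> (&mut [T], ContiguousTicks<'_>) {
        (
            self.values.as_mut_slice(),
            ContiguousTicks {
                changed_ticks: self.changed_ticks.as_mut_slice(),
                this_run,
            },
        )
    }
}

fn main() {
    let mut positions = Column {
        values: vec![1.0_f32, 2.0, 3.0],
        changed_ticks: vec![0; 3],
    };
    let velocities = [0.5_f32, 0.5, 0.5];

    let (slice, mut ticks) = positions.fetch_contiguous_mut(7);
    for (p, v) in slice.iter_mut().zip(velocities.iter()) {
        *p += *v;
    }
    // Change ticks are only written when the caller asks for it.
    ticks.mark_all_as_updated();

    assert_eq!(positions.values, vec![1.5, 2.5, 3.5]);
    assert_eq!(positions.changed_ticks, vec![7, 7, 7]);
}
```

In this model the caller decides when to mark rows as changed, which mirrors the explicit `mark_all_as_updated()` call in the PR's example.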
Testing
- sparse_set_contiguous_query test verifies that you can't use next_contiguous with sparse set components
- test_contiguous_query_data test verifies that returned values are valid
- base_contiguous benchmark (file is named iter_simple_contiguous.rs)
- base_no_detection benchmark (file is named iter_simple_no_detection.rs)
- base_no_detection_contiguous benchmark (file is named iter_simple_no_detection_contiguous.rs)
- base_contiguous_avx2 benchmark (file is named iter_simple_contiguous_avx2.rs)
Showcase
Examples contiguous_query, custom_query_param
Example
let mut world = World::new();
let mut query = world.query::<(&Velocity, &mut Position)>();
let mut iter = query.iter_mut(&mut world);
// velocity's type is &[Velocity]
// position's type is &mut [Position]
// ticks's type is ContiguousComponentTicks
for (velocity, (position, mut ticks)) in iter.as_contiguous_iter().unwrap() {
    for (v, p) in velocity.iter().zip(position.iter_mut()) {
        p.0 += v.0;
    }
    // mark every component in the slice as changed
    ticks.mark_all_as_updated();
}
Benchmarks
Code for base benchmark:
use bevy_ecs::prelude::*;
use glam::*;

#[derive(Component, Copy, Clone)]
struct Transform(Mat4);

#[derive(Component, Copy, Clone)]
struct Position(Vec3);

#[derive(Component, Copy, Clone)]
struct Rotation(Vec3);

#[derive(Component, Copy, Clone)]
struct Velocity(Vec3);

pub struct Benchmark<'w>(World, QueryState<(&'w Velocity, &'w mut Position)>);

impl<'w> Benchmark<'w> {
    pub fn new() -> Self {
        let mut world = World::new();

        // All 10,000 entities share one archetype, so they live in one table.
        world.spawn_batch(core::iter::repeat_n(
            (
                Transform(Mat4::from_scale(Vec3::ONE)),
                Position(Vec3::X),
                Rotation(Vec3::X),
                Velocity(Vec3::X),
            ),
            10_000,
        ));

        let query = world.query::<(&Velocity, &mut Position)>();
        Self(world, query)
    }

    #[inline(never)]
    pub fn run(&mut self) {
        // Per-entity iteration; writing through `position` triggers change detection.
        for (velocity, mut position) in self.1.iter_mut(&mut self.0) {
            position.0 += velocity.0;
        }
    }
}
Each benchmark iterates over 10,000 entities from one table and adds the 3-dimensional vector in the Velocity component to the 3-dimensional vector in the Position component.
| Name | Time | Time (AVX2) | Description |
|---|---|---|---|
| base | 5.5828 µs | 5.5122 µs | Iteration over components |
| base_contiguous | 4.8825 µs | 1.8665 µs | Iteration over contiguous chunks |
| base_contiguous_avx2 | 2.0740 µs | 1.8665 µs | Iteration over contiguous chunks with enforced avx2 optimizations |
| base_no_detection | 4.8065 µs | 4.7723 µs | Iteration over components while bypassing change detection through bypass_change_detection() method |
| base_no_detection_contiguous | 4.3979 µs | 1.5797 µs | Iteration over components without registering update ticks |
Using the contiguous iterator is somewhat faster on its own (4.88 µs versus 5.58 µs for base), and the contiguous loop can be vectorized to make it faster still (1.87 µs in the AVX2 column).
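For readers unfamiliar with how AVX2 code generation can be enforced for a slice loop, the following is a minimal sketch in plain Rust with no Bevy dependency. It is an assumption about one common approach (`#[target_feature]` plus runtime detection), not a copy of the base_contiguous_avx2 benchmark, which may do this differently. `Vec3` here is a local stand-in struct, and `integrate`, `integrate_generic`, and `integrate_avx2` are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

/// Plain loop over two contiguous slices. `#[inline(always)]` lets the body
/// be inlined into `integrate_avx2`, where the compiler may use AVX2.
#[inline(always)]
fn integrate_generic(positions: &mut [Vec3], velocities: &[Vec3]) {
    for (p, v) in positions.iter_mut().zip(velocities) {
        p.x += v.x;
        p.y += v.y;
        p.z += v.z;
    }
}

/// Same loop, compiled with AVX2 enabled for this function only.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn integrate_avx2(positions: &mut [Vec3], velocities: &[Vec3]) {
    integrate_generic(positions, velocities);
}

/// Picks the AVX2 version when the CPU supports it, else the generic one.
fn integrate(positions: &mut [Vec3], velocities: &[Vec3]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked at runtime just above.
            unsafe { integrate_avx2(positions, velocities) };
            return;
        }
    }
    integrate_generic(positions, velocities);
}

fn main() {
    // Same sizes and values as the benchmark setup: 10,000 entries of Vec3::X.
    let mut positions = vec![Vec3 { x: 1.0, y: 0.0, z: 0.0 }; 10_000];
    let velocities = vec![Vec3 { x: 1.0, y: 0.0, z: 0.0 }; 10_000];

    integrate(&mut positions, &velocities);

    assert!(positions
        .iter()
        .all(|p| *p == Vec3 { x: 2.0, y: 0.0, z: 0.0 }));
}
```

Whether the loop is actually vectorized depends on the optimization level and the compiler version; the sketch only shows how the target feature is requested.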
Things to think about
- Whether the offset parameter in ContiguousQueryData is needed
- How does this PR compare to https://github.com/bevyengine/bevy/pull/6161?
- Am I right in my understanding that some things might not properly vectorize due to alignment issues even if they use as_contiguous_iter?
- If a user wanted to work with the standard library's SIMD module (https://doc.rust-lang.org/std/simd/index.html), ignoring alignment issues, would this PR work with that?
- How does this PR compare to Implement batched query support #6161?
This PR only enables slices from tables to be returned directly when applicable. It doesn't implement any batching and it doesn't guarantee any alignment beyond Rust's own, yet these slices can still be used with SIMD.
- Am I right in my understanding that some things might not properly vectorize due to alignment issues even if they use as_contiguous_iter?
This PR doesn't deal with alignment, but (as far as I understand) you can always take sub-slices that meet your alignment requirements. And, referring to issue #21861, the code gets vectorized even without any specific alignment.
- If a user wanted to work with the standard library's SIMD module (https://doc.rust-lang.org/std/simd/index.html), ignoring alignment issues, would this PR work with that?
No, the returned slices do not have any specific (other than Rust's) alignment requirements.
This solution looks promising for solving issue #21861.
If you want to use SIMD instructions explicitly, alignment is something you usually have to manage yourself (with an aligned allocator or a peeled prologue). Auto-vectorization won’t “update” the alignment for you – it just uses whatever alignment it can prove and otherwise emits unaligned loads. From that perspective, a contiguous slice is already sufficient; fully aligned SIMD is a separate concern on top of that.
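As a sketch of the peeled-prologue idea on a plain slice, the standard library's `align_to_mut` can split a `&mut [f32]` into an unaligned head, an aligned middle, and a tail. This is an illustration only and not part of the PR; `Lanes` and `scale_in_place` are hypothetical names, and the 32-byte alignment is an assumption chosen to match the width of an AVX2 register.

```rust
/// Eight `f32` lanes with 32-byte alignment. Hypothetical helper type.
#[repr(C, align(32))]
struct Lanes([f32; 8]);

/// Multiplies every element by `factor`, processing the aligned middle
/// of the slice in 8-lane blocks and the unaligned ends one by one.
fn scale_in_place(values: &mut [f32], factor: f32) {
    // SAFETY: `Lanes` wraps `[f32; 8]` with no padding, so any memory that
    // holds valid `f32` values is also a valid `Lanes`, and vice versa.
    let (head, body, tail) = unsafe { values.align_to_mut::<Lanes>() };

    // Peeled prologue: elements before the first 32-byte boundary.
    for v in head.iter_mut() {
        *v *= factor;
    }
    // Aligned blocks: each `Lanes` starts on a 32-byte boundary.
    for lanes in body.iter_mut() {
        for v in lanes.0.iter_mut() {
            *v *= factor;
        }
    }
    // Epilogue: leftover elements that do not fill a whole block.
    for v in tail.iter_mut() {
        *v *= factor;
    }
}

fn main() {
    let mut data: Vec<f32> = (0..20).map(|i| i as f32).collect();

    // Skip the first element so the sub-slice starts at a different address
    // than the allocation; the result is correct for any starting alignment.
    scale_in_place(&mut data[1..], 2.0);

    assert_eq!(data[0], 0.0);
    assert_eq!(data[1], 2.0);
    assert_eq!(data[19], 38.0);
}
```

The standard library does not promise that the middle part is as long as possible, so the function stays correct even if everything ends up in the head.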
You added a new example but didn't add metadata for it. Please update the root Cargo.toml file.
It looks like your PR has been selected for a highlight in the next release blog post, but you didn't provide a release note.
Please review the instructions for writing release notes, then expand or revise the content in the release notes directory to showcase your changes.