Pack multiple vertex and index arrays together into growable buffers.
This commit uses the offset-allocator crate to combine vertex and index arrays from different meshes into single buffers. Since the primary source of wgpu overhead is validation and synchronization when switching buffers, this significantly improves Bevy's rendering performance on many scenes.
This patch is a more flexible version of #13218, which also used slabs. Unlike #13218, which used slabs of a fixed size, this commit implements slabs that start small and can grow. In addition to reducing memory usage, supporting slab growth reduces the number of vertex and index buffer switches that need to happen during rendering, leading to improved performance. To prevent pathological fragmentation behavior, slabs are capped to a maximum size, and mesh arrays that are too large get their own dedicated slabs.
As an additional improvement over #13218, this commit allows the application to customize all allocator heuristics. The MeshAllocatorSettings resource contains values that adjust the minimum and maximum slab sizes, the cutoff point at which meshes get their own dedicated slabs, and the rate at which slabs grow. Hopefully-sensible defaults have been chosen for each value.
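To make the heuristics above concrete, here is a minimal, self-contained sketch of how a placement decision might be driven by such settings. It is not the code in this patch: the `SlabSettings` struct, its field names, the `Placement` enum, and the `place` function are all hypothetical, and the slab is treated as a simple bump allocator (fragmentation and the offset-allocator crate are ignored).

```rust
/// Hypothetical stand-in for the allocator heuristics; the field names are
/// illustrative and may not match the real `MeshAllocatorSettings`.
#[derive(Clone, Copy, Debug)]
pub struct SlabSettings {
    /// Size in bytes that a fresh shared slab starts at.
    pub min_slab_size: u64,
    /// Size in bytes that a shared slab may never grow beyond.
    pub max_slab_size: u64,
    /// Arrays at least this large (in bytes) get a dedicated slab.
    pub large_threshold: u64,
    /// Multiplier applied to the slab capacity each time it grows.
    pub growth_factor: f64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Placement {
    /// The array goes into the existing slab, resized to `capacity` bytes
    /// (unchanged if no growth was needed).
    Existing { capacity: u64 },
    /// The existing slab is at its cap and full; start a new shared slab.
    NewSlab,
    /// The array is too large to share and gets its own slab.
    Dedicated,
}

/// Decides where an array of `array_size` bytes goes, given a slab with
/// `capacity` bytes of which `used` bytes are occupied.
pub fn place(s: &SlabSettings, capacity: u64, used: u64, array_size: u64) -> Placement {
    if array_size >= s.large_threshold {
        return Placement::Dedicated;
    }
    let required = used + array_size;
    let mut new_capacity = capacity.max(s.min_slab_size);
    // Grow geometrically until the array fits or the cap is reached.
    while new_capacity < required && new_capacity < s.max_slab_size {
        let grown = (new_capacity as f64 * s.growth_factor).ceil() as u64;
        // `max(new_capacity + 1)` guards against a growth factor <= 1.0.
        new_capacity = grown.max(new_capacity + 1).min(s.max_slab_size);
    }
    if new_capacity >= required {
        Placement::Existing { capacity: new_capacity }
    } else {
        Placement::NewSlab
    }
}
```

Under these assumptions, a small growth factor trades fewer reallocations for tighter memory use, while the cap bounds how much a single slab can waste to fragmentation.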
Unfortunately, WebGL 2 doesn't support the base vertex feature, which is necessary to pack vertex arrays from different meshes into the same buffer. wgpu represents this restriction as the downlevel flag BASE_VERTEX. This patch detects that bit and ensures that all vertex buffers get dedicated slabs on that platform. Even on WebGL 2, though, we can combine all index arrays into single buffers to reduce buffer changes, and we do so.
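The fallback rule described above can be sketched as a small predicate. This is an illustration only: `ElementClass` and `can_share_slab` are hypothetical names, and the `supports_base_vertex` flag is assumed to have been read from wgpu's downlevel capabilities (for example by checking `DownlevelFlags::BASE_VERTEX` on the adapter) before this function is called.

```rust
/// Hypothetical classification of the two kinds of mesh arrays.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ElementClass {
    Vertex,
    Index,
}

/// Returns true if arrays of this class may be packed into a shared slab.
///
/// `supports_base_vertex` is assumed to reflect the BASE_VERTEX downlevel
/// flag; on WebGL 2 it would be false.
pub fn can_share_slab(class: ElementClass, supports_base_vertex: bool) -> bool {
    match class {
        // Index arrays only need a starting offset into the index buffer,
        // so they can always be combined.
        ElementClass::Index => true,
        // Vertex arrays from different meshes can share a buffer only if
        // draws can specify a base vertex.
        ElementClass::Vertex => supports_base_vertex,
    }
}
```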
The following measurements are on Bistro:
Overall frame time improves from 8.74 ms to 5.53 ms (1.58x speedup).
Render system time improves from 6.57 ms to 3.54 ms (1.86x speedup).
Opaque pass time improves from 4.64 ms to 2.33 ms (1.99x speedup).
Migration Guide
Changed
- Vertex and index buffers for meshes may now be packed alongside other buffers. If your app uses custom drawing logic, you should now query the MeshAllocator to find the location of the mesh data.
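As a rough sketch of what this means for custom drawing logic: once mesh data lives inside shared buffers, a draw call has to offset into them rather than assume each mesh starts at element zero. The `MeshSlice` and `DrawIndexedArgs` types and the `draw_args` function below are hypothetical stand-ins, not the actual MeshAllocator API; they only show how the two slice locations might map onto the `indices` and `base_vertex` arguments of an indexed draw such as wgpu's `draw_indexed`.

```rust
use std::ops::Range;

/// Hypothetical: where one mesh's data sits inside a shared buffer.
/// `range` is measured in elements (vertices or indices), not bytes.
pub struct MeshSlice {
    /// Illustrative identifier for the buffer to bind before drawing.
    pub buffer_id: u32,
    pub range: Range<u32>,
}

/// The slab-dependent arguments of an indexed draw call.
#[derive(Debug, PartialEq, Eq)]
pub struct DrawIndexedArgs {
    pub indices: Range<u32>,
    pub base_vertex: i32,
}

/// Computes draw arguments for a mesh packed into shared buffers.
pub fn draw_args(vertex: &MeshSlice, index: &MeshSlice) -> DrawIndexedArgs {
    DrawIndexedArgs {
        // Draw only this mesh's portion of the shared index buffer.
        indices: index.range.clone(),
        // Index values are relative to the mesh's first vertex, so the
        // start of the vertex slice becomes the base vertex.
        base_vertex: vertex.range.start as i32,
    }
}
```

Custom render code would bind the buffers named by the two slices and then pass these values to its indexed draw, instead of drawing from index 0 with a base vertex of 0.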
The original PR was marked for release notes, so I'm going to add it here too!
I've addressed all the review comments as much as I feel is appropriate. From running the examples on WebGL the performance seems fine. I can't do very in-depth performance analysis there without better diagnostics, and I don't think we should block this PR on better WebGL diagnostics.
The issue with the Bistro interior seems to have been resolved. I can replicate the opaque pass improvements. FPS did not improve in my tests on low-end hardware (I am totally GPU bottlenecked on most scenes), but it also doesn't regress, which is the main thing I was concerned about.
The merge fallout is fixed. This is ready to be merged again.
Thank you to everyone involved with the authoring or reviewing of this PR! This work is relatively important and needs release notes! Head over to https://github.com/bevyengine/bevy-website/issues/1662 if you'd like to help out.