Pack multiple vertex and index arrays together into growable buffers.

Open pcwalton opened this issue 1 year ago • 1 comments

This commit uses the [offset-allocator] crate to combine vertex and index arrays from different meshes into single buffers. Since the primary source of wgpu overhead is validation and synchronization when switching buffers, this significantly improves Bevy's rendering performance on many scenes.

This patch is a more flexible version of #13218, which also used slabs. Unlike #13218, which used slabs of a fixed size, this commit implements slabs that start small and can grow. In addition to reducing memory usage, supporting slab growth reduces the number of vertex and index buffer switches that need to happen during rendering, leading to improved performance. To prevent pathological fragmentation behavior, slabs are capped to a maximum size, and mesh arrays that are too large get their own dedicated slabs.

As an additional improvement over #13218, this commit allows the application to customize all allocator heuristics. The MeshAllocatorSettings resource contains values that adjust the minimum and maximum slab sizes, the cutoff point at which meshes get their own dedicated slabs, and the rate at which slabs grow. Hopefully-sensible defaults have been chosen for each value.
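To make the heuristics above concrete, here is a minimal sketch of the placement policy: shared slabs that start small and grow geometrically up to a cap, plus dedicated slabs for oversized arrays. It is not the code in this patch. The struct and field names (SlabSettings, min_slab_size, max_slab_size, large_threshold, growth_factor) are hypothetical stand-ins for the kinds of values the settings resource exposes, and a simple bump pointer replaces the [offset-allocator] crate, so freeing and reuse are not modeled.

```rust
/// Hypothetical tuning knobs (illustrative names, not Bevy's actual fields).
#[derive(Clone, Copy, Debug)]
pub struct SlabSettings {
    /// Capacity a new shared slab starts with, in bytes.
    pub min_slab_size: u64,
    /// Upper bound a shared slab may grow to, in bytes.
    pub max_slab_size: u64,
    /// Arrays at least this large get a dedicated slab.
    pub large_threshold: u64,
    /// Multiplier applied to a slab's capacity each time it grows.
    pub growth_factor: f64,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SlabKind {
    Shared,
    Dedicated,
}

#[derive(Debug)]
pub struct Slab {
    pub kind: SlabKind,
    pub capacity: u64,
    pub used: u64,
}

/// Where an array ended up: which slab, and at what byte offset.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Placement {
    pub slab: usize,
    pub offset: u64,
}

/// Places an array of `size` bytes, growing or creating slabs as needed.
/// Assumption: bump allocation only; a real allocator also reuses freed space.
pub fn place(slabs: &mut Vec<Slab>, size: u64, settings: &SlabSettings) -> Placement {
    // Oversized arrays bypass the shared slabs entirely.
    if size >= settings.large_threshold || size > settings.max_slab_size {
        slabs.push(Slab { kind: SlabKind::Dedicated, capacity: size, used: size });
        return Placement { slab: slabs.len() - 1, offset: 0 };
    }

    for (index, slab) in slabs.iter_mut().enumerate() {
        if slab.kind != SlabKind::Shared {
            continue;
        }
        let needed = slab.used + size;
        // Grow geometrically until the array fits or the cap is reached.
        let mut capacity = slab.capacity;
        while capacity < needed && capacity < settings.max_slab_size {
            let grown = (capacity as f64 * settings.growth_factor).ceil() as u64;
            capacity = grown.max(capacity + 1).min(settings.max_slab_size);
        }
        if needed <= capacity {
            slab.capacity = capacity;
            let offset = slab.used;
            slab.used = needed;
            return Placement { slab: index, offset };
        }
    }

    // No shared slab can hold the array even at its cap: start a new small slab.
    let capacity = settings.min_slab_size.max(size);
    slabs.push(Slab { kind: SlabKind::Shared, capacity, used: size });
    Placement { slab: slabs.len() - 1, offset: 0 }
}
```

With a minimum of 100, a maximum of 400, a threshold of 300 and a growth factor of 2.0, two 60-byte arrays share one slab that grows from 100 to 200 bytes, while a 350-byte array lands in its own dedicated slab. The cap is what keeps a single slab from growing without bound when many mid-sized arrays arrive.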

Unfortunately, WebGL 2 doesn't support the base vertex feature, which is necessary to pack vertex arrays from different meshes into the same buffer. wgpu represents this restriction as the downlevel flag BASE_VERTEX. This patch detects that bit and ensures that all vertex buffers get dedicated slabs on that platform. Even on WebGL 2, though, we can combine all index arrays into single buffers to reduce buffer changes, and we do so.
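For readers wondering why base vertex matters here, the following is a small CPU-side model of an indexed draw, written as a sketch under stated assumptions rather than anything from this patch. The function name fetch_vertices and the use of one f32 per vertex are illustrative only. A mesh's indices are relative to its own first vertex, so once its vertices sit at some element offset inside a shared buffer, that offset has to be added to every index. Index arrays need no such fix-up, since a draw can simply start at a different first index.

```rust
/// Models which vertices an indexed draw would read from packed buffers.
/// Hypothetical helper for illustration: each "vertex" is a single f32.
pub fn fetch_vertices(
    packed_vertices: &[f32],
    packed_indices: &[u32],
    first_index: usize,
    index_count: usize,
    base_vertex: i32,
) -> Vec<f32> {
    packed_indices[first_index..first_index + index_count]
        .iter()
        // The base vertex is added to every index before the vertex is read.
        .map(|&i| packed_vertices[(i64::from(i) + i64::from(base_vertex)) as usize])
        .collect()
}
```

Suppose mesh A (vertices 10, 11, 12 and indices 0, 1, 2) and mesh B (vertices 20, 21 and indices 1, 0) are packed into the buffers [10, 11, 12, 20, 21] and [0, 1, 2, 1, 0]. Drawing B with first_index 3 and base_vertex 3 reads 21 and 20 as intended. With base_vertex forced to 0, the same draw reads 11 and 10, which belong to mesh A. That is why vertex arrays need dedicated slabs where the feature is missing.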

The following measurements are on Bistro:

Overall frame time improves from 8.74 ms to 5.53 ms (1.58x speedup).

Render system time improves from 6.57 ms to 3.54 ms (1.86x speedup).

Opaque pass time improves from 4.64 ms to 2.33 ms (1.99x speedup).

Migration Guide

Changed

  • Vertex and index buffers for meshes may now be packed alongside other buffers. If your app uses custom drawing logic, you should now query the MeshAllocator to find the location of the mesh data.
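As a rough illustration of what this means for custom drawing logic, the sketch below derives indexed-draw parameters from the location of a mesh's data inside packed buffers. Everything here is hypothetical: PackedSlice stands in for whatever the MeshAllocator lookup returns, IndexedDraw and indexed_draw are made-up names, and ranges are assumed to be measured in elements rather than bytes. Check the MeshAllocator documentation for the real types and units.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the result of an allocator lookup:
/// which packed buffer holds the data, and where (in elements).
#[derive(Clone, Debug, PartialEq)]
pub struct PackedSlice {
    pub slab: u32,
    pub range: Range<u32>,
}

/// The parameters custom drawing code would need for one indexed draw.
#[derive(Clone, Debug, PartialEq)]
pub struct IndexedDraw {
    /// Buffer to bind as the vertex buffer.
    pub vertex_slab: u32,
    /// Buffer to bind as the index buffer.
    pub index_slab: u32,
    /// Index range to draw, relative to the start of the index buffer.
    pub indices: Range<u32>,
    /// Offset added to every index, pointing at the mesh's first vertex.
    pub base_vertex: i32,
}

/// Builds draw parameters from the mesh's vertex and index locations,
/// instead of assuming each mesh starts at offset 0 of its own buffer.
pub fn indexed_draw(vertex: &PackedSlice, index: &PackedSlice) -> IndexedDraw {
    IndexedDraw {
        vertex_slab: vertex.slab,
        index_slab: index.slab,
        indices: index.range.clone(),
        base_vertex: vertex.range.start as i32,
    }
}
```

The point is that code which previously drew indices 0..count with a base vertex of 0 now has to offset both values by wherever the mesh landed, and bind the whole shared buffer rather than a per-mesh one.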

pcwalton, Jul 09 '24 23:07

The original PR was marked for release notes, so I'm going to add it here too!

BD103, Jul 11 '24 22:07

I've addressed all the review comments as much as I feel is appropriate. From running the examples on WebGL the performance seems fine. I can't do very in-depth performance analysis there without better diagnostics, and I don't think we should block this PR on better WebGL diagnostics.

pcwalton, Jul 14 '24 22:07

The issue with the bistro interior seems to have been resolved. I can replicate the opaque pass improvements. FPS did not improve in tests on low-end hardware (I am totally GPU-bottlenecked on most scenes), but it also doesn't regress, which is the main thing I was concerned about.

NthTensor, Jul 16 '24 00:07

The merge fallout is fixed. This is ready to be merged again.

pcwalton, Jul 16 '24 20:07

Thank you to everyone involved with the authoring or reviewing of this PR! This work is relatively important and needs release notes! Head over to https://github.com/bevyengine/bevy-website/issues/1662 if you'd like to help out.

alice-i-cecile, Oct 20 '24 14:10