Meshlet rendering (initial feature)
Objective
- Implements a more efficient, GPU-driven (https://github.com/bevyengine/bevy/issues/1342) rendering pipeline based on meshlets.
- Meshes are split into small clusters of triangles called meshlets, each of which acts as a mini index buffer into the larger mesh data. Meshlets can be compressed, streamed, culled, and batched much more efficiently than monolithic meshes.
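To make the "mini index buffer" idea concrete, here's a rough sketch of the data layout in Rust; the type and field names are illustrative, not Bevy's actual meshlet types:

```rust
/// Illustrative sketch only: a mesh split into meshlets, where each meshlet indexes
/// a small slice of the shared vertex data instead of the whole mesh.
struct MeshletMesh {
    /// Vertex data shared by every meshlet in the mesh.
    vertex_positions: Vec<[f32; 3]>,
    /// Indices into `vertex_positions`, grouped per meshlet.
    vertex_ids: Vec<u32>,
    /// Tiny per-meshlet indices into that meshlet's slice of `vertex_ids`,
    /// three per triangle.
    triangle_indices: Vec<u8>,
    meshlets: Vec<Meshlet>,
}

/// Each meshlet is effectively a mini index buffer: a window into the arrays above.
struct Meshlet {
    start_vertex_id: u32,
    vertex_count: u32,
    start_triangle_index: u32,
    triangle_count: u32,
}
```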
Awesome!
Two-pass occlusion culling should not be tied to meshlets specifically. We should just do that separately.
I don't see how they could.
The meshlet system is already set up to do occlusion culling, basically. I already upload the bounding sphere information per meshlet, and have a culling and indirect draw pipeline in place.
I don't see how we could share occlusion culling passes between meshlet meshes and regular meshes without basically duplicating the entire meshlet system and indirect draw setup, bounding info, etc., but operating on whole meshes instead of meshlets, which is just less efficient. If we were going that route, there would be no point in doing all the work for meshlets in the first place.
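For context, the per-meshlet culling mentioned here boils down to testing each cluster's bounding sphere against the view frustum before its draw is appended to the indirect buffer. A minimal CPU-side sketch (the PR's actual culling runs in a compute shader; names are illustrative):

```rust
/// Bounding sphere uploaded per meshlet.
struct BoundingSphere {
    center: [f32; 3],
    radius: f32,
}

/// Frustum planes are (normal, distance) pairs with normals pointing into the frustum.
/// A meshlet is kept only if its sphere is not entirely behind any plane.
fn sphere_in_frustum(sphere: &BoundingSphere, frustum_planes: &[([f32; 3], f32); 6]) -> bool {
    frustum_planes.iter().all(|(normal, distance)| {
        let signed_distance = normal[0] * sphere.center[0]
            + normal[1] * sphere.center[1]
            + normal[2] * sphere.center[2]
            + distance;
        signed_distance >= -sphere.radius
    })
}
```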
Gate meshoptimizer dependency behind a cargo feature, or rewrite it in Rust.
If you are looking for a pure-rust implementation, have a look at https://github.com/yzsolt/meshopt-rs
Unfortunately meshopt-rs does not support the meshlet APIs (it's ported from an older version of meshoptimizer), which is why I'm using a different crate.
Plan for materials (assuming opaque meshes only, will extend to other passes later; a rough code sketch of the bookkeeping follows the list):
1. In extract_meshlet_meshes, create a Vec of all meshlet mesh entities
2. In a new Queue system, as part of MaterialPlugin for each material, for each entity from step 1 check if it's using the material via RenderMaterialInstances, and if so, push the material index (gotten from the hashmap in step 3) to a new Vec in MeshletGpuScene
3. In a new PrepareAssets system that runs after prepare_materials::<M>, as part of MaterialPlugin for each material, upload changed opaque materials to MeshletGpuScene
    - 3a. Queue a new render pipeline for the material, store pipeline ID + material bind group in a Vec
    - 3b. Use a HashMap to store the index of the material in the Vec, hashing based on material AssetId
    - 3c. Could skip materials not used by any meshlet mesh, at the cost of extra queries
4. Optionally (may improve GPU coherence, at the cost of CPU time), in prepare_meshlet_per_frame_resources, sort per-frame buffers by the material index before uploading
5. In prepare_meshlet_per_frame_resources, instead of a single DrawIndexedIndirect item, make draw_command_buffer hold M items equal to the total material count
    - 5a. Each DrawIndexedIndirect::base_index needs to be equal to the total count of possible indices (after instancing) for that material
    - 5b. We can skip draws for materials that aren't used by any meshlet mesh entities
6. In cull_meshlets, for each meshlet, index into the draw command array based on the material index to get the DrawIndexedIndirect for the meshlet's material, and then write to location draw_index_buffer_start + draw_indirect_command.base_index + offset in the index buffer
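A rough CPU-side sketch of the bookkeeping from steps 3 and 5; everything here (field names, the `AssetId` alias, the `DrawIndexedIndirect` layout) is my own illustration, not the PR's actual code:

```rust
use std::collections::HashMap;

/// Stand-in for the material asset handle.
type AssetId = u64;

/// Mirrors the layout of an indexed indirect draw command.
#[repr(C)]
#[derive(Clone, Copy, Default)]
struct DrawIndexedIndirect {
    index_count: u32,
    instance_count: u32,
    base_index: u32,
    vertex_offset: i32,
    base_instance: u32,
}

#[derive(Default)]
struct MeshletGpuSceneSketch {
    /// One (pipeline ID, bind group ID) entry per prepared material (step 3a).
    material_pipelines: Vec<(u32, u32)>,
    /// Material asset -> index into `material_pipelines` (step 3b).
    material_indices: HashMap<AssetId, u32>,
    /// One indirect draw per material (step 5); cull_meshlets indexes this by
    /// material index and writes triangle indices at `base_index` + offset (step 6).
    draw_commands: Vec<DrawIndexedIndirect>,
}

impl MeshletGpuSceneSketch {
    /// Look up (or register) a material's index, as the queue system in step 2 would.
    fn material_index(&mut self, material: AssetId, pipeline_id: u32, bind_group_id: u32) -> u32 {
        if let Some(&index) = self.material_indices.get(&material) {
            return index;
        }
        let index = self.material_pipelines.len() as u32;
        self.material_pipelines.push((pipeline_id, bind_group_id));
        self.draw_commands.push(DrawIndexedIndirect::default());
        self.material_indices.insert(material, index);
        index
    }
}
```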
This PR is usable atm, and has large chunks of code ready. The plan is once 0.12 is released, I'll start opening smaller PRs with parts of these changes. The goal is to incrementally merge chunks of code for meshlet rendering, instead of one big PR with all the changes.
Plan to support (occlusion culling, visibility buffer, shadows, forward + prepass, deferred):
- Each view will get a visibility buffer (not to be confused with the vbuffer texture)
  - Stores visible/not visible in view as a boolean for each meshlet, persisted across two frames (previous and current)
  - Instances of meshlets (thread_meshlet) can get their previous visibility via `previous_visibility[previous_visibility_id[thread_id]]`, where previous_visibility_id is uploaded on the CPU during the current frame, using a resource holding the data from the previous frame (might be a better way to do this without the extra indirection and previous_visibility_id buffer... see the sketch after this list)
- MeshletVBufferNode - Render depth, vbuffer (thread_id + triangle_id), and material depth
  - PreviousOccluderPreparePass - Take meshlets visible last frame, as indicated by previous_visibility, write index buffer to render them
  - VBufferRenderPass1 - Render the 3 node outputs using a single draw_indirect_indexed()
  - GenerateHzbPass - Take depth buffer generated so far, downscale several times to create a hierarchical depth buffer
  - CullPass - Take all meshlets, frustum cull, occlusion cull against hzb, write visibility to visibility buffer for next frame, and if visible write index buffer to render them
  - VBufferRenderPass2 - Render the 3 node outputs using a single draw_indirect_indexed()
- MeshletShadowMapNode/MeshletPrepassNode/MeshletOpaque3dMainPassNode
  - For each material, draw a single triangle/quad using the material depth trick, reconstruct vertex properties, and then shade the fragment
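As a CPU-side illustration of the double indirection described in the list above (the real lookup happens in the culling compute shader, and all names here are made up):

```rust
/// Per-view visibility state, persisted across two frames. Purely illustrative.
struct ViewVisibilityBuffers {
    /// One flag per meshlet instance, in last frame's instance order.
    previous_visibility: Vec<bool>,
    /// For each meshlet instance this frame, its slot in `previous_visibility`.
    /// Uploaded by the CPU from data saved during the previous frame.
    previous_visibility_id: Vec<u32>,
    /// Written by this frame's cull pass; becomes `previous_visibility` next frame.
    current_visibility: Vec<bool>,
}

impl ViewVisibilityBuffers {
    /// previous_visibility[previous_visibility_id[thread_id]] from the list above.
    fn was_visible_last_frame(&self, thread_id: usize) -> bool {
        let slot = self.previous_visibility_id[thread_id] as usize;
        self.previous_visibility.get(slot).copied().unwrap_or(false)
    }
}
```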
The messiest part is having to use a previous/next visibility buffer with separate indices, instead of being able to assume [object/instance/entity] index is constant across frames, and use a single read_write visibility buffer. Will have to think more on how to do this.
Additional complication: Vertex data can't be provided to the fragment shader via the vertex output. Fragment shader will need to read the vbuffer pixel, and load all the meshlet and then vertex data. This means we need to modify the fragment shader.
I'll probably have to rewrite shaders using naga_oil somehow to load the vbuffer data and construct the VertexOutput, instead of reading it directly as the fragment input.
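For illustration, the vbuffer texel could pack the two IDs along these lines (the bit split is an assumption of mine, not the PR's actual encoding):

```rust
/// Pack a meshlet instance ID and a triangle-within-meshlet ID into one u32 texel.
/// Assumes at most 128 triangles per meshlet, so 7 bits for the triangle ID.
fn pack_vbuffer(thread_id: u32, triangle_id: u32) -> u32 {
    (thread_id << 7) | (triangle_id & 0x7F)
}

/// The fragment shader would do the inverse: unpack the texel, then load the
/// meshlet and vertex data needed to rebuild the usual VertexOutput fields.
fn unpack_vbuffer(texel: u32) -> (u32, u32) {
    (texel >> 7, texel & 0x7F)
}
```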
Useful reference for visbuffer barycentrics, partial derivatives, and the other complicated stuff: https://github.com/JuanDiegoMontoya/Frogfood/blob/main/data/shaders/visbuffer/VisbufferResolve.frag.glsl
Next step is to emit "material id" to a depth texture. The visbuffer fragment shader will output the material ID to an R16Uint color attachment, and then an extra fullscreen triangle render pass will read that texture and write it to a depth target.
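To sketch the material depth trick (my own illustration, not the PR's shader code): each material index maps to a unique depth value, so a per-material fullscreen pass can set its depth compare function to Equal and shade only the pixels belonging to that material.

```rust
/// Map an R16Uint material ID onto a normalized depth value in [0, 1],
/// giving up to 65536 distinct material "depths".
fn material_id_to_depth(material_id: u16) -> f32 {
    material_id as f32 / u16::MAX as f32
}
```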
You added a new feature but didn't add a description for it. Please update the root Cargo.toml file.
You added a new feature but didn't update the readme. Please run `cargo run -p build-templated-pages -- update features` to update it, and commit the file change.
Followup tasks: https://github.com/bevyengine/bevy/issues/11518
Sadly moving to 0.14 :(. Once 0.13 releases, though, I'd really like to avoid merging any other rendering PRs until this one is merged, as it's going to be a huge pain to rebase. @IceSentry @robtfm.
> Unfortunately meshopt-rs does not support the meshlet APIs (it's ported from an older version of meshoptimizer), which is why I'm using a different crate.
Hey, `meshopt-rs` author here! The upgrade to 0.16 has been WIP for a long time now. I'm not using the project directly, hence my contributions stalled in the last year. If there's a possibility that it'll be used in Bevy, I'll try to allocate some time for contributions again!
What's the minimum `meshoptimizer` API version you can work with? I remember 0.15 (which my crate already supports) contains the meshlet processing logic, although 0.16 made some big breaking changes, changing the `Meshlet` structure itself too.
Hey @yzsolt. That comment is outdated. There's since been a more active fork of your crate, which is what I've been using: https://github.com/gwihlidal/meshopt-rs, under the name `meshopt`. I appreciate the offer, but it's no longer necessary :).
Buuut that's not a fork, it's a wrapper around the original C library, isn't it? Mine is a pure Rust reimplementation, which could bring some benefits in the long run, like full safety, maybe better (non-SIMD) performance, but most importantly a more straightforward build system integration.
Nevertheless, I understand that the C wrapper currently satisfies your needs. I'll still try to work on updating my (re)implementation, so it can become a viable alternative for Bevy in the future.
Oh right, sorry, it's been a while since I looked at things. Yes, you're right, the crate I'm using is a C-wrapper and not pure Rust. Apologies for the confusion.
In that case, it would be nice to have a pure Rust version, but the existing C bindings we're using are working pretty well so far. I'm not too pressed about a pure Rust version personally, as I'm going to be adding METIS soon, which will also be bindings to a C library. It's up to the other Bevy maintainers to decide, and up to how much effort you want to commit to maintaining your library.
Meshoptimizer 0.16 would be the minimum required API I need for now, but the newer the better as they're working on some new simplification APIs I'm going to need in the future.
I definitely have a preference for pure Rust: it's always nice to reduce the complexity in our build tree and it makes it a lot easier to make changes upstream.
I won't block meshlets on it or anything, but I would push a PR to swap the dependency.
I prefer pure Rust, mostly for build system reasons. Rust's build story (esp. to Wasm) is so much less painful.
I think it's important to keep in mind that meshopt is usually not used at runtime; it's only depended on for asset preprocessing. I highly doubt anyone's gonna be running meshopt in Wasm. It usually takes several minutes to process a single mesh.
https://github.com/bevyengine/bevy/pull/11904 has changed the core shader code, going to need to rebase all the shader work on top of that :/
@JMS55 I wrote that PR; let me know if I can help with the rebase :)
@JMS55 if you want, you can disregard all my changes and I'll re-add them as a PR on your PR. That way your rebase becomes trivial.
PR is fully rebased
The generated `examples/README.md` is out of sync with the example metadata in `Cargo.toml` or the example readme template. Please run `cargo run -p build-templated-pages -- update examples` to update it, and commit the file change.