Record render diagnostics for all engine passes
Objective
- Fixes #16742
Solution
- Adds GPU spans for all the engine's render and compute passes. Related passes have been grouped into a single span (following the same convention as bloom, which was already implemented).
Testing
- Tested all the various rendering features on Windows 11 + Vulkan and also some on DirectX.
Showcase
A tracy graph with lots of rendering features enabled:
Follow up
- Nesting spans - there are 2 problems I encountered that stopped me from implementing this:
  - Spans have to be declared in the same scope so it's not clear how to group passes across nodes in the render graph. Possibly a skill issue on my part :)
  - Nested spans appear incorrectly in tracy (all on the same y axis). I don't know if this is a bug in tracy, a bug in the tracy_client crate or something to do with the multithreading workaround end_zone() hack but I couldn't really get it to work. I left a nested span in meshlet_visibility_buffer_raster for each shadow view so you can see what it looks like by trying the meshlet example (let me know if I should remove it).
- Improve developer usability - after writing the same code over and over I started to feel like bevy should have its own create render/compute pass functions so it can instrument them automatically. It would also make it less likely for future PRs to forget to add diagnostics.
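A rough sketch of what the "instrument passes automatically" idea could look like, using only stand-in types. SpanRecorder, InstrumentedPass and begin_instrumented_pass are hypothetical names made up for illustration, not existing bevy or wgpu APIs, and the recorder here only logs strings instead of writing GPU timestamps. The point is just that a pass handle which opens its span on creation and closes it on drop keeps the span in a single scope and can't be forgotten.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a diagnostics recorder. It only logs events;
/// a real one would write GPU timestamps.
#[derive(Default)]
struct SpanRecorder {
    events: RefCell<Vec<String>>,
}

impl SpanRecorder {
    fn log(&self, event: String) {
        self.events.borrow_mut().push(event);
    }
}

/// Hypothetical pass handle. The span is opened when the pass is created
/// and closed when the handle is dropped.
struct InstrumentedPass {
    recorder: Rc<SpanRecorder>,
    label: String,
}

impl InstrumentedPass {
    /// Stand-in for recording work inside the pass.
    fn draw(&self, what: &str) {
        self.recorder.log(format!("draw {what}"));
    }
}

impl Drop for InstrumentedPass {
    fn drop(&mut self) {
        // Closing the span on drop means callers cannot forget to end it.
        self.recorder.log(format!("end {}", self.label));
    }
}

/// Hypothetical helper: begins a pass and its span in one call.
fn begin_instrumented_pass(recorder: &Rc<SpanRecorder>, label: &str) -> InstrumentedPass {
    recorder.log(format!("begin {label}"));
    InstrumentedPass {
        recorder: Rc::clone(recorder),
        label: label.to_string(),
    }
}

/// Usage: related draws share one span, which ends with the inner scope.
fn demo() -> Vec<String> {
    let recorder = Rc::new(SpanRecorder::default());
    {
        let pass = begin_instrumented_pass(&recorder, "bloom");
        pass.draw("downsample");
        pass.draw("upsample");
    } // `pass` is dropped here, which ends the "bloom" span
    let events = recorder.events.borrow().clone();
    events
}
```

Calling demo() returns the logged events in order, with the begin and end entries wrapping the two draws. Tying the span to the handle's lifetime is one possible design; an explicit end call would work too but is easier to leave out.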
Additional notes
There were talks on render crate refactors happening - I think the changes are small enough that rebases won't be difficult but also feel free to delay merging until later if it'll get in the way.
Spans have to be declared in the same scope so it's not clear how to group passes across nodes in the render graph.
Not possible, yeah. I wouldn't worry about trying to get cross-node spans working. Keeping spans confined to a single node is a very reasonable design.
Nested spans appear incorrectly in tracy (all on the same y axis). I don't know if this is a bug in tracy, a bug in the tracy_client crate or something to do with the multithreading workaround end_zone() hack but I couldn't really get it to work.
Is this for GPU spans, or CPU spans?
Just GPU spans appear incorrect. I have seen other (non rust) projects where nested GPU spans do work though.
- Imo, please remove the _pass suffixes from the time span labels
I don't feel strongly about this so I will do that. The only reason I named them that in the first place was to share the exact same name as the label in the render pass descriptor (which could probably do with a naming scheme consistency PR of their own).
@jf908 I'd like to get this in for 0.17 but I think @JMS55 is waiting on you to make a couple changes.
Been a bit busy but I can get back to this now!
Just want to confirm with @JMS55 before I do the renaming:
- If we're going to rename all the labels to remove the _pass suffix then I guess I may as well rename all the render pass descriptor debug labels to match so I'll do that too
- You suggested meshlet_material_opaque_3d_pass which has the _pass suffix, was that an accident or should I make an exception for things like main_opaque_3d_pass, etc?
For stuff like ssao, bloom, etc, I wouldn't add _pass to the name.
For the main opaque 3d pass, meshlet material passes, transparent main 3d pass, etc, I would include _pass.
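As a rough sketch of that rule (PassKind and span_label are hypothetical names for illustration only, not part of the PR or of bevy, and the "bloom_pass" input is a made-up example), the suffix is dropped for effects and kept for the main and meshlet material passes:

```rust
/// Hypothetical classification of passes, for illustration only.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum PassKind {
    /// Effects such as ssao or bloom: no `_pass` suffix in the span label.
    Effect,
    /// Main opaque/transparent 3d passes, meshlet material passes: keep `_pass`.
    MainPhase,
}

/// Derives a span label from a pass label according to the rule above.
fn span_label(pass_label: &str, kind: PassKind) -> &str {
    match kind {
        // Strip a trailing `_pass` if present, otherwise leave the label alone.
        PassKind::Effect => pass_label.strip_suffix("_pass").unwrap_or(pass_label),
        // Main phase passes keep their full name.
        PassKind::MainPhase => pass_label,
    }
}
```

For example, span_label("bloom_pass", PassKind::Effect) gives "bloom", while span_label("main_opaque_3d_pass", PassKind::MainPhase) is returned unchanged.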
Should be ready for review now