Multithreaded render command encoding
Objective
- Encoding many GPU commands (for example a render pass with many draws, like the main opaque pass) onto a wgpu::CommandEncoder is very expensive and takes a long time.
- To improve performance, we want to perform the command encoding for these heavy passes in parallel.
Solution
- RenderContext can now queue up "command buffer generation tasks": closures that will generate a command buffer when called.
- When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the ComputeTaskPool to produce their corresponding command buffers.
- The general idea is that the node graph still runs serially, but instead of doing rendering work directly in a node, you can add tasks that do that work in parallel with other nodes' tasks; all queued tasks run at the end of graph execution (a sketch of the pattern follows below).
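A minimal sketch of the usage pattern, assuming the task closure receives a RenderDevice and returns a finished CommandBuffer; the helper name and encoder label are illustrative only, and exact signatures may differ between Bevy versions:

```rust
use bevy::render::{
    render_resource::CommandEncoderDescriptor,
    renderer::{RenderContext, RenderDevice},
};

// Hypothetical helper, as would be called from a render graph node's `run` method.
fn queue_parallel_pass(render_context: &mut RenderContext) {
    // The closure is only queued here; it runs later on the ComputeTaskPool,
    // in parallel with the tasks queued by other nodes, when the render
    // context is finalized into its list of command buffers.
    render_context.add_command_buffer_generation_task(move |render_device: RenderDevice| {
        // Each task encodes onto its own encoder instead of the shared serial one.
        let mut encoder = render_device.create_command_encoder(&CommandEncoderDescriptor {
            label: Some("my_parallel_pass_encoder"),
        });
        // Begin a render pass on `encoder` and record the phase's draws here.
        encoder.finish()
    });
}
```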
Nodes Parallelized
- MainOpaquePass3dNode
- PrepassNode
- DeferredGBufferPrepassNode
- ShadowPassNode (one task per view)
Future Work
- Extend this to UI, 2d, transparent, and transmissive nodes?
- Needs testing: small command buffers are inefficient, so it may be worth falling back to the serial command encoder for render phases with few items (a possible shape of that heuristic is sketched after this list).
- All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run.
- There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier.
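For the "needs testing" bullet above, one possible shape of that fallback is a simple item-count threshold. The constant, function name, and cutoff value below are hypothetical and untuned, shown only to illustrate the idea:

```rust
use bevy::render::{render_resource::CommandEncoderDescriptor, renderer::RenderContext};

// Hypothetical cutoff; the right value would have to come from benchmarking.
const MIN_ITEMS_FOR_PARALLEL_ENCODING: usize = 64;

fn encode_phase(render_context: &mut RenderContext, item_count: usize) {
    if item_count >= MIN_ITEMS_FOR_PARALLEL_ENCODING {
        // Large phase: the overhead of a separate command buffer is worth it.
        render_context.add_command_buffer_generation_task(move |render_device| {
            let mut encoder = render_device.create_command_encoder(&CommandEncoderDescriptor {
                label: Some("parallel_phase_encoder"),
            });
            // ...record the phase's draws on `encoder`...
            encoder.finish()
        });
    } else {
        // Small phase: keep encoding on the context's shared serial encoder.
        let _encoder = render_context.command_encoder();
        // ...record the phase's draws on `_encoder`...
    }
}
```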
Performance Improvement (OLD)
Thanks to @Elabajaba for testing on Bistro for me due to my current lack of a desktop.
TLDR: Without shadow mapping, this PR has no impact. With shadow mapping, this PR gives ~twice the FPS of main, which is ~equal to the FPS without shadow mapping!
Changelog
- MainOpaquePass3dNode, PrepassNode, DeferredGBufferPrepassNode, and each shadow map within ShadowPassNode are now encoded in parallel, giving greatly increased CPU performance, mainly when shadow mapping is enabled.
- Added RenderContext::add_command_buffer_generation_task().
- Some render graph and Node-related types and methods now have additional lifetime constraints.
Base implementation looks good to me - interested in seeing what benchmarks will show.
Not sure if I should mark this as a breaking change or not. Technically there are some extra lifetime constraints now, but 99.999% of users won't need to change their code. I think the only thing you would need to change is if you're using bevy_render's RenderContext for your own stuff, independent from the RenderGraph.
Note that this PR currently depends on https://github.com/gfx-rs/wgpu/pull/3626
Marking this as blocked on a potential wgpu release, since this otherwise would add complexity without any gains.
Does the benchmark use arcanized wgpu for both single and multi threaded impls? Seems like arcanized wgpu on its own might give us a perf boost.
IIRC @Elabajaba tried singlethreaded + arcanized and it made ~no difference vs singlethreaded non-arcanized. The benchmarks above were non-arcanized.
We've been wanting to use bevy for a fairly graphically intensive game, so any performance improvements we can get would be extremely useful for us! I'd love to see these changes pushed
This is blocked on https://github.com/gfx-rs/wgpu/pull/3626 upstream, so we need to wait for that to be merged then for the next wgpu release (which should be about late October).
Example alien_cake_addict failed to run, please try running it locally and check the result.
Some clarifications about the benchmarks based on discussion from when this PR was initially made:
- Single-threaded with shadows averaged roughly 50 fps without wgpu arcanisation and ~54 fps with it.
- The image in the original post contains results all with wgpu arcanisation.
- All results so far are based on a much older version of the arcanisation branch.
Looks fine on 7e837f5bf4e83e2875e012b0dd02051334445992
2024-01-28T09:06:43.311046Z INFO bevy_render::renderer: AdapterInfo { name: "Intel(R) Graphics (ADL GT2)", vendor: 32902, device: 17958, device_type: IntegratedGpu, driver: "Intel open-source Mesa driver", driver_info: "Mesa 23.0.4-0ubuntu1~23.04.1", backend: Vulkan }
2024-01-28T09:06:43.532619Z INFO bevy_diagnostic::system_information_diagnostics_plugin::internal: SystemInfo { os: "Linux 23.04 Ubuntu", kernel: "6.2.0-39-generic", cpu: "", core_count: "14", memory: "31.0 GiB" }
Looks fine to me on https://github.com/bevyengine/bevy/commit/7e837f5bf4e83e2875e012b0dd02051334445992
Windows
AdapterInfo {
name: "NVIDIA GeForce RTX 2080 Ti",
vendor: 4318,
device: 7687,
device_type: DiscreteGpu,
driver: "NVIDIA",
driver_info: "536.23",
backend: Vulkan
}
SystemInfo {
os: "Windows 11 Pro",
kernel: "22621",
cpu: "",
core_count: "16",
memory: "63.1 GiB"
}
Mac
AdapterInfo {
name: "Apple M1 Max",
vendor: 0,
device: 0,
device_type: IntegratedGpu,
driver: "",
driver_info: "",
backend: Metal
}
SystemInfo {
os: "MacOS 14.2.1 ",
kernel: "23.2.0",
cpu: "",
core_count: "10",
memory: "64.0 GiB"
}
Broken on Windows+AMD for me (shadows flicker):
https://github.com/bevyengine/bevy/assets/177631/c6db1344-acd5-4092-9416-c2e263e06639
AdapterInfo {
name: "AMD Radeon RX 6800 XT",
vendor: 4098,
device: 29631,
device_type: DiscreteGpu,
driver: "AMD proprietary driver",
driver_info: "24.1.1 (AMD proprietary shader compiler)",
backend: Vulkan
}
It looks like your PR is a breaking change, but you didn't provide a migration guide.
Could you add some context on what users should update when this change gets released in a new version of Bevy?
It will be used to help write the migration guide for the version. Putting it after a ## Migration Guide header will help it get automatically picked up by our tooling.
The changes to the APIs are technically breaking changes. Could you document them in a Migration Guide section?
I don't think it's a breaking change, as the lifetimes should be inferred for most use cases.
This adds an argument to RenderGraphRunner::run. I don't expect many people to be running their own render graph, but it's worth documenting.
No changes were made to the argument count, just adding some lifetimes that should be inferred for all existing use cases.
Does this still need addressing?
GitHub won't let me respond to it, but idk what this comment is in reference to.
Ship it!