Multithreaded render command encoding
Objective
- Encoding many GPU commands (for example a render pass with many draws, like the main opaque pass) onto a wgpu::CommandEncoder is very expensive and takes a long time.
- To improve performance, we want to perform the command encoding for these heavy passes in parallel.
Solution
- RenderContext can now queue up "command buffer generation tasks": closures that will generate a command buffer when called.
- When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the ComputeTaskPool to produce their corresponding command buffers.
- The general idea is that the node graph still runs serially, but instead of doing rendering work directly in a node, you can add tasks that do that work in parallel with other nodes' tasks; all queued tasks run at the end of graph execution (a sketch of the pattern follows below).
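A minimal sketch of the usage pattern, assuming the task closure receives a RenderDevice and returns a finished CommandBuffer; the helper name and encoder label are illustrative only, and exact signatures may differ between Bevy versions:

```rust
use bevy::render::{
    render_resource::CommandEncoderDescriptor,
    renderer::{RenderContext, RenderDevice},
};

// Hypothetical helper, as would be called from a render graph node's `run` method.
fn queue_parallel_pass(render_context: &mut RenderContext) {
    // The closure is only queued here; it runs later on the ComputeTaskPool,
    // in parallel with the tasks queued by other nodes, when the render
    // context is finalized into its list of command buffers.
    render_context.add_command_buffer_generation_task(move |render_device: RenderDevice| {
        // Each task encodes onto its own encoder instead of the shared serial one.
        let mut encoder = render_device.create_command_encoder(&CommandEncoderDescriptor {
            label: Some("my_parallel_pass_encoder"),
        });
        // Begin a render pass on `encoder` and record the phase's draws here.
        encoder.finish()
    });
}
```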
Nodes Parallelized
- MainOpaquePass3dNode
- PrepassNode
- DeferredGBufferPrepassNode
- ShadowPassNode (one task per view)
Future Work
- Extend this to UI, 2d, transparent, and transmissive nodes?
- Needs testing: small command buffers are inefficient, so it may be worth falling back to the serial command encoder for render phases with few items (a possible shape of that heuristic is sketched after this list).
- All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run.
- There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier.
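For the "needs testing" bullet above, one possible shape of that fallback is a simple item-count threshold. The constant, function name, and cutoff value below are hypothetical and untuned, shown only to illustrate the idea:

```rust
use bevy::render::{render_resource::CommandEncoderDescriptor, renderer::RenderContext};

// Hypothetical cutoff; the right value would have to come from benchmarking.
const MIN_ITEMS_FOR_PARALLEL_ENCODING: usize = 64;

fn encode_phase(render_context: &mut RenderContext, item_count: usize) {
    if item_count >= MIN_ITEMS_FOR_PARALLEL_ENCODING {
        // Large phase: the overhead of a separate command buffer is worth it.
        render_context.add_command_buffer_generation_task(move |render_device| {
            let mut encoder = render_device.create_command_encoder(&CommandEncoderDescriptor {
                label: Some("parallel_phase_encoder"),
            });
            // ...record the phase's draws on `encoder`...
            encoder.finish()
        });
    } else {
        // Small phase: keep encoding on the context's shared serial encoder.
        let _encoder = render_context.command_encoder();
        // ...record the phase's draws on `_encoder`...
    }
}
```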
Performance Improvement (OLD)
Thanks to @Elabajaba for testing on Bistro for me due to my current lack of a desktop.
TLDR: Without shadow mapping, this PR has no impact. With shadow mapping, this PR gives ~twice the FPS of main, which is ~equal to the FPS without shadow mapping!
Changelog
- MainOpaquePass3dNode, PrepassNode, DeferredGBufferPrepassNode, and each shadow map within ShadowPassNode are now encoded in parallel, giving greatly increased CPU performance, mainly when shadow mapping is enabled.
- Added RenderContext::add_command_buffer_generation_task().
- Some render graph and Node-related types and methods now have additional lifetime constraints.
Base implementation looks good to me - interested in seeing what benchmarks will show.
Not sure if I should mark this as a breaking change or not. Technically there are some extra lifetime constraints now, but 99.999% of users won't need to change their code. I think the only thing you would need to change is if you're using bevy_render's RenderContext for your own stuff, independent from the RenderGraph.
Note that this PR currently depends on https://github.com/gfx-rs/wgpu/pull/3626
Marking this as blocked on a potential wgpu release, since this otherwise would add complexity without any gains.
Does the benchmark use arcanized wgpu for both single and multi threaded impls? Seems like arcanized wgpu on its own might give us a perf boost.
IIRC @Elabajaba tried singlethreaded + arcanized and it made ~no difference vs singlethreaded non-arcanized. The benchmarks above were non-arcanized.
We've been wanting to use bevy for a fairly graphically intensive game, so any performance improvements we can get would be extremely useful for us! I'd love to see these changes pushed
This is blocked on https://github.com/gfx-rs/wgpu/pull/3626 upstream, so we need to wait for that to be merged then for the next wgpu release (which should be about late October).
Example alien_cake_addict failed to run, please try running it locally and check the result.
Some clarifications about the benchmarks based on discussion from when this PR was initially made:
- Single-threaded with shadows averaged roughly 50 fps without wgpu arcanisation and ~54 fps with it.
- The image in the original post contains results all with wgpu arcanisation.
- All results so far are based on a much older version of the arcanisation branch.
Looks fine on 7e837f5bf4e83e2875e012b0dd02051334445992
2024-01-28T09:06:43.311046Z INFO bevy_render::renderer: AdapterInfo { name: "Intel(R) Graphics (ADL GT2)", vendor: 32902, device: 17958, device_type: IntegratedGpu, driver: "Intel open-source Mesa driver", driver_info: "Mesa 23.0.4-0ubuntu1~23.04.1", backend: Vulkan }
2024-01-28T09:06:43.532619Z INFO bevy_diagnostic::system_information_diagnostics_plugin::internal: SystemInfo { os: "Linux 23.04 Ubuntu", kernel: "6.2.0-39-generic", cpu: "", core_count: "14", memory: "31.0 GiB" }
Looks fine to me on https://github.com/bevyengine/bevy/commit/7e837f5bf4e83e2875e012b0dd02051334445992
Windows
AdapterInfo {
name: "NVIDIA GeForce RTX 2080 Ti",
vendor: 4318,
device: 7687,
device_type: DiscreteGpu,
driver: "NVIDIA",
driver_info: "536.23",
backend: Vulkan
}
SystemInfo {
os: "Windows 11 Pro",
kernel: "22621",
cpu: "",
core_count: "16",
memory: "63.1 GiB"
}
Mac
AdapterInfo {
name: "Apple M1 Max",
vendor: 0,
device: 0,
device_type: IntegratedGpu,
driver: "",
driver_info: "",
backend: Metal
}
SystemInfo {
os: "MacOS 14.2.1 ",
kernel: "23.2.0",
cpu: "",
core_count: "10",
memory: "64.0 GiB"
}
Broken on Windows+AMD for me (shadows flicker):
https://github.com/bevyengine/bevy/assets/177631/c6db1344-acd5-4092-9416-c2e263e06639
AdapterInfo {
name: "AMD Radeon RX 6800 XT",
vendor: 4098,
device: 29631,
device_type: DiscreteGpu,
driver: "AMD proprietary driver",
driver_info: "24.1.1 (AMD proprietary shader compiler)",
backend: Vulkan
}
It looks like your PR is a breaking change, but you didn't provide a migration guide.
Could you add some context on what users should update when this change gets released in a new version of Bevy?
It will be used to help write the migration guide for the version. Putting it after a ## Migration Guide header will help it get automatically picked up by our tooling.
The changes to the APIs are technically breaking changes. Could you document them in a Migration Guide section?
I don't think it's a breaking change, as the lifetimes should be inferred for most use cases.
This adds an argument to RenderGraphRunner::run. I don't expect many people to be running their own render graph, but it's worth documenting.
No changes were made to the argument count, just adding some lifetimes that should be inferred for all existing use cases.
Does this still need addressing?
GitHub won't let me respond to it, but idk what this comment is in reference to.
Ship it!