wgpu
[Metal] Render/Compute pass forces command buffer finish at the end
Description
During my investigation of case #5721, I noticed that wgpu produces a large number of command buffers: each render or compute pass forces the current command buffer to finish. After further investigation, I found that this is caused by the resource state tracking code, which injects additional command buffers containing the required resource state changes. That is only relevant for Vulkan, Dx12, and possibly other APIs that require explicit resource state tracking; Metal does not need it. As a result, on Metal wgpu simply produces many empty command buffers and splits work unnecessarily.
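To make the effect concrete, here is a toy model of the behaviour described above. It is only a sketch: `CommandBuffer`, `SplittingEncoder`, and `count_command_buffers` are hypothetical names invented for illustration and are not wgpu types, and the model ignores the extra state-transition command buffers entirely.

```rust
/// Toy stand-in for a command buffer: just the labels of the passes it holds.
/// Hypothetical type, not part of wgpu.
#[derive(Debug, Default, Clone, PartialEq)]
struct CommandBuffer {
    passes: Vec<String>,
}

/// Hypothetical encoder that finishes the open command buffer at the end of
/// every pass, mirroring the behaviour reported in this issue.
#[derive(Default)]
struct SplittingEncoder {
    finished: Vec<CommandBuffer>,
    open: CommandBuffer,
}

impl SplittingEncoder {
    /// Record one pass, then force the current command buffer to finish.
    fn record_pass(&mut self, label: &str) {
        self.open.passes.push(label.to_string());
        // Forced finish: the open buffer is closed and a fresh one is started.
        let done = std::mem::take(&mut self.open);
        self.finished.push(done);
    }

    /// Return every command buffer produced so far.
    fn finish(mut self) -> Vec<CommandBuffer> {
        if !self.open.passes.is_empty() {
            self.finished.push(self.open);
        }
        self.finished
    }
}

/// Count how many command buffers a sequence of passes produces.
fn count_command_buffers(labels: &[&str]) -> usize {
    let mut encoder = SplittingEncoder::default();
    for label in labels {
        encoder.record_pass(label);
    }
    encoder.finish().len()
}
```

In this model, three passes yield three command buffers, each holding a single pass, where a non-splitting encoder would yield one.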
According to the Metal documentation, single-threaded apps are recommended to use a single command buffer.
Multi-threaded apps should use one command buffer per subtask. Because wgpu currently splits per pass, these recommendations are practically impossible to follow.
The splitting also makes GPU workload debugging with Xcode GPU capture much harder: it produces a large number of command buffers, and push/pop debug groups only work within a single command buffer. This is how it looks now:
After removing this command buffer splitting:
As for performance, I did not notice a significant difference on the GPU side, as it really depends on the use case. On the CPU side, however, consolidating into a single command buffer saves encoding time.
Multiple command buffers:
Single command buffer:
In any case, whether multiple command buffers or a single one is better on Metal depends heavily on the use case, which is one more reason to give the developer control instead of forcing the split.
Repro
I made a small patch that disables command buffer splitting so you can test the difference. The patch is for prototype purposes only: it works on Metal and would break Vulkan/Dx12. A proper fix would need to apply the split conditionally.
do-not-split-command-buffer.patch
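The conditional execution mentioned in the repro notes could look roughly like the sketch below. Everything here is hypothetical: `Backend`, `needs_explicit_transitions`, and `group_passes` are illustrative names rather than wgpu's API, and the per-backend answers simply restate the assumption from the description (Vulkan and Dx12 need explicit state tracking, Metal does not).

```rust
/// Hypothetical backend list for illustration; not wgpu's actual enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Backend {
    Vulkan,
    Dx12,
    Metal,
}

/// Whether a backend needs explicit resource state transitions recorded
/// ahead of each pass. The answers are an assumption taken from the issue text.
fn needs_explicit_transitions(backend: Backend) -> bool {
    match backend {
        Backend::Vulkan | Backend::Dx12 => true,
        Backend::Metal => false,
    }
}

/// Group pass labels into command buffers: one buffer per pass where the
/// split is required, a single shared buffer otherwise.
fn group_passes(backend: Backend, passes: &[&str]) -> Vec<Vec<String>> {
    if passes.is_empty() {
        return Vec::new();
    }
    if needs_explicit_transitions(backend) {
        // Keep the existing behaviour: every pass ends its command buffer.
        passes.iter().map(|p| vec![p.to_string()]).collect()
    } else {
        // No transitions to inject, so all passes can share one buffer.
        vec![passes.iter().map(|p| p.to_string()).collect()]
    }
}
```

With this gating, `group_passes(Backend::Metal, ...)` keeps all passes in one command buffer, while the Vulkan and Dx12 paths are left unchanged. A real fix would more likely query a capability of the backend than match on the backend itself.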