BatchedMesh Example much slower on WebGPU than WebGL on Android
Description
On Android (Samsung Galaxy S20 FE), the WebGPU version of the BatchedMesh example is much slower:
WebGPU : ~13FPS
WebGL : ~25FPS
Reproduction steps
- load on Android https://threejs.org/examples/?q=bat#webgpu_mesh_batch
- enable/disable WebGPU
Version
r169
Device
Mobile
Browser
Chrome
OS
Android
The multi-draw API isn't currently supported in WebGPU. WebGL can submit all 20,000 batched elements in a single multi-draw call, while WebGPU has to issue them one by one, which is why WebGL performs significantly better here, especially on smartphones.
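To make the call-count difference concrete, here is a simplified sketch (not three.js's actual backend code). It assumes a WebGL 2 context with the `WEBGL_multi_draw` extension (obtained via `gl.getExtension('WEBGL_multi_draw')`), a `Uint32` index buffer, and a hypothetical `ranges` array with one `{ firstIndex, indexCount }` entry per batched geometry. WebGL submits the whole batch in one call; WebGPU without multi-draw needs one `drawIndexed()` per entry.

```javascript
// Simplified sketch, not three.js's backend code.
// `ranges` (hypothetical) holds one { firstIndex, indexCount } entry per geometry.

// WebGL 2 + WEBGL_multi_draw: one call submits every sub-draw of the batch.
function drawBatchWebGL(gl, ext, ranges) {
  const counts = new Int32Array(ranges.length);
  const offsets = new Int32Array(ranges.length);
  ranges.forEach((r, i) => {
    counts[i] = r.indexCount;
    offsets[i] = r.firstIndex * 4; // byte offset into a Uint32 index buffer
  });
  ext.multiDrawElementsWEBGL(gl.TRIANGLES, counts, 0, gl.UNSIGNED_INT, offsets, 0, ranges.length);
}

// WebGPU without multi-draw: one drawIndexed() call per sub-draw.
function drawBatchWebGPU(pass, ranges) {
  ranges.forEach((r, i) => {
    // drawIndexed(indexCount, instanceCount, firstIndex, baseVertex, firstInstance)
    pass.drawIndexed(r.indexCount, 1, r.firstIndex, 0, i);
  });
}
```

With 20,000 entries in `ranges`, the WebGL path still makes one API call, while the WebGPU path makes 20,000, each with its own validation and encoding cost.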
However, there’s good news! A new MultiDrawIndirect API is on the horizon for WebGPU, which is expected to surpass the performance of the WebGL version: https://github.com/gpuweb/gpuweb/issues/1354#issuecomment-2370162949 https://issues.chromium.org/issues/369246557/dependencies
This API is already available in Chrome Canary behind the chromium-experimental-multi-draw-indirect flag, enabled through enable-unsafe-webgpu. I plan to begin working with it over the coming weeks, as multi-draw is an important part of my workflow.
In the meantime, as discussed in this PR, we can implement a workaround: issue multiple drawIndirect() calls that all read from a single indirect buffer, each at a different offset, and record them in a Render Bundle. This approach can mimic the upcoming MultiDrawIndirect API until it becomes widely available: https://github.com/mrdoob/three.js/pull/29197#issuecomment-2324472275
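A minimal sketch of that workaround with the raw WebGPU API, assuming indexed geometry and hypothetical helper names (`packIndexedIndirectArgs`, `recordIndirectDraws`). Each `drawIndexedIndirect()` record is five `u32` values (20 bytes), so draw `i` reads from byte offset `i * 20` of the shared buffer. Note that a non-zero `firstInstance` in an indirect record requires the `indirect-first-instance` device feature.

```javascript
// One drawIndexedIndirect() record: indexCount, instanceCount, firstIndex,
// baseVertex, firstInstance (5 x 32-bit values = 20 bytes).
const INDEXED_ARGS_WORDS = 5;
const INDEXED_ARGS_BYTES = INDEXED_ARGS_WORDS * 4;

// Packs one argument record per draw into a single array for the indirect buffer.
function packIndexedIndirectArgs(draws) {
  const args = new Uint32Array(draws.length * INDEXED_ARGS_WORDS);
  draws.forEach((d, i) => {
    args.set([
      d.indexCount,
      d.instanceCount ?? 1,
      d.firstIndex,
      d.baseVertex ?? 0,
      d.firstInstance ?? 0, // non-zero needs the 'indirect-first-instance' feature
    ], i * INDEXED_ARGS_WORDS);
  });
  return args;
}

// Records one drawIndexedIndirect() per draw into a render bundle encoder,
// all reading from the same indirect buffer at increasing offsets.
function recordIndirectDraws(bundleEncoder, { pipeline, bindGroup, vertexBuffer, indexBuffer, indirectBuffer, drawCount }) {
  bundleEncoder.setPipeline(pipeline);
  bundleEncoder.setBindGroup(0, bindGroup);
  bundleEncoder.setVertexBuffer(0, vertexBuffer);
  bundleEncoder.setIndexBuffer(indexBuffer, 'uint32');
  for (let i = 0; i < drawCount; i++) {
    bundleEncoder.drawIndexedIndirect(indirectBuffer, i * INDEXED_ARGS_BYTES);
  }
  return bundleEncoder.finish();
}
```

In a browser, the encoder would come from `device.createRenderBundleEncoder({ colorFormats: [format], depthStencilFormat: 'depth24plus' })`, the packed args would be uploaded with `device.queue.writeBuffer()` into a buffer created with `GPUBufferUsage.INDIRECT | GPUBufferUsage.COPY_DST`, and the finished bundle would be replayed each frame with `pass.executeBundles([bundle])`. The bundle amortizes the per-call encoding cost, and because the draw parameters live in a buffer, they can change without re-recording.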
For now, I’ll wait for @Spiri0's very promising work on implementing drawIndirect, which will provide a solid base for this: https://github.com/mrdoob/three.js/issues/29568#issuecomment-2396426753
@mwyrzykowski Just a heads-up: there’s currently an issue in the official Three.js BatchedMesh WebGPU example where setting the count above 1024 makes the WebGPU backend crash in Safari. I tested this on the latest Safari Technology Preview. https://threejs.org/examples/?q=batch#webgpu_mesh_batch
The error:
[Log] GPUDeviceLostInfo {reason: "unknown", message: ""}
Oh, thank you for the report @RenaudRohlinger. Do you know which Mac you tried? I tried an M2 Mac Studio with STP 207 and 17788 instances. It might very well be Mac-related.
In any case, the performance is really bad, so at the very least I will investigate that until I can figure out how to reproduce.
I've been working with drawIndirect since it landed in r170. It works quite well, but it would be more comfortable to use with structs:
```javascript
// Proposed TSL struct describing one drawIndirect() argument record
const drawBufferStruct = struct({
	vertexCount: 'uint',
	instanceCount: 'uint',
	firstVertex: 'uint',
	firstInstance: 'uint',
});
```
The values could then be accessed more clearly in Fn and wgslFn:
```javascript
drawBuffer.vertexCount = vertexCount;
drawBuffer.instanceCount = instanceCount;
```
instead of the current:
```javascript
drawBuffer.x = vertexCount;
drawBuffer.y = instanceCount;
```
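For comparison, here is a plain JavaScript sketch (the helper `createDrawArgsView` is hypothetical, not part of three.js) that exposes the 16 bytes of one drawIndirect() record under the same field names as `drawBufferStruct`. The field order has to stay `vertexCount, instanceCount, firstVertex, firstInstance`, because that is the order in which WebGPU's `drawIndirect()` reads the four `u32` values. The names are only aliases for the slots that `.x/.y/.z/.w` address today.

```javascript
// Field order of a non-indexed drawIndirect() argument record in WebGPU.
const DRAW_ARGS_FIELDS = ['vertexCount', 'instanceCount', 'firstVertex', 'firstInstance'];

// Exposes one 16-byte record by field name instead of x/y/z/w.
// `byteOffset` lets several records share one ArrayBuffer (must be a multiple of 4).
function createDrawArgsView(buffer = new ArrayBuffer(16), byteOffset = 0) {
  const words = new Uint32Array(buffer, byteOffset, DRAW_ARGS_FIELDS.length);
  const view = { words };
  DRAW_ARGS_FIELDS.forEach((name, i) => {
    Object.defineProperty(view, name, {
      get: () => words[i],
      set: (value) => { words[i] = value; },
      enumerable: true,
    });
  });
  return view;
}
```

Writing `view.instanceCount = 100` and writing `words[1] = 100` touch the same memory, so the struct changes readability, not layout.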
This means uniforms can be bundled efficiently on the user side, making them easier to handle in shaders, especially if you want to pack many different per-instance parameters into one or a few buffer arrays. I already have it working, but now I have to implement it more cleanly. Let's see if I can get it into r171. My job is currently taking up a bit more of my time, but I'm just as motivated to round out the drawIndirect topic with structs so that it can be used to its full potential.
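As an illustration of bundling per-instance parameters into a single buffer array, here is a sketch that packs a hypothetical `InstanceData` struct (`offset: vec3<f32>`, `scale: f32`, `color: vec4<f32>`) following WGSL storage-buffer layout rules. The struct, its fields, and `packInstances` are assumptions for illustration only. `vec3<f32>` is 16-byte aligned but only 12 bytes long, so the `f32` that follows fits into its padding, and the struct ends up with a 32-byte array stride.

```javascript
// Hypothetical per-instance struct, laid out with WGSL alignment rules:
// struct InstanceData {
//   offset: vec3<f32>, // byte offset 0, size 12, align 16
//   scale: f32,        // byte offset 12 (reuses the vec3 padding)
//   color: vec4<f32>,  // byte offset 16, size 16
// };                   // size 32 -> array<InstanceData> stride 32 bytes
const INSTANCE_STRIDE_FLOATS = 8; // 32 bytes / 4 bytes per float

// Packs an array of { offset: [x, y, z], scale, color: [r, g, b, a] }
// into one Float32Array suitable for a storage buffer.
function packInstances(instances) {
  const data = new Float32Array(instances.length * INSTANCE_STRIDE_FLOATS);
  instances.forEach((inst, i) => {
    const base = i * INSTANCE_STRIDE_FLOATS;
    data.set(inst.offset, base);    // floats 0..2
    data[base + 3] = inst.scale;    // float 3
    data.set(inst.color, base + 4); // floats 4..7
  });
  return data;
}
```

Placing the scalar right after the `vec3` avoids 4 bytes of wasted padding per instance; declaring `color` before `scale` would give the same 32-byte stride but with the padding at the end.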
https://threejs.org/examples/?q=batch#webgpu_mesh_batch
On my M3 Pro Max I don't crash at 20k instances in Safari, but I'm at 1 FPS, versus 120 FPS in Chrome on the same machine. @RenaudRohlinger @mwyrzykowski
Looks promising @Spiri0, sorry for hijacking this issue by the way. 😬
@mwyrzykowski Thanks for looking into it! I'm using a 2021 MacBook Pro (M1 Max) with Safari Technology Preview 207 on macOS Sequoia 15.1.
Awesome @mwyrzykowski! Performance remained stable during profiling with an instance count of 512, but when I increase it slightly, say to around 600, I occasionally get `Unhandled Promise Rejection: RangeError: Range consisting of offset and length are out of bounds` in Safari, often right before a crash.
@RenaudRohlinger I have a CodePen here showing how to use the drawIndirect buffer together with compute shaders. If you feel like it, you can convert the shaders to TSL and turn it into an example, because it also shows how to use drawIndirect with storage buffers, which in practice will always be the case, just as when using it with compute shaders. If you don't feel like it, no problem; I will do it after the struct extension. https://codepen.io/Spiri0/pen/PoMBvzz
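The key detail when combining drawIndirect with compute shaders is that the argument buffer must be usable both as a storage buffer (so the compute pass can write it) and as an indirect buffer (so the render pass can read it). A minimal sketch with the raw WebGPU API, assuming a non-indexed draw; `createDrawArgsBuffer` and `resetInstanceCount` are hypothetical helpers, not code from the CodePen.

```javascript
// Numeric GPUBufferUsage values from the WebGPU spec; in a browser the global
// GPUBufferUsage object provides the same constants.
const BufferUsage = { COPY_DST: 0x0008, STORAGE: 0x0080, INDIRECT: 0x0100 };

// Creates a 16-byte buffer holding one drawIndirect() argument record that a
// compute shader can write (STORAGE) and a render pass can read (INDIRECT).
function createDrawArgsBuffer(device, vertexCount) {
  const buffer = device.createBuffer({
    label: 'drawArgs',
    size: 16, // vertexCount, instanceCount, firstVertex, firstInstance (4 x u32)
    usage: BufferUsage.STORAGE | BufferUsage.INDIRECT | BufferUsage.COPY_DST,
  });
  // instanceCount starts at 0 and is filled in by the compute pass.
  device.queue.writeBuffer(buffer, 0, new Uint32Array([vertexCount, 0, 0, 0]));
  return buffer;
}

// Zeroes instanceCount (bytes 4..7) before the compute pass runs again.
function resetInstanceCount(commandEncoder, buffer) {
  commandEncoder.clearBuffer(buffer, 4, 4);
}
```

A frame would then call `resetInstanceCount(encoder, drawArgs)`, run the compute pass that writes `instanceCount`, and finally call `pass.drawIndirect(drawArgs, 0)` in the render pass, so the CPU never needs to know how many instances survive.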
With a few more buffers you can control exactly which instances should be visible and which should not, but that would be the topic for another example with structs.
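One common way to do this is a compaction pass: a compute shader checks a per-instance visibility buffer, appends each visible index to a `visibleIds` list, and atomically increments `instanceCount` in the draw arguments. The sketch below is an assumption about how such a pass could look, not the approach from the CodePen; the buffer names and the `visibility` encoding (0 = hidden) are hypothetical. The vertex shader would then look up `visibleIds[instance_index]` to find the real instance. A CPU reference of the same logic is included for checking results; on the GPU, the order of `visibleIds` is not deterministic because of the atomics.

```javascript
// Hypothetical WGSL compaction pass: one invocation per instance.
const cullingWGSL = /* wgsl */ `
struct DrawArgs {
  vertexCount: u32,
  instanceCount: atomic<u32>,
  firstVertex: u32,
  firstInstance: u32,
};

@group(0) @binding(0) var<storage, read_write> drawArgs: DrawArgs;
@group(0) @binding(1) var<storage, read> visibility: array<u32>;
@group(0) @binding(2) var<storage, read_write> visibleIds: array<u32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  let i = id.x;
  if (i >= arrayLength(&visibility) || visibility[i] == 0u) {
    return;
  }
  // Reserve the next slot and record which instance occupies it.
  let slot = atomicAdd(&drawArgs.instanceCount, 1u);
  visibleIds[slot] = i;
}
`;

// Number of workgroups needed to cover every instance with @workgroup_size(64).
function workgroupCount(instanceCount, workgroupSize = 64) {
  return Math.ceil(instanceCount / workgroupSize);
}

// CPU reference of the same compaction (in ascending order, unlike the GPU).
function cullReference(visibility) {
  const visibleIds = [];
  visibility.forEach((v, i) => {
    if (v !== 0) visibleIds.push(i);
  });
  return { instanceCount: visibleIds.length, visibleIds };
}
```

The compute pass would be dispatched with `pass.dispatchWorkgroups(workgroupCount(count))` after `instanceCount` has been reset to zero.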
P.S. Sorry for hijacking this issue too 😅, but it already touches on drawIndirect so much that the example can soon be made more efficient with it.
@RenaudRohlinger I have a question about TSL / Fn, which you know better than I do. So far I've only used wgslFn. You also have a forum account, right? That would be a more appropriate place to discuss this than using the issue for side topics.
@Spiri0 Sure! https://discourse.threejs.org/u/yakuno 😊