RFC: WebGPURenderer prototype single uniform buffer update / pass

Open aardgoose opened this issue 8 months ago • 6 comments

Prototype mechanism to reduce the number of writeBuffer() calls by using a single large buffer for all object uniform groups, which is updated before the render pass is submitted. Some other engines use the same approach with WebGPU.

All examples run correctly with this PR. The effect is greatest when large numbers of objects are rendered. The largest change is in GPU thread time, which drops substantially in the webgpu_sprites example: from 5 ms/frame with per-object buffers to 2.5 ms/frame with a single buffer in my brief testing.

No attempt has been made to:

  • synchronize the buffer updating and reading
  • allow buffer resizing or detect buffer overflow
  • recover buffer space on object deletion.
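
For readers new to the idea, here is a minimal sketch of the single-buffer approach, not the code in this PR. The class name `SharedUniformStaging`, the `upload` callback and the fixed 256-byte alignment are all assumptions made for illustration, and the overflow guard is an addition of the sketch. Each object gets an aligned range in one CPU-side array, and the whole used region is uploaded with one call per frame.

```typescript
// Hypothetical sketch: these names are illustrative and are not three.js APIs.
const BLOCK_SIZE = 256; // assumed offset alignment, in bytes

/** Round a byte length up to the next multiple of the block size. */
function alignUp(bytes: number, block: number = BLOCK_SIZE): number {
  return Math.ceil(bytes / block) * block;
}

class SharedUniformStaging {
  private readonly data: Float32Array; // CPU-side copy of the whole shared buffer
  private cursor = 0; // next free byte offset

  constructor(totalBytes: number) {
    this.data = new Float32Array(totalBytes / 4);
  }

  /** Reserve an aligned range for one object's uniform group and return its byte offset. */
  allocate(byteLength: number): number {
    const offset = this.cursor;
    const size = alignUp(byteLength);
    if (offset + size > this.data.byteLength) {
      throw new RangeError("shared uniform buffer overflow");
    }
    this.cursor += size;
    return offset;
  }

  /** Copy one object's uniform values into its range (CPU side only, no GPU call). */
  write(byteOffset: number, values: ArrayLike<number>): void {
    this.data.set(values, byteOffset / 4);
  }

  /** Upload everything allocated so far with a single call to the supplied function. */
  flush(upload: (bytes: Uint8Array) => void): void {
    upload(new Uint8Array(this.data.buffer, 0, this.cursor));
  }
}
```

With WebGPU, `upload` would typically wrap `device.queue.writeBuffer(gpuBuffer, 0, bytes)`, and each object's bind group entry would reference the same GPU buffer at the offset returned by `allocate`.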

aardgoose avatar Dec 16 '23 21:12 aardgoose

Reducing the number of writeBuffer() calls is also part of https://github.com/mrdoob/three.js/pull/27134. Once that is configured, many updates will be made once per frame instead of once per object.

I like your idea, but I wonder if it wouldn't be better to have this configured in Node and adjusted at setup()?

sunag avatar Dec 18 '23 17:12 sunag

Hi @aardgoose

Could you look at fixing the conflicts? I was thinking about merging this PR soon.

sunag avatar May 13 '24 16:05 sunag

I'll take a look tomorrow.

aardgoose avatar May 13 '24 20:05 aardgoose

Awesome! Is it ready for review @aardgoose? Can you promote it from Draft to PR maybe? 😊

RenaudRohlinger avatar May 19 '24 09:05 RenaudRohlinger

@RenaudRohlinger will do.

We might want to select specific uniform groups to be managed in this way, which is now possible as the buffer is passed through the NodeBuilder.

An obvious next stage is to look at reclaiming unused buffers, but first we need a deallocation mechanism that runs when a material is disposed of.

aardgoose avatar May 19 '24 10:05 aardgoose

Added lists per extent size (multiple of block size) for freed buffers when objects are removed from the scene graph. These lists are used for new allocations in preference to free space at the end of the buffer.

Block size is typically 256 bytes (https://web3dsurvey.com/webgpu/limits/minStorageBufferOffsetAlignment).

Added a reworked example with continuous removal and addition of new objects, plus stats demonstrating buffer use. This only uses blocks of 256 B or less.
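
The free-list scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `BlockAllocator`, `allocate` and `release` are hypothetical names, and freed extents are only reused for requests of exactly the same size in blocks (no splitting or merging).

```typescript
// Hypothetical sketch: names and structure are illustrative, not taken from the PR.
class BlockAllocator {
  readonly blockSize: number;
  private readonly totalBlocks: number;
  private end = 0; // index of the first never-used block
  // extent size (in blocks) -> start block indices of freed extents of that size
  private readonly freeLists = new Map<number, number[]>();

  constructor(totalBytes: number, blockSize = 256) {
    this.blockSize = blockSize;
    this.totalBlocks = Math.floor(totalBytes / blockSize);
  }

  private blocksFor(byteLength: number): number {
    return Math.max(1, Math.ceil(byteLength / this.blockSize));
  }

  /** Return a byte offset, reusing a freed extent of the same size when one exists. */
  allocate(byteLength: number): number {
    const blocks = this.blocksFor(byteLength);
    const reused = this.freeLists.get(blocks)?.pop();
    if (reused !== undefined) return reused * this.blockSize;
    if (this.end + blocks > this.totalBlocks) {
      throw new RangeError("out of buffer space");
    }
    const start = this.end;
    this.end += blocks;
    return start * this.blockSize;
  }

  /** Put an extent back on the free list for its size. */
  release(byteOffset: number, byteLength: number): void {
    const blocks = this.blocksFor(byteLength);
    const list = this.freeLists.get(blocks) ?? [];
    list.push(byteOffset / this.blockSize);
    this.freeLists.set(blocks, list);
  }
}
```

Keeping one list per extent size makes both operations cheap, at the cost of leaving freed space unused whenever no later request matches that size.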

aardgoose avatar Jun 12 '24 21:06 aardgoose

📦 Bundle size

Full ESM build, minified and gzipped.

| Filesize dev | Filesize PR | Diff |
|---|---|---|
| 685.1 kB (169.6 kB) | 685.1 kB (169.6 kB) | +0 B |

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

| Filesize dev | Filesize PR | Diff |
|---|---|---|
| 462 kB (111.4 kB) | 462 kB (111.4 kB) | +0 B |

github-actions[bot] avatar Jul 14 '24 10:07 github-actions[bot]

I've been conducting performance benchmarks and believe this PR could significantly enhance the webgpu_performances.html example, particularly within the WebGL backend. It could potentially boost performance from around 30fps to over 120fps.

Due to the force-push, I'm unable to check out the PR myself. If possible, could you give it a try?

Additionally, to address the performance issues in webgpu_performances.html, I'm considering using gl.bindBufferRange and gl.bufferSubData instead of gl.bufferData( gl.UNIFORM_BUFFER, data, gl.DYNAMIC_DRAW ). This will not solve anything on its own, but it should improve the overall UBO strategy in the WebGL backend.
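
As a rough sketch of that strategy (an assumption about how it could look, not code from the WebGL backend): allocate the uniform buffer once, refresh its contents with gl.bufferSubData, and point each draw at its own aligned slice with gl.bindBufferRange. The `GL2Subset` interface and the three helper functions are hypothetical; the interface only lists the WebGL2 calls used here so the sketch type-checks without DOM typings.

```typescript
// Minimal subset of WebGL2RenderingContext used by this sketch (hypothetical helper type).
interface GL2Subset {
  readonly UNIFORM_BUFFER: number;
  readonly DYNAMIC_DRAW: number;
  readonly UNIFORM_BUFFER_OFFSET_ALIGNMENT: number;
  getParameter(pname: number): any;
  createBuffer(): unknown;
  bindBuffer(target: number, buffer: unknown): void;
  bufferData(target: number, size: number, usage: number): void;
  bufferSubData(target: number, dstByteOffset: number, srcData: ArrayBufferView): void;
  bindBufferRange(target: number, index: number, buffer: unknown, offset: number, size: number): void;
}

interface SharedUBO {
  buffer: unknown;
  slotBytes: number; // per-object slice size, rounded up to the offset alignment
}

/** Allocate storage once for all objects instead of calling bufferData per update. */
function createSharedUBO(gl: GL2Subset, objectCount: number, bytesPerObject: number): SharedUBO {
  const alignment: number = gl.getParameter(gl.UNIFORM_BUFFER_OFFSET_ALIGNMENT);
  const slotBytes = Math.ceil(bytesPerObject / alignment) * alignment;
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.UNIFORM_BUFFER, buffer);
  gl.bufferData(gl.UNIFORM_BUFFER, objectCount * slotBytes, gl.DYNAMIC_DRAW);
  return { buffer, slotBytes };
}

/** Refresh the existing storage in place with one bufferSubData call. */
function uploadFrame(gl: GL2Subset, ubo: SharedUBO, staging: ArrayBufferView): void {
  gl.bindBuffer(gl.UNIFORM_BUFFER, ubo.buffer);
  gl.bufferSubData(gl.UNIFORM_BUFFER, 0, staging);
}

/** Before drawing an object, bind only its slice of the shared buffer. */
function bindObjectSlot(gl: GL2Subset, ubo: SharedUBO, bindingIndex: number, objectIndex: number): void {
  gl.bindBufferRange(gl.UNIFORM_BUFFER, bindingIndex, ubo.buffer, objectIndex * ubo.slotBytes, ubo.slotBytes);
}
```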

While I'm still investigating the exact cause of the performance drop in WebGL, I'm fairly confident this PR addresses a major bottleneck. The issue seems to stem from overwhelming the GPU with hundreds of buffer uploads, or at least CPU-GPU data transfers, which then causes a drop in the subsequent 5-6 frames every 6 frames in the requestAnimationFrame (RAF) loop. Although this PR is more of a useful feature that doubles as a workaround than a root-cause fix, it should help significantly. In the long term, a caching system in the UBO logic that prevents unnecessary uploads and uploads more precise ranges might be the real solution to the WebGLBackend performance issues.
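
One possible shape for such a cache, sketched here purely as an assumption (`CachedUniformBlock` and its `upload` callback are hypothetical names, not existing three.js code): keep a shadow copy of the values last sent, skip the upload when nothing changed, and otherwise upload only the smallest contiguous range that differs. The sketch assumes `next` always has the same length as the block.

```typescript
// Hypothetical sketch of a change-detecting uniform cache; not part of three.js.
class CachedUniformBlock {
  private readonly shadow: Float32Array; // values as of the last upload

  constructor(floatCount: number) {
    this.shadow = new Float32Array(floatCount);
  }

  /**
   * Compare `next` with the shadow copy. If it differs, upload the smallest
   * contiguous changed range and return true; otherwise do nothing and return false.
   */
  update(
    next: Float32Array,
    upload: (floatOffset: number, values: Float32Array) => void
  ): boolean {
    let first = 0;
    while (first < next.length && next[first] === this.shadow[first]) first++;
    if (first === next.length) return false; // identical, skip the upload

    let last = next.length - 1;
    while (next[last] === this.shadow[last]) last--;

    const changed = next.subarray(first, last + 1);
    this.shadow.set(changed, first);
    upload(first, changed);
    return true;
  }
}
```

In a WebGL2 backend, `upload` could call `gl.bufferSubData(gl.UNIFORM_BUFFER, baseByteOffset + floatOffset * 4, values)`, where `baseByteOffset` is wherever this block lives in the buffer. The per-element comparison costs CPU time, so whether it pays off depends on how often uniforms actually change.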

/cc @sunag @Mugen87

RenaudRohlinger avatar Aug 14 '24 00:08 RenaudRohlinger

We need to check whether the WebGLBackend still has redundant calls. Last time I looked at this, the WebGLRenderer had more state comparators, so it only sends WebGL the commands that have actually changed.

I haven't had time to implement UniformGroup on all nodes yet. If we don't do this, we won't be able to achieve optimal performance because the model's matrix groups will be confused with those of the material, causing unnecessary overhead for both backends. I think after this we will be able to implement buffer sharing more safely.

sunag avatar Aug 14 '24 01:08 sunag