three.js icon indicating copy to clipboard operation
three.js copied to clipboard

Implement KHR_parallel_shader_compile support

Open fernandojsg opened this issue 6 years ago • 23 comments

This new KHR_parallel_shader_compile extension will bring shader compilation off to the main thread and even if the total compiling time could remain the same it won't be blocking the main thread which is a huge benefit.

It will require some modifications on the way three.js is handling the compilation and linking of shaders and move it from sync to async. Some ideas have been proposed on the Khronos mailing list and the Khronos repo PR

Just opening the issue so we could discuss ideas on how to address this change.

fernandojsg avatar Apr 23 '19 17:04 fernandojsg

Some initial thoughts from my understanding after reading the thread and the spec:

Currently without parallel compiling, the pipeline looks more or less like this: image using the extension, even on the worst case with just one thread, it will looks the same and it will take as much time as without the extension, but it won't block the main thread, so we will still have a huge benefit using it.

With the latest debug (false by default) to the status check on the shader compilation we avoid unnecessary blockings WebGLShader.js#L23-L40, but with the extension enabled we will still stall on linkProgram WebGLProgram.js#L586-L605 In this case, without modifying anything and enabling the extension we could get benefits of parallel compiling the fs and vs shaders, while linkProgram will wait for them to complete, looking similar to this (assuming 2 threads): image

In order to get the best of it we should group the compilation and linking, as @jdashg proposed on https://github.com/KhronosGroup/WebGL/pull/2855#issuecomment-483486677:

for (const x of shaders)
   gl.compileShader(x);
for (const x of programs)
   gl.linkProgram(x);

Using this way we could get a timeline similar to: image

Grouping the shaders compilation together could be easy enough to achieve as we could just compile them as they go, while grouping the linkProgram will need substantial modifications on the code, to queue them and call them after all the compile shader calls have been executed. It could be done using also the approach proposed by Jeff:

function* linkingProgress(programs) {
   const ext = gl.getExtension('KHR_parallel_shader_compile');
   let todo = programs.slice();
   while (todo.length) {
      if (ext) {
         todo = todo.filter(x => !gl.getProgramParameter(x, ext.COMPLETION_STATUS_KHR));
      } else {
         const x = todo.pop();
         gl.getProgramParameter(x, gl.LINK_STATUS);
      }
      yield 1.0 - (todo.length / programs.length);
   }
}

Introducing this asynchrony we need to define a isReady() function to determine if the compilation and linking has been finished or not, similar to how babylon is doing currently.

We should use that isReady() function on the WebGLRenderer's AnimationLoop so we could avoid executing it until isReady() returns true. Although it could be interesting to add that check on WebGLRenderer::render(). At the same time it should be handy to provide a callback so users could be notified when this happens in case they are not using setAnimationLoop and rely on a requestAnimationFrame/animate() loop instead.

fernandojsg avatar Apr 24 '19 12:04 fernandojsg

I've prototyped splitting the compilation phase and link phase of programs to see if you can get much benefit from using the fact the Chrome compiles shaders asynchronously in a separate GPU thread rather than the main JS thread. not very useful, but the same mechanism could be used with parallel compile.

There are less mods than you might expect, primarily initMaterial() needs splitting into two halves since the first half setups up the environment for creating a new program if required and the second half which relies on interrogating the program state so in effect serialises with the compilation and linking.

WebGLProgram again needs two parts and an isReady status that would check compilation completion. This avoids the need for global lists of shaders and programs in various states and maintains the information in WebGLProgram() objects where is belongs.

One interesting question then with a parallel compilation mechanism:

You don't have an atomic change of scene after adding several new materials. Some meshes or parts of meshes with multiple materials, would display initially in different frames when intended to display at the same time which could be ugly especially for very slow compiles. How do you handle this? Preserving the atomic nature could be difficult.

aardgoose avatar Apr 24 '19 13:04 aardgoose

@aardgoose cool! do you have any PR open already for that? It could be interesting to see that approach.

Regarding the atomic change, I agree that things always get weird with async in place. We will need to add some validation before using a material that is not fully ready yet, and probably we could include some helpers for the case when you want to load multiple materials like an explosion effect and you want to render it just when all of them are ready. Maybe an attribute to the object that you are going to render, or a three-state value on ready: ready, not ready, ready but waiting for a friend program to finish :).

fernandojsg avatar Apr 24 '19 13:04 fernandojsg

Personally started trial implementation to see the behavior. If anyone is interested in

Branch https://github.com/takahirox/three.js/tree/ParallelShaderExtension

Diffs https://github.com/mrdoob/three.js/compare/dev...takahirox:ParallelShaderExtension?expand=1

I need to clean up, optimize, and still take care some stuffs but it seems working on Canary.

takahirox avatar Apr 24 '19 21:04 takahirox

FYI, Chrome Canary which is the only browser currently supporting KHR_parallel_shader_compile extension might have the extension performance issue now. gl.getProgramParameter(program, ext.COMPLETION_STATUS_KHR) is slow because it seems to wait for compile/link completion although it shouldn't wait for.

https://www.khronos.org/webgl/public-mailing-list/public_webgl/1904/msg00042.php https://bugs.chromium.org/p/chromium/issues/detail?id=957001

takahirox avatar Apr 26 '19 12:04 takahirox

FYI, COMPLETION_STATUS_KHR performance issue seems to have been resolved on Canary https://www.khronos.org/webgl/public-mailing-list/public_webgl/1905/msg00007.php

takahirox avatar May 11 '19 06:05 takahirox

Still WIP but I want to share so far that I locally confirmed KHR_parallel_shader_compile with some optimizations improves the frame rate dropping on application start up.

https://twitter.com/superhoge/status/1128438246638333953

takahirox avatar May 15 '19 15:05 takahirox

FWIW I have a prototype patch for Firefox adding support for KHR_parallel_shader_compile which appears to work quite well, it certainly reduces jank although not quite as smooth as Chrome canary., with the basic Khronos demo/test:

https://www.khronos.org/registry/webgl/sdk/tests/performance/parallel_shader_compile/

The main overhead now on the main thread with FF appears to be the shader translation/validation before submitting to the GPU driver.

Performance trace alternating serial and parallel passes of the above test:

Annotation 2019-05-19 190807

aardgoose avatar May 19 '19 18:05 aardgoose

Very excited to see progress on this! Looking forward to this feature

PWhiddy avatar May 21 '19 23:05 PWhiddy

An alternative take.

The Khronos WebGL group recommend not checking shader compilation status, but just doing:

  • compile shaders
  • link program
  • check link error and only check shader errors if link fails

for all cases, not just using KHR_parallel_shader_compile.

The link operation serialises automatically on the compiles in the background. So at least with Chrome you get the benefits of any built in parallel/asynchronous operation that already exists. Firefox actually checks retrieves the completion status and logs as part of the compile and link operations and caches for later use at the moment, so no real benefit there.

So I have removed error checking and display from WebGLShaders.js and reworked it in WebGLProgram.js, this could be submitted upstream without changing functionality.

https://github.com/aardgoose/three.js/tree/parallel1

Building on this to enable KHR_parallel_shader_compile, but now only needings two states rather than three.

https://github.com/aardgoose/three.js/tree/parallel2

Still some minor issues, probably something that should be enabled on a renderer or material basis as required rather than by default. Tested with Chrome canary and my hacked version of Firefox.

@takahirox

aardgoose avatar May 25 '19 16:05 aardgoose

My site freezes on load for a second or two. I can delay compilation, but the freeze is unavoidable, and the shader in question isn't even that many lines of code (500?). Anyway would love to see this in THREE.js.

drone1 avatar Sep 02 '19 18:09 drone1

Been looking into this recently. Put up a pair of PRs (Primarily #19752, which depends on #19745) that offer one way of taking advantage of parallel shader compilation, though to get the most benefit out of it apps would need to make a fairly minor change to their loading code. If you've been following this issue feel free to let me know how/if that approach works for you!

toji avatar Jun 29 '20 04:06 toji

Thank you so much for doing this work! I've been dying for this feature for ages.

drone1 avatar Jun 29 '20 15:06 drone1

I'd like to understand a bit more about the current status of KHR_parallel_shader_compile across browsers:

Just like 2.5 years ago, Chrome and Firefox do not support this extension. Only Safari supports it (on OS X and iOS alike, which is nice, therefore in effect for all iPhone and iPad browsers).

Also, several folks elsewhere claimed that not much work is done by compileShader itself, pointing to linkProgram incurring all the work. Did anyone measure significant parallelism/speedup with the KHR_parallel_shader_compile extension on Safari, where the extension is available?

Is it still good advice to compile all shaders in bulk, and then link all programs in bulk, irrespective of the extension?

PS. With or without the extension, Safari seems to be pathologically slow, tested with simple/short es300 shaders. Each program, consisting of 1-2 dozen vertex shader lines and 1 dozen fragment shader lines took 9ms. On Chrome and Firefox, it took about 1/100th of that (10 shaders took about 1ms).

monfera avatar Jan 03 '22 20:01 monfera

Just like 2.5 years ago, Chrome and Firefox do not support this extension. Only Safari supports it (on OS X and iOS alike, which is nice, therefore in effect for all iPhone and iPad browsers).

Chrome supports KHR_parallel_shader_compile on Windows, though I'm not seeing it on Mac. Presumably that's due to the extension being implementable on ANGLE, while a native OpenGL backend would have to support the extension directly and Apple's drivers likely do not. (Apple's implementation in Metal-based, IIRC, so they could implement the extension themselves.)

I don't see support in Firefox on any platform.

As for performance gains, I have some screenshots in https://github.com/mrdoob/three.js/pull/19752#issuecomment-995305936 that show the type of difference we're potentially talking about. It's not an absolute speedup in this case, but the ability to defer blocking calls until they can spend the minimal amount of time blocking. (Meaning less jank on the main JS thread.) Also, despite the method name the extension doesn't just cover compileShader but the entire program creation, including linking.

toji avatar Jan 04 '22 18:01 toji

For three-gpu-pathtracer this would be really nice. The path tracing shader is very complex and is taking upwards of 12 seconds to compile on my desktop machine (2070 super, i7 7700k), which seems to affect other tabs and other system responsiveness. There are some other things I think that can be done to improve that time such as removing shader code not needed for the materials being rendered but either way I think an async compile would be a significant improvement. Then it would be able to be more easily compiled alongside other tasks such as file loading, processing, and BVH generation.

It looks like this will work everywhere important, now, outside of FF:

https://developer.mozilla.org/en-US/docs/Web/API/KHR_parallel_shader_compile

gkjohnson avatar Aug 14 '22 18:08 gkjohnson

I have an old patch for FF to implement parallel shader compilation that passes the Khronos WebGL test suite. I expect it would need updating.

https://phabricator.services.mozilla.com/D83315

aardgoose avatar Aug 14 '22 19:08 aardgoose

@gkjohnson

What sort of API would you find useful?

aardgoose avatar Aug 23 '22 16:08 aardgoose

What sort of API would you find useful?

I think something like renderer.compileAsync( scene, camera ) similar to the compile function would be good. It might be nice to have some variant that just takes a material, too, if possible. My specific use case will involve compiling a material that's used with a FullScreenQuad so it won't necessarily be directly attached to a mesh or scene to be compiled but that's easy enough to work around.

gkjohnson avatar Aug 23 '22 18:08 gkjohnson

I'll knock something up along those lines.

aardgoose avatar Aug 23 '22 21:08 aardgoose

Worth nothing that compileAsync is what I introduced in https://github.com/mrdoob/three.js/pull/19752#issuecomment-995305936, but it's been difficult to get traction for it or it's dependency for whatever reason.

I'm happy to rebase those and ensure they still work if there's a realistic chance of them being merged.

toji avatar Aug 23 '22 23:08 toji

@toji Yes I remember that PR now - I would like to see that merged. Unfortunately this touches a sensitive part of the project code that I'm unfamiliar with so I don't feel confident providing feedback.

gkjohnson avatar Aug 24 '22 04:08 gkjohnson

Sorry to poke here, but any suggestions on next step(s) to get this blessed, majestic chunk of code merged? It would be a shame to waste what seems to some really high-value work. If this works well it may provide a much smoother user experience for many applications. Thank you.

drone1 avatar Sep 10 '22 13:09 drone1