
Add shader baker to project exporter.

Open DarioSamo opened this issue 10 months ago • 10 comments

Overview

Based mostly on the work done by @RandomShaper, this PR adds a new Editor Export Plugin that scans resources and scenes for shaders and pre-compiles them into the format used by the driver on the target platform.

Shaders in the SPIR-V, DXIL and MIL formats are interchangeable between systems and can be shipped to end users to skip the long startup times that result from compiling them on the target platform. While pipeline compilation is still unavoidable, Godot currently does work on the end user's system that could be done ahead of time in the Editor and shipped as part of the final project.

This PR required extensive refactoring of the Shader classes and decoupling the Shader Compilers from the existing Rendering Device Drivers. A new generic Shader Container class has been introduced; it allows heavy customization of the exported shader if the target platform requires it. A significant amount of work also went into removing platform-specific definitions that were being added to shaders and that may differ on the end user's system. Where such definitions are unavoidable for optimization reasons, shader variants have been created instead.

When using this PR, Shader Baking is an optional step that will increase the export time of a project with the major benefit that the end user who plays the game will be able to skip shader compilation entirely.

Another important change is that the Shader class can now use a multi-level shader cache: a regular cache that it reads from and writes to, and a read-only fallback cache. The fallback is populated from the shader cache directory embedded in the exported project's .pck.
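The lookup order described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `TwoLevelShaderCache`, `ShaderBinary` and the string keys are hypothetical names, and real caches live on disk rather than in memory.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not Godot's actual Shader class.
using ShaderBinary = std::vector<uint8_t>;
using ShaderCache = std::unordered_map<std::string, ShaderBinary>;

class TwoLevelShaderCache {
public:
    // `fallback` stands in for the read-only cache embedded in the exported .pck.
    explicit TwoLevelShaderCache(ShaderCache fallback) : fallback_(std::move(fallback)) {}

    // Look in the writable user cache first, then in the read-only fallback.
    std::optional<ShaderBinary> find(const std::string &key) const {
        if (auto it = user_.find(key); it != user_.end()) return it->second;
        if (auto it = fallback_.find(key); it != fallback_.end()) return it->second;
        return std::nullopt; // Miss on both levels: the caller has to compile.
    }

    // Only the user cache is ever written to.
    void store(const std::string &key, ShaderBinary binary) {
        assert(!key.empty());
        user_[key] = std::move(binary);
    }

private:
    ShaderCache user_;           // Read-write, e.g. shader_cache in the user directory.
    const ShaderCache fallback_; // Read-only, shipped with the project.
};
```

With this shape, a baked shader is served from the fallback on first launch, while anything compiled at runtime still lands in the user cache for later runs.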

This feature is intended to be finished for Godot 4.5 if possible.

Results

The results speak for themselves when dealing with backends that have very long shader conversion times. While in Vulkan the improvement is there but not as noticeable on a system with many threads, the difference is astounding when dealing with a backend like D3D12, which has very long conversion times due to the NIR transpilation process.

Even on a system with 32 threads, a D3D12 project goes from taking over a minute to load to just ~2 seconds.

TPS demo using D3D12 backend without and with using the shader baker functionality.

master

https://github.com/user-attachments/assets/00f9a39c-7101-41c6-b5b1-139b9df165ea

shader-baker

https://github.com/user-attachments/assets/956be00e-f5ae-488d-8ff4-354c46e7068d

The results are reproducible on Vulkan, though less drastic. The fewer CPU threads you have at your disposal, the more you gain from this feature.

Note that to test this effectively, you must delete the shader_cache directory in the user directory of the project you're testing, because Godot caches compiled shader binaries there between runs. On Windows, this directory can be found at %AppData%/Godot/app_userdata/<Project Name>/shader_cache.
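If you run this kind of comparison often, clearing the cache can be scripted. The helper below is a small sketch using only the C++17 standard library; `clear_shader_cache` is a hypothetical name, and the caller is assumed to pass the project's user directory (the parent of shader_cache).

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>

namespace fs = std::filesystem;

// Removes <project_user_dir>/shader_cache and returns how many
// files and directories were deleted (0 if there was no cache).
std::uintmax_t clear_shader_cache(const fs::path &project_user_dir) {
    assert(!project_user_dir.empty());
    const fs::path cache_dir = project_user_dir / "shader_cache";
    if (!fs::exists(cache_dir)) return 0;
    return fs::remove_all(cache_dir);
}
```

Run it before each timed launch so that both the master and shader-baker builds start from a cold cache.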

TODO

  • [ ] Metal support (@stuartcarnie has shown interest in tackling this).
  • [x] Verify how this can interact with imported GLSL files.
  • [x] Find and account for more edge cases the shader baker is not catching currently by testing on a wider variety of projects.
  • [x] Account for the cases where the renderer must be set to the matching renderer of the exported platform for embedded shaders to be baked. Warn appropriately on the editor.
  • [x] Verify there's no regression in BaseMaterial3D being updated automatically in the viewport from a user editing it.
  • [x] Find the remaining global shader defines that might be around the codebase from querying the current rendering device's capabilities.

Bugsquad edit: Should fix: https://github.com/godotengine/godot/issues/94734


Contributed by W4 Games. 🍀

DarioSamo avatar Feb 07 '25 18:02 DarioSamo

This is great!

Will you merge to the main branch, and then I can follow up with a PR to implement Metal support?

If the user is on Windows or macOS, we can utilise the Metal compiler toolchain to generate Metal libraries, reducing load times even more, as that compiles the Metal source into a platform-independent, intermediate format. I notice that Unreal Engine has an option to do this.

stuartcarnie avatar Feb 07 '25 19:02 stuartcarnie

Will you merge to the main branch, and then I can follow up with a PR to implement Metal support?

I think it's pretty far from being merged to main at the moment, as 4.4 is going into RC soon. It'd be best to just PR to this branch, as I don't think it'll take too long to adapt what we have to it.

If the user is on Windows or macOS, we can utilise the Metal compiler toolchain to generate Metal libraries, reducing load times even more, as that compiles the Metal source into a platform-independent, intermediate format. I notice that Unreal Engine has an option to do this.

Yes, this would be great. There's a scheme for adding "Platforms" and you can definitely do a Windows-specific version that loads the toolchain if you're under Windows to produce the MIL instead.

Under the new Shader Container design, you won't need to handle anything about serialization of the Shader reflection. All you need is to just convert to the shader binary and you can insert whatever extra bytes you wish to serialize that the platform might need.
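To illustrate that division of labour, here is a minimal C++ sketch. All names (`ShaderContainerSketch`, `convert_to_platform_binary`, `platform_extra_bytes`) and the byte layout are assumptions made for the example; they are not the classes introduced by this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch; class and method names are illustrative, not Godot's API.
using Bytes = std::vector<uint8_t>;

class ShaderContainerSketch {
public:
    virtual ~ShaderContainerSketch() = default;

    // Shared layout: [reflection size][reflection][extra size][extra][shader binary].
    Bytes serialize(const Bytes &reflection, const Bytes &spirv) const {
        Bytes out;
        append_block(out, reflection);             // Handled once, for every platform.
        append_block(out, platform_extra_bytes()); // Optional platform-specific payload.
        const Bytes binary = convert_to_platform_binary(spirv);
        out.insert(out.end(), binary.begin(), binary.end());
        return out;
    }

protected:
    // The only step a platform must provide.
    virtual Bytes convert_to_platform_binary(const Bytes &spirv) const = 0;
    // Platforms override this only if they need to store extra data.
    virtual Bytes platform_extra_bytes() const { return {}; }

private:
    static void append_block(Bytes &out, const Bytes &block) {
        assert(block.size() <= 0xFF); // One-byte length prefix keeps the sketch short.
        out.push_back(static_cast<uint8_t>(block.size()));
        out.insert(out.end(), block.begin(), block.end());
    }
};

// A platform that ships the input unchanged and needs no extra data.
class PassthroughContainer : public ShaderContainerSketch {
protected:
    Bytes convert_to_platform_binary(const Bytes &spirv) const override { return spirv; }
};

// A platform that appends its own payload (placeholder bytes here).
class ExtraDataContainer : public ShaderContainerSketch {
protected:
    Bytes convert_to_platform_binary(const Bytes &spirv) const override { return spirv; }
    Bytes platform_extra_bytes() const override { return {0xAB, 0xCD}; }
};
```

The base class owns the reflection serialization, so a new platform only overrides the conversion step and, if needed, the extra-bytes hook.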

DarioSamo avatar Feb 07 '25 19:02 DarioSamo

Will this feature bake shaders for all backends by default? If yes, can users exclude certain backends from the export process? For example, developers may decide to support only Vulkan on a platform that supports both Vulkan and D3D12.

warriormaster12 avatar Feb 09 '25 15:02 warriormaster12

Will this feature bake shaders for all backends by default? If yes, can users exclude certain backends from the export process? For example, developers may decide to support only Vulkan on a platform that supports both Vulkan and D3D12.

It bakes the shaders for the driver selected for the platform. At the moment, it doesn't cover the case where the project offers the user a choice between multiple backends.

DarioSamo avatar Feb 10 '25 13:02 DarioSamo

One concern I have is for users exporting to Windows from Linux (which is a common scenario on CI). While it should be possible to export SPIR-V already for projects using Vulkan, exporting DXIL for Direct3D doesn't sound feasible right now. None of the D3D12 code is compiled in the Linux editor which is used for exporting on CI. This also applies to users exporting for macOS from other platforms.

Of course, you can sidestep this by using a Windows CI runner, but these are generally slower to perform a full CI run due to slower I/O (and may have higher demand too, leading to increased queues).

More generally, I don't know if this shader compilation process will work in headless anyway (since no GPU is initialized, and none is available on GitHub Actions unless you pay for it).

I suppose we'd need a way to build the NIR stuff regardless of whether Direct3D 12 is enabled in the current build, as long as it's an editor build.

Calinou avatar Feb 10 '25 14:02 Calinou

One concern I have is for users exporting to Windows from Linux (which is a common scenario on CI). While it should be possible to export SPIR-V already for projects using Vulkan, exporting DXIL for Direct3D doesn't sound feasible right now. None of the D3D12 code is compiled in the Linux editor which is used for exporting on CI. This also applies to users exporting for macOS from other platforms.

The only D3D12 code that is required at the moment is root signature serialization to a binary blob. If that can be worked around (CC @RandomShaper), then D3D12 is not a requirement for building D3D12 shaders.

More generally, I don't know if this shader compilation process will work in headless anyway (since no GPU is initialized, and none is available on GitHub Actions unless you pay for it).

The shader classes aren't tied to a particular driver running. No GPU is required for the process, as that was part of most of the refactoring that was done to take it out of the drivers and into their own classes that can be used independently.

DarioSamo avatar Feb 10 '25 14:02 DarioSamo

@Calinou Just brought this PR to my attention! I am super excited to test this out! Please feel free to @ me when this is ready to be tested :)

TCROC avatar Feb 14 '25 00:02 TCROC

Would it be possible to schedule this to 4.5? What would be required to do so?

kisg avatar Feb 21 '25 11:02 kisg

Would it be possible to schedule this to 4.5? What would be required to do so?

Metal's the only component missing as far as I can tell. I can get around to it by the time we enter 4.5 but I'd like to give Stuart time to see if he can manage it as he's more familiar with the driver than I am.

DarioSamo avatar Feb 21 '25 14:02 DarioSamo

@DarioSamo do you think you could merge to the main branch once 4.4 is released, so I can work from my fork with my build configuration? I will be able to implement it fairly easily from there.

stuartcarnie avatar Feb 22 '25 19:02 stuartcarnie

@DarioSamo do you think you could merge to the main branch once 4.4 is released, so I can work from my fork with my build configuration? I will be able to implement it fairly easily from there.

I'm not sure it's possible as I can't figure out a way that isn't very cumbersome to have the current scheme and the new scheme working in tandem without, in the process, just adapting the Metal backend to use the new shader container format and basically ending up with a working shader baker most of the way there already.

DarioSamo avatar Feb 24 '25 13:02 DarioSamo

@stuartcarnie @DarioSamo

I'm actually confused by both the question and the answer.

do you think you could merge to the main branch once 4.4 is released, so I can work from my fork with my build configuration? I will be able to implement it fairly easily from there.

Aren't all PRs merged into the main branch?

I'm not sure it's possible as I can't figure out a way that isn't very cumbersome to have the current scheme and the new scheme working in tandem without, in the process, just adapting the Metal backend to use the new shader container format and basically ending up with a working shader baker most of the way there already.

Same reason for confusion. Why wouldn't it be possible? Aren't all PRs merged into the main branch? In fact, isn't this PR explicitly requesting to merge into master?

Thank you! :)

TCROC avatar Feb 24 '25 14:02 TCROC

Oh I think I see what you are asking now. This branch has merge conflicts. Are you asking if these can be resolved?

TCROC avatar Feb 24 '25 14:02 TCROC

@TCROC It's not the merge conflicts, it's the fact that Metal does not build at the moment on this PR. It can't be merged as it breaks the platform. I don't have an easy way to not break it as the changes are fundamental to how the shader methods work.

The amount of work to make it build as a bandaid fix would be roughly equivalent to the amount of work to implement the shader container in Metal that is necessary for shader baking to work.

DarioSamo avatar Feb 24 '25 14:02 DarioSamo

Ah I see. Thank you for the explanation! :)

TCROC avatar Feb 24 '25 15:02 TCROC

👋🏻 @kisg

Overview: Metal

Currently, we use SPIRV-Cross to generate Metal Shading Language (MSL) from the SPIR-V and serialise this source to the binary data. We want to support the offline Metal compiler toolchain so that we can generate a .metallib file when the toolchain is available. It isn't required, but it will further reduce startup time, as platforms such as iOS won't have to run the Metal Compiler background task to compile the MSL first.

Solution Sketch: Metal

To support MSL and .metallib, we should extend ShaderBinaryData:

https://github.com/godotengine/godot/blob/5312811c4da268892087a88d2b5cdc716f2c219e/drivers/metal/rendering_device_driver_metal.mm#L1557

and add a library_type field, which is an enumeration:

enum LibraryType {
  METAL_SHADER_LANGUAGE, // Payload is MSL source.
  METAL_LIBRARY, // Payload is a compiled .metallib.
};

Note: Adding a field requires the version to be updated:

https://github.com/godotengine/godot/blob/5312811c4da268892087a88d2b5cdc716f2c219e/drivers/metal/rendering_device_driver_metal.mm#L1076

The remainder of the work is just implementing the container, as @DarioSamo has done for Vulkan and D3D12. Don't worry about implementing offline compilation for your initial PR.
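As a rough illustration of why the version bump matters, the sketch below writes a version header followed by the library type, and rejects data written with a different version. The struct layout, the version number and every name ending in `Sketch` are assumptions for the example, not the real ShaderBinaryData in rendering_device_driver_metal.mm.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative only: field layout and version numbers are assumptions.
enum class LibraryType : uint8_t {
    METAL_SHADER_LANGUAGE = 0, // Payload is MSL source, compiled at load time.
    METAL_LIBRARY = 1,         // Payload is a precompiled .metallib.
};

constexpr uint32_t SKETCH_BINARY_VERSION = 2; // Bumped because library_type was added.

struct ShaderBinarySketch {
    LibraryType library_type = LibraryType::METAL_SHADER_LANGUAGE;
    std::vector<uint8_t> payload;
};

std::vector<uint8_t> serialize(const ShaderBinarySketch &shader) {
    std::vector<uint8_t> out;
    for (int shift = 0; shift < 32; shift += 8) { // Little-endian version header.
        out.push_back(static_cast<uint8_t>(SKETCH_BINARY_VERSION >> shift));
    }
    out.push_back(static_cast<uint8_t>(shader.library_type));
    assert(out.size() == 5); // 4 version bytes plus 1 type byte.
    out.insert(out.end(), shader.payload.begin(), shader.payload.end());
    return out;
}

// Returns nothing for data written with another version, so stale caches are rejected.
std::optional<ShaderBinarySketch> deserialize(const std::vector<uint8_t> &data) {
    if (data.size() < 5) return std::nullopt;
    uint32_t version = 0;
    for (int i = 0; i < 4; i++) {
        version |= static_cast<uint32_t>(data[i]) << (8 * i);
    }
    if (version != SKETCH_BINARY_VERSION) return std::nullopt;
    if (data[4] > static_cast<uint8_t>(LibraryType::METAL_LIBRARY)) return std::nullopt;
    ShaderBinarySketch shader;
    shader.library_type = static_cast<LibraryType>(data[4]);
    shader.payload.assign(data.begin() + 5, data.end());
    return shader;
}
```

A loader built this way would read library_type first and then decide whether to compile the payload as source or load it as a precompiled library.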

Offline compilation

Offline compilation takes the MSL and creates a .metallib. See this page for more information.

Future work will add support for spawning the Metal compiler toolchain, which is available on macOS and Windows, to generate .metallib files. We can serialise these instead of the raw MSL. Instead of creating a MTLLibrary from source:

https://github.com/godotengine/godot/blob/9fc39ae321ffd8feb7032f090f63e232006a55f6/drivers/metal/metal_objects.mm#L2028-L2044

which results in background compilation, we can use the newLibraryWithData:error: API to load a compiled Metal library.

stuartcarnie avatar Feb 28 '25 19:02 stuartcarnie

@DarioSamo when we're baking shaders, do you think it might be possible to provide the parameters required to generate a pipeline state descriptor?

@kisg I suggest you watch this Apple developer video, as it is possible we could provide a 3rd level of compilation, to completely remove runtime compilation. We would need the pipeline descriptor state to achieve this deeper level of customisation, but that would have to come from Godot so we could generate the appropriate JSON descriptor.

stuartcarnie avatar Mar 01 '25 20:03 stuartcarnie

FYI:

We have a working Metal implementation of the Shader Baker. It supports both runtime (where we bake the MSL source code) and offline Metal compilation. The offline compilation generates the platform independent bytecode (AIR) format.

In our test application the MSL baking did not make much difference, but with the AIR baking the first startup time went from ~ 7+ seconds to ~2 - 2.5 seconds. The same app with Vulkan (with Shader Baker enabled, so SPIR-V baked in the app) + MoltenVK starts in ~5.1 seconds.

We have to clean it up a bit (currently it only supports iOS targets, not macOS), but we hope to publish it soon as a PR for this PR.

kisg avatar Mar 14 '25 17:03 kisg

Brilliant! Great work @kisg! I look forward to testing it out! :)

TCROC avatar Mar 14 '25 17:03 TCROC

@kisg Awesome to hear! I'll be glad to review and merge it once it's done!

DarioSamo avatar Mar 14 '25 17:03 DarioSamo

@kisg awesome to hear – I'll be happy to help review it when ready too!

stuartcarnie avatar Mar 14 '25 19:03 stuartcarnie

In our test application the MSL baking did not make much difference, but with the AIR baking the first startup time went from ~ 7+ seconds to ~2 - 2.5 seconds. The same app with Vulkan (with Shader Baker enabled, so SPIR-V baked in the app) + MoltenVK starts in ~5.1 seconds.

@kisg Great stuff! Were you using the LAZY shader initialisation when testing the baking to MSL?

stuartcarnie avatar Mar 14 '25 19:03 stuartcarnie

In our test application the MSL baking did not make much difference, but with the AIR baking the first startup time went from ~ 7+ seconds to ~2 - 2.5 seconds. The same app with Vulkan (with Shader Baker enabled, so SPIR-V baked in the app) + MoltenVK starts in ~5.1 seconds.

@kisg Great stuff! Were you using the LAZY shader initialisation when testing the baking to MSL?

Yes, we have it hardcoded to LAZY for the MSL based cache now. :)

kisg avatar Mar 14 '25 19:03 kisg

That is going to be a very nice win!

Another feature we might be able to use in the future is MTLBinaryArchive to save compiled pipelines for future use. That is an area I'm going to explore more in the future. I am not sure yet if it will eliminate the requirement to compile the MTLLibrary, as MTLBinaryArchive is specified when creating the Metal pipeline.

stuartcarnie avatar Mar 14 '25 19:03 stuartcarnie

FYI:

We have a working Metal implementation of the Shader Baker. It supports both runtime (where we bake the MSL source code) and offline Metal compilation. The offline compilation generates the platform independent bytecode (AIR) format.

In our test application the MSL baking did not make much difference, but with the AIR baking the first startup time went from ~ 7+ seconds to ~2 - 2.5 seconds. The same app with Vulkan (with Shader Baker enabled, so SPIR-V baked in the app) + MoltenVK starts in ~5.1 seconds.

We have to clean it up a bit (currently it only supports iOS targets, not macOS), but we hope to publish it soon as a PR for this PR.

Hi @kisg - what's the status of your Metal implementation? We'd like to get the shader baker merged into the dev branch for 4.5 sooner rather than later, so we don't risk missing the merge window and so it gets enough testing before the stable release.

akien-mga avatar Mar 31 '25 13:03 akien-mga

Hi @kisg - what's the status of your Metal implementation? We'd like to get the shader baker merged into the dev branch for 4.5 sooner rather than later, so we don't risk missing the merge window and so it gets enough testing before the stable release.

We will provide a PR for this PR this week; sorry for the delay.

kisg avatar Apr 01 '25 14:04 kisg

@DarioSamo Would it be possible to rebase this PR on the current master before I create the Metal PR?

kisg avatar Apr 01 '25 14:04 kisg

@DarioSamo Would it be possible to rebase this PR on the current master before I create the Metal PR?

It might take some work, but I'll try to get it done this week if possible.

DarioSamo avatar Apr 01 '25 16:04 DarioSamo

@kisg Rebased on top of the latest master.

DarioSamo avatar Apr 03 '25 14:04 DarioSamo

@kisg Now that it is rebased, what is your timeline for making a PR?

clayjohn avatar Apr 07 '25 15:04 clayjohn