Add shader baker to project exporter.
Overview
Based mostly on the work done by @RandomShaper, this PR adds a new Editor Export Plugin that scans resources and scenes for shaders and pre-compiles them into the format used by the driver on the target platform.
Shaders in SPIR-V, DXIL and MIL formats are interchangeable between systems and can be shipped to end users to skip the long startup times that result from compiling them on the target platform. While pipeline compilation is still unavoidable, Godot currently does work on the end user's system that could be done ahead of time in the editor and shipped as part of the final project.
This PR required a large amount of work on refactoring the Shader classes and decoupling the Shader Compilers from the Rendering Device Drivers we currently have. A new generic Shader Container class has been introduced and allows for heavy customization of the exported shader if required by the target platform. A significant amount of work has also gone into removing platform-specific definitions that were being added to shaders and that may differ on the end user's system. Where this is unavoidable for optimization reasons, shader variants have been created instead.
When using this PR, Shader Baking is an optional step that will increase the export time of a project with the major benefit that the end user who plays the game will be able to skip shader compilation entirely.
Another important change is the ability for the Shader class to use a multi-level shader cache: a regular one that it reads from and writes to, and a read-only one that it uses as a fallback. The fallback is populated from the shader cache directory embedded in the exported project's .pck.
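As a rough illustration of the lookup order described above, here is a minimal in-memory sketch. It is not the code from this PR: `TwoLevelShaderCache`, `ShaderBlob`, `find` and `store` are hypothetical names, and the real cache works on directories on disk rather than on maps.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compiled shader binary.
using ShaderBlob = std::vector<std::uint8_t>;

// Hypothetical two-level cache: a read-write primary level (the user's
// shader_cache) and a read-only fallback level (the cache baked into the .pck).
class TwoLevelShaderCache {
public:
    explicit TwoLevelShaderCache(std::unordered_map<std::string, ShaderBlob> baked)
        : fallback_(std::move(baked)) {}

    // Look in the read-write level first, then in the read-only fallback.
    std::optional<ShaderBlob> find(const std::string &key) const {
        if (auto it = primary_.find(key); it != primary_.end()) {
            return it->second;
        }
        if (auto it = fallback_.find(key); it != fallback_.end()) {
            return it->second;
        }
        return std::nullopt; // Miss on both levels: the shader must be compiled.
    }

    // Writes only ever go to the read-write level; the fallback is never modified.
    void store(const std::string &key, ShaderBlob blob) {
        primary_[key] = std::move(blob);
    }

private:
    std::unordered_map<std::string, ShaderBlob> primary_;
    const std::unordered_map<std::string, ShaderBlob> fallback_;
};
```

In this sketch a shader compiled at runtime shadows a baked entry with the same key, which is one plausible policy; the PR itself may resolve that case differently.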
This feature is intended to be finished for Godot 4.5 if possible.
Results
The improvement is most visible on backends with very long shader conversion times. On Vulkan the improvement is there but is less noticeable on a system with many threads. On a backend like D3D12, which has very long conversion times due to the NIR transpilation process, the difference is dramatic.
Even on a system with 32 threads, a D3D12 project goes from taking over a minute to load to just ~2 seconds.
TPS demo using D3D12 backend without and with using the shader baker functionality.
master
https://github.com/user-attachments/assets/00f9a39c-7101-41c6-b5b1-139b9df165ea
shader-baker
https://github.com/user-attachments/assets/956be00e-f5ae-488d-8ff4-354c46e7068d
The results are reproducible but not as drastic on Vulkan, although you'll gain the biggest benefit from this feature the fewer CPU threads you have at your disposal.
Note that to test this effectively, you must delete the shader_cache directory in the user directory of the project you're testing, because Godot caches compiled shader binaries there between runs. On Windows, this directory can be found at %AppData%/Godot/app_userdata/<Project Name>/shader_cache.
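For repeated test runs, clearing the cache can be scripted. The following is a small sketch using `std::filesystem`. The helper names `shader_cache_dir` and `clear_shader_cache` are made up for illustration, and the path layout assumes the default Windows location mentioned above (a project configured with a custom user directory would differ).

```cpp
#include <cstdint>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: builds <appdata>/Godot/app_userdata/<project>/shader_cache.
fs::path shader_cache_dir(const fs::path &appdata_root, const std::string &project_name) {
    return appdata_root / "Godot" / "app_userdata" / project_name / "shader_cache";
}

// Hypothetical helper: removes the cache directory and everything inside it.
// Returns the number of removed entries, or 0 if nothing was removed or an
// error occurred.
std::uintmax_t clear_shader_cache(const fs::path &cache_dir) {
    std::error_code ec;
    const std::uintmax_t removed = fs::remove_all(cache_dir, ec);
    return ec ? 0 : removed;
}
```

On Windows, the root passed to `shader_cache_dir` would typically come from the `APPDATA` environment variable, for example via `std::getenv("APPDATA")`.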
TODO
- [ ] Metal support (@stuartcarnie has shown interest in tackling this).
- [x] Verify how this can interact with imported GLSL files.
- [x] Find and account for more edge cases the shader baker is not catching currently by testing on a wider variety of projects.
- [x] Account for the cases where the renderer must be set to the matching renderer of the exported platform for embedded shaders to be baked. Warn appropriately on the editor.
- [x] Verify there's no regression in BaseMaterial3D being updated automatically in the viewport from a user editing it.
- [x] Find the remaining global shader defines that might be around the codebase from querying the current rendering device's capabilities.
Bugsquad edit: Should fix: https://github.com/godotengine/godot/issues/94734
Contributed by W4 Games. 🍀
This is great!
Will you merge to the main branch, and then I can follow up with a PR to implement Metal support?
If the user is on Windows or macOS, we can utilise the Metal compiler toolchain to generate Metal libraries, reducing load times even more, as that compiles the Metal source into a platform-independent, intermediate format. I notice that Unreal Engine has an option to do this.
> Will you merge to the main branch, and then I can follow up with a PR to implement Metal support?

I think it's pretty far from being merged to main at the moment due to 4.4 going into RC soon. I think it'd be best to just PR to this branch, as I don't think it'll take too long to adapt what we have to it.

> If the user is on Windows or macOS, we can utilise the Metal compiler toolchain to generate Metal libraries, reducing load times even more, as that compiles the Metal source into a platform-independent, intermediate format. I notice that Unreal Engine has an option to do this.

Yes, this would be great. There's a scheme for adding "Platforms" and you can definitely do a Windows-specific version that loads the toolchain if you're under Windows to produce the MIL instead.
Under the new Shader Container design, you won't need to handle anything about serialization of the Shader reflection. All you need is to just convert to the shader binary and you can insert whatever extra bytes you wish to serialize that the platform might need.
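To make the division of labour concrete, here is a hedged sketch of that idea. It does not mirror the actual Shader Container class from this PR: `ShaderContainerSketch`, `ReflectionSketch`, `convert_to_platform_binary` and `platform_extra_bytes` are all hypothetical names, and the byte layout is invented for the example. The shared code serializes the reflection data, and a platform only supplies the converted binary plus any extra bytes it needs.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical stand-in for shader reflection data; not Godot's real structure.
struct ReflectionSketch {
    std::uint32_t uniform_set_count;
    std::uint32_t push_constant_size;
};

class ShaderContainerSketch {
public:
    virtual ~ShaderContainerSketch() = default;

    // Shared path: reflection is serialized here, so platforms never touch it.
    Bytes serialize(const ReflectionSketch &refl, const Bytes &intermediate) const {
        Bytes out;
        append_u32(out, refl.uniform_set_count);
        append_u32(out, refl.push_constant_size);

        const Bytes extra = platform_extra_bytes();
        append_u32(out, static_cast<std::uint32_t>(extra.size()));
        out.insert(out.end(), extra.begin(), extra.end());

        const Bytes binary = convert_to_platform_binary(intermediate);
        append_u32(out, static_cast<std::uint32_t>(binary.size()));
        out.insert(out.end(), binary.begin(), binary.end());
        return out;
    }

protected:
    // Platform hook: turn the intermediate shader into the platform's binary.
    virtual Bytes convert_to_platform_binary(const Bytes &intermediate) const = 0;
    // Optional platform hook: extra bytes the platform wants stored alongside.
    virtual Bytes platform_extra_bytes() const { return {}; }

    // Appends a 32-bit value in little-endian byte order.
    static void append_u32(Bytes &out, std::uint32_t v) {
        for (int shift = 0; shift < 32; shift += 8) {
            out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
        }
    }
};

// Example platform that stores the input unchanged plus one marker byte.
class PassthroughContainer final : public ShaderContainerSketch {
protected:
    Bytes convert_to_platform_binary(const Bytes &intermediate) const override {
        return intermediate;
    }
    Bytes platform_extra_bytes() const override { return Bytes{0xAB}; }
};
```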
Will this feature bake shaders for all backends by default? If yes, can users filter certain backends out of the export process? Say, if developers decide to support Vulkan only on a platform that supports both Vulkan and D3D12.
> Will this feature bake shaders for all backends by default? If yes, can users filter certain backends out of the export process? Say, if developers decide to support Vulkan only on a platform that supports both Vulkan and D3D12.

It bakes the shaders for the driver selected for the platform. It doesn't cover the case at the moment of the user offering options for multiple backends.
One concern I have is for users exporting to Windows from Linux (which is a common scenario on CI). While it should be possible to export SPIR-V already for projects using Vulkan, exporting DXIL for Direct3D doesn't sound feasible right now. None of the D3D12 code is compiled in the Linux editor which is used for exporting on CI. This also applies to users exporting for macOS from other platforms.
Of course, you can sidestep this by using a Windows CI runner, but these are generally slower to perform a full CI run due to slower I/O (and may have higher demand too, leading to increased queues).
More generally, I don't know if this shader compilation process will work in headless anyway (since no GPU is initialized, and none is available on GitHub Actions unless you pay for it).
I suppose we'd need a way to build the NIR stuff regardless of whether Direct3D 12 is enabled in the current build, as long as it's an editor build.
> One concern I have is for users exporting to Windows from Linux (which is a common scenario on CI). While it should be possible to export SPIR-V already for projects using Vulkan, exporting DXIL for Direct3D doesn't sound feasible right now. None of the D3D12 code is compiled in the Linux editor which is used for exporting on CI. This also applies to users exporting for macOS from other platforms.

The only D3D12 code that is required at the moment is root signature serialization to a binary blob. If that can be worked around (CC @RandomShaper), then D3D12 is not a requirement for building D3D12 shaders.
> More generally, I don't know if this shader compilation process will work in headless anyway (since no GPU is initialized, and none is available on GitHub Actions unless you pay for it).

The shader classes aren't tied to a particular driver running. No GPU is required for the process, as that was part of most of the refactoring that was done to take it out of the drivers and into their own classes that can be used independently.
@Calinou Just brought this PR to my attention! I am super excited to test this out! Please feel free to @ me when this is ready to be tested :)
Would it be possible to schedule this to 4.5? What would be required to do so?
> Would it be possible to schedule this to 4.5? What would be required to do so?

Metal's the only component missing as far as I can tell. I can get around to it by the time we enter 4.5 but I'd like to give Stuart time to see if he can manage it as he's more familiar with the driver than I am.
@DarioSamo do you think you could merge to the main branch once 4.4 is released, so I can work from my fork with my build configuration? I will be able to implement it fairly easily from there.
> @DarioSamo do you think you could merge to the main branch once 4.4 is released, so I can work from my fork with my build configuration? I will be able to implement it fairly easily from there.

I'm not sure it's possible as I can't figure out a way that isn't very cumbersome to have the current scheme and the new scheme working in tandem without, in the process, just adapting the Metal backend to use the new shader container format and basically ending up with a working shader baker most of the way there already.
@stuartcarnie @DarioSamo
I'm actually confused by both the question and the answer.
> do you think you could merge to the main branch once 4.4 is released, so I can work from my fork with my build configuration? I will be able to implement it fairly easily from there.

Aren't all PRs merged into the main branch?
> I'm not sure it's possible as I can't figure out a way that isn't very cumbersome to have the current scheme and the new scheme working in tandem without, in the process, just adapting the Metal backend to use the new shader container format and basically ending up with a working shader baker most of the way there already.

Same reason for confusion. Why wouldn't it be possible? Aren't all PRs merged into the main branch? In fact, isn't this PR explicitly requesting to merge into master?
Thank you! :)
Oh I think I see what you are asking now. This branch has merge conflicts. Are you asking if these can be resolved?
@TCROC It's not the merge conflicts, it's the fact that Metal does not build at the moment on this PR. It can't be merged as it breaks the platform. I don't have an easy way to not break it as the changes are fundamental to how the shader methods work.
The amount of work to make it build as a bandaid fix would be roughly equivalent to the amount of work to implement the shader container in Metal that is necessary for shader baking to work.
Ah I see. Thank you for the explanation! :)
👋🏻 @kisg
Overview: Metal
Currently, we use SPIRV-Cross to generate Metal Shading Language (MSL) from the SPIR-V and serialise this source to the binary data. We want to support the offline Metal compiler toolchain so that we can generate a .metallib file when the toolchain is available. It isn't required, but it will further reduce startup time, as platforms such as iOS won't have to run the Metal compiler background task to compile the MSL first.
Solution Sketch: Metal
To support MSL and .metallib, we should extend ShaderBinaryData:
https://github.com/godotengine/godot/blob/5312811c4da268892087a88d2b5cdc716f2c219e/drivers/metal/rendering_device_driver_metal.mm#L1557
and add a library_type field, which is an enumeration:
```cpp
enum LibraryType {
	METAL_SHADER_LANGUAGE,
	METAL_LIBRARY,
};
```
> [!NOTE]
> Adding a field requires the version to be updated:
> https://github.com/godotengine/godot/blob/5312811c4da268892087a88d2b5cdc716f2c219e/drivers/metal/rendering_device_driver_metal.mm#L1076
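
To illustrate why the version has to change together with the layout, here is a minimal sketch of a versioned header that carries the library type. It is not the actual ShaderBinaryData layout: `HeaderSketch`, `FORMAT_VERSION_SKETCH`, `write_header` and `read_header` are hypothetical, and the 8-byte little-endian layout is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

enum class LibraryType : std::uint32_t {
    METAL_SHADER_LANGUAGE = 0,
    METAL_LIBRARY = 1,
};

// Hypothetical version number; bumped whenever a field such as library_type is added.
constexpr std::uint32_t FORMAT_VERSION_SKETCH = 2;

struct HeaderSketch {
    std::uint32_t version;
    LibraryType library_type;
};

// Appends a 32-bit value in little-endian byte order.
void put_u32(std::vector<std::uint8_t> &out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
    }
}

// Reads a little-endian 32-bit value; the caller guarantees 4 bytes are available.
std::uint32_t get_u32(const std::vector<std::uint8_t> &in, std::size_t offset) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(in[offset + i]) << (8 * i);
    }
    return v;
}

std::vector<std::uint8_t> write_header(LibraryType type) {
    std::vector<std::uint8_t> out;
    put_u32(out, FORMAT_VERSION_SKETCH);
    put_u32(out, static_cast<std::uint32_t>(type));
    return out;
}

// Rejects data written with another version, so binaries cached with the old
// layout are never misread as if they contained a library_type field.
std::optional<HeaderSketch> read_header(const std::vector<std::uint8_t> &in) {
    if (in.size() < 8) {
        return std::nullopt;
    }
    const std::uint32_t version = get_u32(in, 0);
    if (version != FORMAT_VERSION_SKETCH) {
        return std::nullopt;
    }
    const std::uint32_t raw_type = get_u32(in, 4);
    if (raw_type > static_cast<std::uint32_t>(LibraryType::METAL_LIBRARY)) {
        return std::nullopt;
    }
    return HeaderSketch{version, static_cast<LibraryType>(raw_type)};
}
```

A loader built along these lines could then branch on `library_type` to decide whether to compile MSL source at runtime or load a precompiled library.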
The remainder of the work is just implementing the container, as @DarioSamo has done for Vulkan and D3D12. Don't worry about implementing offline compilation for your initial PR.
Offline compilation
Offline compilation takes the MSL and creates a .metallib. See this page for more information.
Future work will add support for spawning the Metal compiler toolchain, which is available for macOS and Windows platforms, to generate .metallib files. We can serialise these instead of the raw MSL. Instead of creating a MTLLibrary from source:
https://github.com/godotengine/godot/blob/9fc39ae321ffd8feb7032f090f63e232006a55f6/drivers/metal/metal_objects.mm#L2028-L2044
which results in background compilation, we can use the newLibraryWithData:error: API to load a compiled Metal library.
@DarioSamo when we're baking shaders, do you think it might be possible to provide the parameters required to generate a pipeline state descriptor?
@kisg I suggest you watch this Apple developer video, as it is possible we could provide a 3rd level of compilation, to completely remove runtime compilation. We would need the pipeline descriptor state to achieve this deeper level of customisation, but that would have to come from Godot so we could generate the appropriate JSON descriptor.
FYI:
We have a working Metal implementation of the Shader Baker. It supports both runtime (where we bake the MSL source code) and offline Metal compilation. The offline compilation generates the platform independent bytecode (AIR) format.
In our test application the MSL baking did not make much difference, but with the AIR baking the first startup time went from ~ 7+ seconds to ~2 - 2.5 seconds. The same app with Vulkan (with Shader Baker enabled, so SPIR-V baked in the app) + MoltenVK starts in ~5.1 seconds.
We have to clean it up a bit (currently it only supports iOS targets, not macOS), but we hope to publish it soon as a PR for this PR.
Brilliant! Great work @kisg! I look forward to testing it out! :)
@kisg Awesome to hear! I'll be glad to review and merge it once it's done!
@kisg awesome to hear – I'll be happy to help review it when ready too!
> In our test application the MSL baking did not make much difference, but with the AIR baking the first startup time went from ~ 7+ seconds to ~2 - 2.5 seconds. The same app with Vulkan (with Shader Baker enabled, so SPIR-V baked in the app) + MoltenVK starts in ~5.1 seconds.

@kisg Great stuff! Were you using the LAZY shader initialisation when testing the baking to MSL?
> > In our test application the MSL baking did not make much difference, but with the AIR baking the first startup time went from ~ 7+ seconds to ~2 - 2.5 seconds. The same app with Vulkan (with Shader Baker enabled, so SPIR-V baked in the app) + MoltenVK starts in ~5.1 seconds.
>
> @kisg Great stuff! Were you using the LAZY shader initialisation when testing the baking to MSL?

Yes, we have it hardcoded to LAZY for the MSL based cache now. :)
That is going to be a very nice win!
Another feature we might be able to use in the future is MTLBinaryArchive to save compiled pipelines for future use. That is an area I'm going to explore more in the future. I am not sure yet if it will eliminate the requirement to compile the MTLLibrary, as MTLBinaryArchive is specified when creating the Metal pipeline.
> FYI:
> We have a working Metal implementation of the Shader Baker. It supports both runtime (where we bake the MSL source code) and offline Metal compilation. The offline compilation generates the platform independent bytecode (AIR) format.
> In our test application the MSL baking did not make much difference, but with the AIR baking the first startup time went from ~ 7+ seconds to ~2 - 2.5 seconds. The same app with Vulkan (with Shader Baker enabled, so SPIR-V baked in the app) + MoltenVK starts in ~5.1 seconds.
> We have to clean it up a bit (currently it only supports iOS targets, not macOS), but we hope to publish it soon as a PR for this PR.

Hi @kisg - what's the status on your Metal implementation? We'd like to get the shader baker merged sooner rather than later in the dev branch for 4.5, so we don't risk missing the merge window, and so it gets enough testing before the stable release.
> Hi @kisg - what's the status on your Metal implementation? We'd like to get the shader baker merged sooner rather than later in the dev branch for 4.5, so we don't risk missing the merge window, and so it gets enough testing before the stable release.

We will provide a PR for this PR this week; sorry for the delay.
@DarioSamo Would it be possible to rebase this PR on the current master before I create the Metal PR?
> @DarioSamo Would it be possible to rebase this PR on the current master before I create the Metal PR?

It might take some work, but I'll try to get it done this week if possible.
@kisg Rebased on top of the latest master.
@kisg Now that it is rebased, what is your timeline for making a PR?