dxvk icon indicating copy to clipboard operation
dxvk copied to clipboard

Optimize SPIRV shader compilation time

Open rbernon opened this issue 4 years ago • 31 comments

While working on Dishonored 2 stalls I spent some time optimizing the SPIRV shader compilation and ended up with this. It's not solving anything with the game stalls (which are because of new pipeline compilation during gameplay), but I could measure the game initial loading time going from ~10s to ~6s.

Feel free to take it or reject, I'd completely understand if that's not a super high priority as shader compilation is slow anyway and only supposed to happen during loading times.

rbernon avatar Apr 20 '20 08:04 rbernon

I haven't really had the time to take a closer look at this yet, but users report that it also helps Warframe a bit. Not high priority by any means, but interesting regardless.

doitsujin avatar Apr 27 '20 09:04 doitsujin

Sure, in any case loading times are also constrained by wine heap performance, which is currently really bad. I have some wip patches to improve that but it's completely experimental (and quite unrelated).

rbernon avatar Apr 27 '20 09:04 rbernon

I'll report that it breaks Star Citizen (crashing on the universe sandbox loading screen) with no log or anything useful to debug the problem attached 🐸 I guess I could give vk validation layer a shot.

Tk-Glitch avatar May 01 '20 22:05 Tk-Glitch

I'll report that it breaks Star Citizen (crashing on the universe sandbox loading screen) with no log or anything useful to debug the problem attached 🐸 I guess I could give vk validation layer a shot.

Does it crash with the fix?

oltolm avatar May 02 '20 07:05 oltolm

Just updated with the typo corrected.

rbernon avatar May 02 '20 07:05 rbernon

I removed a remaining ID allocation and the shaders are now generated identically as far as I can tell. I feel surprising that it would fix the SC crash though.

rbernon avatar May 08 '20 10:05 rbernon

@Tk-Glitch could you please retest it after the last changes?

jqadev avatar May 14 '20 00:05 jqadev

Just tested. The crash is still there on SC when that PR is applied even after the recent changes.

On another note, GCC10 fails to compile DXVK with this PR:

[71/266] Compiling C++ object 'src/spirv/e922df6@@spirv@sta/spirv_module.cpp.obj'
FAILED: src/spirv/e922df6@@spirv@sta/spirv_module.cpp.obj 
ccache x86_64-w64-mingw32-g++ -Isrc/spirv/e922df6@@spirv@sta -Isrc/spirv -I../../../dxvk-master/src/spirv -I../../../dxvk-master/include -fdiagnostics-color=always -pipe -Wall -Winvalid-pch -Wnon-virtual-dtor -std=c++17 -O3 -DNOMINMAX -MD -MQ 'src/spirv/e922df6@@spirv@sta/spirv_module.cpp.obj' -MF 'src/spirv/e922df6@@spirv@sta/spirv_module.cpp.obj.d' -o 'src/spirv/e922df6@@spirv@sta/spirv_module.cpp.obj' -c ../../../dxvk-master/src/spirv/spirv_module.cpp
../../../dxvk-master/src/spirv/spirv_module.cpp: In member function ‘uint32_t dxvk::SpirvModule::defIntType(uint32_t, uint32_t)’:
../../../dxvk-master/src/spirv/spirv_module.cpp:699:11: error: narrowing conversion of ‘-64’ from ‘int’ to ‘unsigned int’ [-Wnarrowing]
  699 |     case -64:
      |           ^~
../../../dxvk-master/src/spirv/spirv_module.cpp:702:11: error: narrowing conversion of ‘-32’ from ‘int’ to ‘unsigned int’ [-Wnarrowing]
  702 |     case -32:
      |           ^~
../../../dxvk-master/src/spirv/spirv_module.cpp:705:11: error: narrowing conversion of ‘-16’ from ‘int’ to ‘unsigned int’ [-Wnarrowing]
  705 |     case -16:
      |           ^~
../../../dxvk-master/src/spirv/spirv_module.cpp:708:11: error: narrowing conversion of ‘-8’ from ‘int’ to ‘unsigned int’ [-Wnarrowing]
  708 |     case -8:
      |           ^
[84/266] Compiling C++ object 'src/dxvk/1752c3e@@dxvk@sta/dxvk_context.cpp.obj'
ninja: build stopped: subcommand failed.

While GCC9.3.1 will output the following warning:

[147/266] Compiling C++ object 'src/dxvk/1752c3e@@dxvk@sta/dxvk_shader.cpp.obj'
In file included from ../../../dxvk-master/src/dxvk/dxvk_shader.h:10,
                 from ../../../dxvk-master/src/dxvk/dxvk_shader.cpp:1:
../../../dxvk-master/src/dxvk/../spirv/spirv_writer.h: In instantiation of ‘void dxvk::SpirvWriter<SpirvBuffer>::erase(size_t) [with SpirvBuffer = dxvk::SpirvCodeBuffer; size_t = long long unsigned int]’:
../../../dxvk-master/src/dxvk/dxvk_shader.cpp:354:17:   required from here
../../../dxvk-master/src/dxvk/../spirv/spirv_writer.h:183:17: warning: comparison of integer expressions of different signedness: ‘size_t’ {aka ‘long long unsigned int’} and ‘int’ [-Wsign-compare]
  183 |       if (m_ptr == -1)
      |           ~~~~~~^~~~~
[261/266] Linking target src/d3d10/d3d10core.dll

Tk-Glitch avatar May 14 '20 20:05 Tk-Glitch

Crashes Dark Souls 3 on Windows.

[0x0]   ntdll!NtWaitForMultipleObjects + 0x14   
[0x1]   KERNELBASE!WaitForMultipleObjectsEx + 0x107   
[0x2]   KERNELBASE!WaitForMultipleObjects + 0xe   
[0x3]   kernel32!WerpReportFaultInternal + 0x51b   
[0x4]   kernel32!WerpReportFault + 0xac   
[0x5]   KERNELBASE!UnhandledExceptionFilter + 0x3b8   
[0x6]   ntdll!RtlUserThreadStart$filt$0 + 0xa2   
[0x7]   ntdll!_C_specific_handler + 0x96   
[0x8]   ntdll!RtlpExecuteHandlerForException + 0xf   
[0x9]   ntdll!RtlDispatchException + 0x219   
[0xa]   ntdll!KiUserExceptionDispatch + 0x2e   
[0xb]   amdvlk64!vk_icdGetInstanceProcAddrSG + 0xfb00   
[0xc]   amdvlk64!vk_icdGetInstanceProcAddrSG + 0xcebb   
[0xd]   amdvlk64!vk_icdGetInstanceProcAddrSG + 0x5333   
[0xe]   amdvlk64!vk_icdGetInstanceProcAddrSG + 0x6b4f   
[0xf]   amdvlk64!vk_icdNegotiateLoaderICDInterfaceVersion + 0x615bf   
[0x10]   amdvlk64!vk_icdNegotiateLoaderICDInterfaceVersion + 0x4a603   
[0x11]   amdvlk64!vk_icdNegotiateLoaderICDInterfaceVersion + 0x50bfd   
[0x12]   amdvlk64!vk_icdNegotiateLoaderICDInterfaceVersion + 0x146cc   
[0x13]   VkLayer_steam_fossilize64 + 0x2b4e8   
[0x14]   d3d11!createPipeline + 0xe21   
[0x15]   d3d11!createInstance + 0x39   
[0x16]   d3d11!compilePipeline + 0x68   
[0x17]   d3d11!compilePipelines + 0x339   
[0x18]   d3d11!workerFunc + 0x3cd   
[0x19]   d3d11!threadProc + 0x1f   
[0x1a]   kernel32!BaseThreadInitThunk + 0x14   
[0x1b]   ntdll!RtlUserThreadStart + 0x21   

in dxvk_graphics.cpp in line 405:

    if (m_vkd->vkCreateGraphicsPipelines(m_vkd->device(),
          m_pipeMgr->m_cache->handle(), 1, &info, nullptr, &pipeline) != VK_SUCCESS) {

oltolm avatar May 15 '20 10:05 oltolm

Thanks, I can probably try Dark Souls 3. Does it crash early? Only on Windows?

rbernon avatar May 15 '20 10:05 rbernon

It crashes immediately after start. You probably need my dxvk-cache file. I only tested on Windows.

oltolm avatar May 15 '20 11:05 oltolm

I can reproduce the crash, I'll have a look thanks.

rbernon avatar May 15 '20 12:05 rbernon

I was a bit greedy and tried to put too much in the SpirvWriter commit. I've rewritten that and split the commit and it now works fine with Dark Souls III.

rbernon avatar May 15 '20 18:05 rbernon

That also fixed the SC issue 🎉

Tk-Glitch avatar May 15 '20 19:05 Tk-Glitch

I've been building DXVK with this PR included for over a month now and so far no issue has occurred in any application (If issues occurred, I of course also checked without this PR.). Though I must admit that I haven't yet perceived a substantial improvement in real world shader compile times either, but that might be just my subjective impression.

aufkrawall avatar Jul 22 '20 10:07 aufkrawall

I have also been building and using dxvk with this PR for over a months and have not run into any issues related to the PR.

JustinSpedding avatar Aug 17 '20 19:08 JustinSpedding

I tried this PR on top of master, seems to work just fine.

imaami avatar Sep 22 '20 14:09 imaami

Thought I'd give an update. I've had this PR as part of my local DXVK build since I left my previous comment, so about 1.5 months now. I haven't experienced any problems, and Squad does start up faster than with an unpatched build.

imaami avatar Nov 02 '20 23:11 imaami

I'm just refreshing this, rebasing the branch and addressing the comment above. It now uses bit::cast instead of direct memcpy, and the additional DxvkShader constructor is less verbose, passing a const SpirvModule& and building flags directly from it instead.

rbernon avatar Nov 03 '20 13:11 rbernon

https://github.com/doitsujin/dxvk/issues/1913

This PR became borkened by mentioned commit :crying_cat_face:

Valmar33 avatar Jan 31 '21 04:01 Valmar33

Rebased. It would be nice though, if the PR is considered useful as it seems, to have it merged. I'm happy to make changes too, just tell me.

rbernon avatar Jan 31 '21 11:01 rbernon

I've been using dxvk with this newer rebased version of this PR for a while now. So far no issues at all, and it does seems to load a bit faster. It would be cool if this would get merged.

JustinSpedding avatar May 29 '21 02:05 JustinSpedding

@JustinSpedding I have had some strange issues (crashing, not able to create D3D adapter) when using this patchset and compiling on Ubuntu. If i add -march=native i do not seem to have issues with it tho, but this is not the case when compiling using arch in a docker image... then again, if i use -march=native in docker with the arch image, it seems to produce borked binaries.

Not to go too much into details about this, my point being that it MAY cause strange issues with some distro's.

This could be AVX related, as debian/ubuntu uses some distro spesific AVX patches that arch might not use - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=939559

I have only had these issues when using this particular patchset, so it may be "unsafe" in that regard, and i have not studied steam-docker-ubuntu-custom-mingw stuff enough to figure out if that could end up to be a problem there if it gets upstreamed. As i said, compiling this in a arch docker-image does seem to work just fine, but afaik steam uses a custom debian image (correct me if i am totally off here tho).

SveSop avatar Jun 02 '21 07:06 SveSop

What is the status on this MR? I found it quite useful, but now it has merge conflicts with the master branch. idk if anyone cares to rebase it or figure out the potential non-debian distro issue that SveSop mentioned earlier.

JustinSpedding avatar Apr 30 '22 19:04 JustinSpedding

Is this PR still applicable to DXVK 2.x? Currently using DXVK to play Overwatch 2 and I'd like not having to wait as long every time I start the game.

ghost avatar Feb 12 '23 18:02 ghost

Rebased the changes, as I think it might still be relevant. For instance, Overwatch 2 shader compilation is faster and stutters a lot less with these changes.

The main improvement comes from the reduced number of allocations, although this is now also mitigated with Low Fragmentation Heap being upstream in Wine.

rbernon avatar Feb 23 '23 12:02 rbernon

I needed to add #include <optional> in src/spirv/spirv_module.h to get it to compile on my machine, but otherwise, it works.

ghost avatar Feb 24 '23 13:02 ghost

I also looked at the GitHub checks, #include <optional> should fix it.

ghost avatar Mar 05 '23 14:03 ghost

I've tried to revisit this now that GPL caching is in RADV and SPIR-V translation would be the only uncacheable thing in the pipeline. Insights below are from OW2 which happens to be a easy test case, with lots of shaders (78K) loaded on startup.

The defType and defConst changes are definitely the most impactful ones (at around 30%) — they're called all the time and linear search complexity is getting out of control. For the design, I wonder if always using a hash map would be better:

  • Pro: The linear search code path can be removed
  • Pro: No need to separately declare specialized cache for each type
  • Con: Searching might be annoying (C++20 has heterogeneous lookup which helps with this)
  • Con: Maybe higher constant overhead

I've also looked into compression: it's less impactful than the type / const changes, as only around 7% of time is spent there. I've played with https://github.com/lemire/streamvbyte which I think is well suited for SPIR-V because:

  • Control bytes are stored in a separate stream, and each word can be allocated 8/16/24/32 bits. VarInt has the problem of being 7/14/21/28/35 and not matching certain usage patterns well. Compression ratio seems to be higher (53% -> 50%)
  • It has a SIMD implementation and it's really fast to encode and decode.

SVB brings the compression overhead down to 3%, so maybe it's worth pursuiting, with the compression ratio improvement also considered.

ishitatsuyuki avatar Apr 07 '23 06:04 ishitatsuyuki

any updates on this?

LethalManBoob avatar May 04 '23 19:05 LethalManBoob