OpenGothic icon indicating copy to clipboard operation
OpenGothic copied to clipboard

Crash unless meshlets disabled

Open hwnde opened this issue 1 year ago • 14 comments

Windows 11, fresh install, commit 26b816e36d05832f90851d2924209f14e6d88050 Game crashes when starting a new game or loading an existing save. Disabling meshlets resolves the issue. crash.log

hwnde avatar Feb 14 '24 07:02 hwnde

https://github.com/Try/OpenGothic/issues/451#issuecomment-1937670689

Lemme guess... AMD graphics card? (And I answer it myself: AMD Radeon RX 6800 XT)

dreimer1986 avatar Feb 14 '24 07:02 dreimer1986

AMD commercial driver has fully broken mesh-shading unfortunately. If you have time - please send bug-reports to them, as we need a driver fix.

Try avatar Feb 14 '24 23:02 Try

Could you elaborate on what the issue is? I couldn't find anything in Google. Also what is the consequence of disabling meshlets? Are there any visual differences or it's only a performance optimization?

hwnde avatar Feb 15 '24 05:02 hwnde

AMD commercial driver has fully broken mesh-shading unfortunately. If you have time - please send bug-reports to them, as we need a driver fix.

AMD seems to no care much about these reports though. We sent one already. I think they disagree with that being their fault, even though OpenGothic runs on Intel and NVidia cards just fine.

dreimer1986 avatar Feb 15 '24 09:02 dreimer1986

I think they disagree with that being their fault, even though...

Even though OpenGothic runs on same hardware with opensource driver just fine :)

Could you elaborate on what the issue is? Also what is the consequence of disabling meshlets?

Consequence is performance. In Vulkan or DX12 or any other graphics api, there is a base spec with basic functionality and extensions. In OpenGothic engine checks for mesh-shader extension, and uses it when possible to draw mesh-clusters. If no mesh-shader game can still run, with fallback to triangle-lists. And triangle list is part of base spec - so should work. Reason to no use triangle-list is total lack of vertex-reuse, what forces engine to do 3-times more work in geometry. Another problem is occupancy. On NVidia triangle-list, capt to 93% utilization rate in theory... in tests is as bad as 6%

On a bug side - pretty certain that it's AMD bug. When engine calls to vkCreateGraphicsPipelines, it doesn't compile shader, and doesn't return en error code - it hard crashes deep in amdvlk64.dll.

Try avatar Feb 15 '24 20:02 Try

I can try to debug the issue to figure out what exactly causes it and whether the shader can be modified to compile. But I don't see any instructions for building and debugging the project in Windows. Does anyone have experience with it?

hwnde avatar Feb 19 '24 04:02 hwnde

MSVC 2022 and the project downloaded off a private fork. There were some dependencies to fulfill, but that was a while ago I had to do that. Try these easy steps first and then we will see what's missing.

Likely you will need:

  • Installed CMAKE: https://github.com/Kitware/CMake/releases/download/v3.28.3/cmake-3.28.3-windows-x86_64.msi
  • Installed VulkanSDK: https://sdk.lunarg.com/sdk/download/1.3.275.0/windows/VulkanSDK-1.3.275.0-Installer.exe
  • Env Var like VULKAN_SDK = "C:/VulkanSDK/1.3.275.0" or similar
  • glslang.zip in some reachable location: https://github.com/KhronosGroup/glslang/releases/download/master-tot/glslang-master-windows-x64-Release.zip
  • Path Env Var needs: path\to\glslang\bin

Likely I forgot something, but this is a good start to minimize the problems.

dreimer1986 avatar Feb 19 '24 08:02 dreimer1986

debugging the project in Windows. Does anyone have experience with it?

cmake + VulkanSDK + windows-sdk should be enough. No need to install glslang or set environment variables - VulkanSDK installer will take care of it.

CMakeLists.txt can be opened with Visual-Studio (open folder) and should work just fine. If there is any issues with bring up - let me know.

Shader that you interested in is main.vert - it has multiple variants, based on geometry-type and other factors.

Try avatar Feb 19 '24 18:02 Try

I've got some results. The culprit is this line: shOut[laneID] = var;

I managed to get the shader to work by replacing it with this:

#if defined(MAT_COLOR)
    shOut[laneID].color = var.color;
#endif
#if defined(MAT_UV)
    shOut[laneID].uv = var.uv;
#endif
#if defined(MAT_POSITION)
    shOut[laneID].pos.x = var.pos.x;
    shOut[laneID].pos.y = var.pos.y;
    shOut[laneID].pos.z = var.pos.z;
#endif
#if defined(MAT_NORMAL)
    shOut[laneID].normal.x = var.normal.x;
    shOut[laneID].normal.y = var.normal.y;
    shOut[laneID].normal.z = var.normal.z;
#endif

Assigning pos or normal directly shOut[laneID].pos = var.pos; results in a crash. I don't know why. I guess for some reason it doesn't like vec3 since color and uv work correctly. I can create a PR with this solution unless there are better ideas.

hwnde avatar Feb 20 '24 10:02 hwnde

Thanks @hwnde , for investigating it!

I can create a PR with this solution unless there are better ideas.

This is an unease split:

  • with new found evidence we can say for sure, that this is a 100% driver bug.
  • it's not realistic to carry not only API-workarounds, but also a WA for all driver bugs

My suggestion here is to report this issue again, to AMD: someone who owns their gpu can use https://www.amd.com/en/support/kb/faq/amdbrt# In report it's important to make clear that problem scope is not a game, but rather a small part of the shader. It seems they are unable to unroll varying vec3 assignment properly. float/vec2/vec4 are safe, likely due to power of two size.

I'm not really aware, if they have other way of bug reporting specifically for game-developers, who does not own their gpu.

Secondary option: disable mesh-shader on AMD by default - call it broken. Two major downsides here: I'm not happy to have conditionals with HW-vendor names, and we will ruin performance for MESA users.

Try avatar Feb 20 '24 18:02 Try

I'm not really aware, if they have other way of bug reporting specifically for game-developers, who does not own their gpu.

I think this is where such issues are reported: https://community.amd.com/t5/opengl-vulkan/bd-p/opengl-and-vulkan-discussions

I can post an issue and see if AMD responds. However it's still an open question what to do with this issue. There is no guarantee that AMD will do a fix any time soon or ever. I'd argue that doing the shader workaround is the approach with least downsides right now. First of all, although it addresses an issue with specific driver, the code itself will work with any driver and any vendor, we don't have to do any vendor detection here. On the other hand disabling mesh shader on AMD will require vendor detection and also will deprive AMD users on Windows (which are the vast majority of all AMD users) from performance benefits that come with mesh shaders.

hwnde avatar Feb 21 '24 00:02 hwnde

Started a post on AMD for forum: https://community.amd.com/t5/newcomers-start-here/driver-crash-when-using-vk-ext-mesh-shader/m-p/667490#M2963

Feel, free to join discussion :)

I'd argue that doing the shader workaround is the approach with least downsides right now.

I'm still worry, about maintaining it, since shader code is always something to edit. Let, see how fast they will respond to the report; Maybe merging fix + add a comment with URL to community.amd.com + comment when it's should be safe to remove workaround is the way

Try avatar Feb 21 '24 20:02 Try

I'm an OpenMW graphics developer who's been sent a link to this, and have a couple of recommendations:

  • Get lots of users to send reports of problems via the bug reporting tool (little insect icon) in the Radeon Software.
  • Get at least one of your team to join AMD's Vanguard program. This makes it much easier to get in contact with AMD staff, and much easier to get your complaints forwarded to engineers. If you want, I can send someone a message about your application, which should make it more likely to get accepted quickly.

AnyOldName3 avatar Feb 25 '24 01:02 AnyOldName3

Hi, @AnyOldName3 and thanks for recommendation!

Get at least one of your team to join AMD's Vanguard program.

That would be nice. Unfortunately can't take opportunity myself - due to lack of AMD gpu. Maybe @hwnde or @dreimer1986 are interested?

And one more option: sneak a test to Vulkan-CTS, that uses mesh-shader + vec3 :D

Try avatar Feb 25 '24 18:02 Try