GLSL caching bugs are still with us - and they've only grown stronger

[Open] slipher opened this issue 8 months ago • 26 comments

Master looks like this for me on the main menu. If I set r_glslCache 0 it works.

[screenshot of the broken main menu rendering]

slipher avatar Mar 16 '25 09:03 slipher

Can't reproduce.

VReaperV avatar Mar 16 '25 10:03 VReaperV

I tried deleting glsl cache, running 0.55.2, then running master again - still no issues.

VReaperV avatar Mar 16 '25 11:03 VReaperV

0.55.2 doesn't even use the same caching scheme, so it probably wouldn't cause the bug. We need to test the different branches since the recent caching rewrite.

slipher avatar Mar 18 '25 07:03 slipher

Which system, hardware and driver?

illwieckz avatar Mar 18 '25 14:03 illwieckz

It may also be useful to know the cvar configuration, because maybe this is only triggered with a specific set of options.

illwieckz avatar Mar 18 '25 14:03 illwieckz

I got this with an Nvidia GTX 1070 on Windows. I don't know what the cvars were or how to reproduce it.

slipher avatar Mar 24 '25 17:03 slipher

I may have gotten something similar in the rendering with temporary commits when debugging https://github.com/DaemonEngine/Daemon/pull/1613.

The screen was pitch black on the left and white on the right, with a perfect vertical division, not a square in the top right.

I believed my patch was at fault and added more patches on top of it. But later, when I checked out the old patch, it rendered fine. It's also possible that my patch wasn't the same anymore, but I was surprised that reverting to what I believed was an older version of my code didn't reproduce the problem.

illwieckz avatar Mar 24 '25 17:03 illwieckz

@VReaperV do we invalidate the cache if:

  • the GLSL code is exactly the same
  • the driver is exactly the same, with the same identification strings (same hardware, etc.)
  • but GL/GLSL versions differ?
  • or extensions may differ?

illwieckz avatar Mar 24 '25 17:03 illwieckz
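
For illustration, a minimal sketch of what an invalidation key covering all of these conditions could look like. Everything here is hypothetical and does not reflect Daemon's actual code:

#include <functional>
#include <string>

// Hypothetical sketch: fold everything that can change the program binary
// into one cache key, so a difference in any of them invalidates the entry.
struct GLContextInfo {
    std::string driverId;    // GL_VENDOR + GL_RENDERER + GL_VERSION strings
    std::string glslVersion; // GL_SHADING_LANGUAGE_VERSION string
    std::string extensions;  // extensions actually used by the shaders
};

size_t ProgramCacheKey( const GLContextInfo& ctx, const std::string& glslText )
{
    std::hash<std::string> h;
    size_t key = h( glslText );
    for ( const std::string& s : { ctx.driverId, ctx.glslVersion, ctx.extensions } )
    {
        // standard hash-combine step
        key ^= h( s ) + 0x9e3779b9 + ( key << 6 ) + ( key >> 2 );
    }
    return key;
}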

That's a good question. I never thought of the possibility that the program binary could be different due to changing which GLSL extensions are enabled.

slipher avatar Mar 24 '25 17:03 slipher

Besides the possibility of the compiler backend outputting different code based on extensions, as I was imagining in the previous comment, there's also an easy way to break it. We use the __VERSION__ builtin macro in several shaders, so if we change the GLSL version, the source code is effectively different for those.

slipher avatar Mar 24 '25 18:03 slipher

But then __VERSION__ isn't part of the source I guess.

illwieckz avatar Mar 24 '25 18:03 illwieckz

I mean for the stuff like

#if __VERSION__ > 120
out vec4 outputColor;
#else
#define outputColor gl_FragColor
#endif

We can get two different preprocessed source codes depending on which GLSL version is being used.

But then the GLSL version must be set by the #version directive. So if __VERSION__ is always just echoing back the version directive, there is no issue with the same source preprocessing differently.

Anyway this version stuff is a distraction; the bug, in my case at least, clearly was not caused by that.

slipher avatar Mar 24 '25 18:03 slipher
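
To make that last point concrete, here is a tiny hypothetical sketch (not Daemon's code) of why prepending the #version line to the text that gets hashed makes __VERSION__ harmless:

#include <string>

// Because the #version line is part of the hashed text, __VERSION__ is
// fully determined by the hashed source, so the same cached text can
// never preprocess two different ways.
std::string BuildShaderText( int glslVersion, const std::string& body )
{
    std::string text = "#version " + std::to_string( glslVersion ) + "\n";
    return text + body; // '#if __VERSION__ > 120' now resolves deterministically
}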

  • but GL/GLSL versions differ?

For GLSL versions - yes, and they follow from the GL version.

  • or extensions may differ?

Yes, as long as they are used in shaders (i.e. any that are present in the version header).

VReaperV avatar Mar 24 '25 19:03 VReaperV

We use the __VERSION__ builtin macro in several shaders, so if we change the GLSL version, the source code is effectively different for those.

That won't break, because the GLSL version is always present on the first line.

VReaperV avatar Mar 24 '25 19:03 VReaperV

Yes, I noticed that we have a #version * string containing the GLSL version that is prepended to the source, so that will invalidate the cached build if the version changes. We also have some HAVE_* definitions based on extensions we explicitly check, but I wonder if some implicit extensions may change the produced program binary.

For the GL version, I guess it's not relevant for compiling GLSL.

illwieckz avatar Mar 24 '25 19:03 illwieckz

We also have some HAVE_* definitions based on extensions we explicitly check, but I wonder if some implicit extensions may change the produced program binary.

It probably shouldn't since those would just be part of the given GLSL version, but who knows.

VReaperV avatar Mar 24 '25 19:03 VReaperV

Yes, I noticed that we have a #version * string containing the GLSL version that is prepended to the source, so that will invalidate the cached build if the version changes.

Yes, unlike before, the hashing is now done on the full shader program text, and it never changes afterwards.

VReaperV avatar Mar 24 '25 19:03 VReaperV
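
As a sketch of the scheme described here (hypothetical names, not Daemon's actual code): the version header and the HAVE_* extension defines are baked into the text before hashing, so a change to any of them yields a different hash:

#include <functional>
#include <string>
#include <vector>

std::string BuildFullProgramText( int glslVersion,
                                  const std::vector<std::string>& extensions,
                                  const std::string& body )
{
    std::string text = "#version " + std::to_string( glslVersion ) + "\n";
    for ( const std::string& ext : extensions )
    {
        text += "#define HAVE_" + ext + " 1\n"; // illustrative define names
    }
    return text + body;
}

size_t ProgramTextHash( const std::string& fullText )
{
    // Hashed once over the complete text; never recomputed afterwards.
    return std::hash<std::string>{}( fullText );
}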

Master looks like this for me on the main menu. If I set r_glslCache 0 it works.

[screenshot of the broken main menu rendering]

This looks more like the results I've gotten before due to GL errors.

VReaperV avatar Mar 24 '25 19:03 VReaperV

This looks more like the results I've gotten before due to GL errors.

What do you mean, what kind of errors?

slipher avatar Mar 25 '25 23:03 slipher

This looks more like the results I've gotten before due to GL errors.

What do you mean, what kind of errors?

Ones that would get caught by GL_CheckErrors(), though I don't remember which exact GL call was producing them.

VReaperV avatar Mar 25 '25 23:03 VReaperV
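
For readers unfamiliar with the mechanism: GL_CheckErrors() is the engine's wrapper around glGetError(). A minimal sketch of the idea follows; this is not the engine's actual implementation, which also reports where the error happened:

#include <cstdio>

// Assumes a GL header/loader is already included.
void CheckGLErrors( const char* where )
{
    // glGetError() returns one queued error at a time; drain them all.
    for ( GLenum err = glGetError(); err != GL_NO_ERROR; err = glGetError() )
    {
        std::fprintf( stderr, "GL error 0x%04X at %s\n", err, where );
    }
}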

It's the bitness. If you run 32-bit Daemon it poisons the cache for 64-bit Daemon and vice versa.

Perhaps this bug was already there but I never noticed it because my laptop used to default to the Intel GPU (before the change to add the magic symbol for Optimus). I may not have bothered to test the combination 32-bit + Nvidia.

This does produce a GL_INVALID_OPERATION error in GLShaderManager::BuildPermutation, so we could potentially catch the failures to load a GL program from the cache.

slipher avatar Apr 01 '25 03:04 slipher
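
A sketch of what catching that failure could look like (hypothetical helper, not Daemon's actual code): glProgramBinary rejects a mismatched blob, so the loader can detect this and fall back to recompiling from source:

// Assumes a GL loader (e.g. glad) already provides glProgramBinary etc.
bool LoadProgramFromCache( GLuint program, GLenum binaryFormat,
                           const void* blob, GLsizei length )
{
    glProgramBinary( program, binaryFormat, blob, length );

    if ( glGetError() != GL_NO_ERROR )
    {
        return false; // e.g. GL_INVALID_OPERATION on a mismatched binary
    }

    GLint linked = GL_FALSE;
    glGetProgramiv( program, GL_LINK_STATUS, &linked );
    return linked == GL_TRUE; // on failure, recompile from source instead
}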

It's the bitness. If you run 32-bit Daemon it poisons the cache for 64-bit Daemon and vice versa.

Perhaps this bug was already there but I never noticed it because my laptop used to default to the Intel GPU (before the change to add the magic symbol for Optimus). I may not have bothered to test the combination 32-bit + Nvidia.

This does produce a GL_INVALID_OPERATION error in GLShaderManager::BuildPermutation so we could potentially catch the failures to load a GL program from cache.

Hmm, sounds like this has to do with bindless textures, since they use 64-bit handles. It might also explain why everything is black without the material system: it puts the texture handles into a buffer, while the regular codepath just sets them as uniforms.

VReaperV avatar Apr 01 '25 11:04 VReaperV
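
For context, a rough illustration of the two codepaths being described (ARB_bindless_texture entry points; the wrapper function is hypothetical):

// Bindless texture handles are 64-bit on every platform; the difference
// between 32- and 64-bit builds lies in the driver-produced program binary.
void UploadBindlessHandle( GLuint texture, GLint uniformLocation,
                           GLuint handleBuffer, GLintptr offset,
                           bool materialSystem )
{
    GLuint64 handle = glGetTextureHandleARB( texture );
    glMakeTextureHandleResidentARB( handle );

    if ( materialSystem )
    {
        // Material system: handles are written into a buffer the shader reads.
        glNamedBufferSubData( handleBuffer, offset, sizeof( handle ), &handle );
    }
    else
    {
        // Regular codepath: the handle is set directly as a uniform.
        glUniformHandleui64ARB( uniformLocation, handle );
    }
}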

Perhaps there should just be a value in the shader cache header to indicate whether it's 32-bit or 64-bit?

VReaperV avatar Apr 01 '25 11:04 VReaperV
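
A minimal sketch of that suggestion (hypothetical struct, not Daemon's actual header layout):

#include <cstdint>

struct ShaderCacheHeader
{
    uint32_t magic;
    uint32_t formatVersion;
    uint32_t pointerSize; // sizeof( void* ): 4 for 32-bit, 8 for 64-bit builds
};

bool CacheHeaderUsable( const ShaderCacheHeader& h )
{
    // Reject a cache written by a build of the other bitness.
    return h.pointerSize == sizeof( void* );
}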

It's the bitness. If you run 32-bit Daemon it poisons the cache for 64-bit Daemon and vice versa.

Wow, nice find!

Perhaps there should just be a value in the shader cache header to indicate whether it's 32-bit or 64-bit?

Sometimes the most naive way is the best way.

I also wonder if endianness could produce the same problems, though we have no different-endian targets, so that's less of a concern.

illwieckz avatar Apr 01 '25 15:04 illwieckz
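
Incidentally, a multi-byte magic number in the header would double as an endianness check, since the same bytes read back byte-swapped on an other-endian machine (hypothetical values):

#include <cstdint>

constexpr uint32_t CACHE_MAGIC = 0x474C534C; // spells "GLSL" read as big-endian bytes

bool MagicMatches( uint32_t storedMagic )
{
    // On an other-endian machine the stored bytes read back as 0x4C534C47,
    // so the check fails and the cache is simply rebuilt.
    return storedMagic == CACHE_MAGIC;
}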

Bitness is not the only way. I get the same thing with the following procedure on a Windows Optimus machine:

  1. Delete glsl cache from homepath
  2. Start Unvanquished 0.55.3 with Nvidia graphics
  3. Start Unvanquished 0.55.3 with Intel graphics

Didn't we use to have a hash of the driver string in the cache header? Did that get broken somehow?

slipher avatar Apr 29 '25 09:04 slipher

The Nvidia vs. Intel issue, at least, is a regression in ab59550f23b41ea80b2443c72c9ce35fce1a7cb4. This commit comments out the driver sameness check.

slipher avatar Apr 29 '25 12:04 slipher
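
For reference, the disabled check amounts to something like this hypothetical sketch (Daemon's real code differs; this only illustrates the idea of a driver sameness check):

#include <functional>
#include <string>

// Assumes a GL header/loader is already included. Hash the driver
// identification strings; store the hash in the cache header when writing,
// and drop the whole cache when it no longer matches.
size_t DriverIdHash()
{
    std::string id;
    id += reinterpret_cast<const char*>( glGetString( GL_VENDOR ) );
    id += reinterpret_cast<const char*>( glGetString( GL_RENDERER ) );
    id += reinterpret_cast<const char*>( glGetString( GL_VERSION ) );
    return std::hash<std::string>{}( id );
}

bool DriverMatches( size_t storedHash )
{
    return storedHash == DriverIdHash(); // fails for Nvidia vs. Intel on Optimus
}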