Daemon
GLSL caching bugs are still with us - and they've only grown stronger
Master looks like this for me on the main menu. If I set r_glslCache 0 it works.
Can't reproduce.
I tried deleting glsl cache, running 0.55.2, then running master again - still no issues.
0.55.2 doesn't even use the same caching scheme, so it probably wouldn't cause a bug. We need to test the different branches since the recent caching rewrite.
Which system, hardware and driver?
It may also be useful to know the cvar configuration, because maybe this is only triggered with a specific set of options.
I got this with an Nvidia GTX 1070 on Windows. I don't know what the cvars were or how to reproduce it.
I may have gotten something similar in the rendered output with temporary commits while debugging https://github.com/DaemonEngine/Daemon/pull/1613.
The screen was pitch black on the left part, and white on the right part, perfect vertical division, not a square on top right.
I believed my patch was the one at fault and added more patches on top of it. But later, when I checked out the old patch, it rendered fine. It's also possible that my patch wasn't the same anymore, but I was surprised that reverting to what I believed was an older version of my code didn't produce the same thing.
@VReaperV do we invalidate the cache if:
- the GLSL code is exactly the same one
- the driver is exactly the same one, with same identification strings (same hardware, etc.)
- but GL/GLSL versions differ?
- or extensions may differ?
That's a good question. I never thought of the possibility that the program binary could be different due to changing which GLSL extensions are enabled.
Besides the possibility of the compiler backend outputting different stuff based on extensions, as I was imagining in the previous comment, there's also an easy way to break it. We use the __VERSION__ builtin macro in several shaders, so if we change GLSL version the source code is effectively different for those.
But then __VERSION__ isn't part of the source I guess.
I mean for the stuff like
#if __VERSION__ > 120
out vec4 outputColor;
#else
#define outputColor gl_FragColor
#endif
We can get two different preprocessed source codes depending on which GLSL version is being used.
But then the GLSL version must be set by the #version directive. So if __VERSION__ is always just echoing back the version directive, there is no issue with the same source preprocessing differently.
Anyway this version stuff is a distraction; the bug, in my case at least, clearly was not caused by that.
- but GL/GLSL versions differ?
For GLSL versions - yes, and they follow from the GL version.
- or extensions may differ?
Yes, as long as they are used in shaders (i.e. any that are present in the version header).
We use the __VERSION__ builtin macro in several shaders, so if we change GLSL version the source code is effectively different for those.
That won't break, because GLSL version is always present on the first line.
Yes I noticed that we have some #version * string containing the GLSL version that is appended to the source, so that will invalidate the build if the version changes. We also have some HAVE_* definitions based on extensions we explicitly check, but I wonder if some implicit extensions may change the produced bitcode.
For the GL version, I guess it's not relevant for compiling GLSL.
We also have some HAVE_* definitions based on extensions we explicitly check, but I wonder if some implicit extensions may change the produced bitcode.
It probably shouldn't since those would just be part of the given GLSL version, but who knows.
Yes I noticed that we have some #version * string containing the GLSL version that is appended to the source, so that will invalidate the build if the version changes.
Yes, unlike before, the hashing is now done based on the full shader program text, and it never changes afterwards.
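For illustration, here is a minimal sketch of that scheme: a cache key computed over the complete program text (version line, feature defines, shader body), so that a change to any of them automatically yields a different key. The function names and the FNV-1a hash are stand-ins, not the engine's actual code.

#include <cstdint>
#include <string>

// Stand-in hash (FNV-1a); the engine's real hash function may differ.
static uint64_t HashShaderText( const std::string& text )
{
	uint64_t hash = 0xcbf29ce484222325ull;
	for ( unsigned char c : text )
	{
		hash ^= c;
		hash *= 0x100000001b3ull;
	}
	return hash;
}

// Hypothetical cache key: hashing the full text that is sent to the driver
// means a different #version line or a different set of HAVE_* defines
// produces a different key without any extra bookkeeping.
static uint64_t ShaderCacheKey( int glslVersion, const std::string& extensionDefines,
                                const std::string& shaderBody )
{
	std::string fullText = "#version " + std::to_string( glslVersion ) + "\n"
	                     + extensionDefines // e.g. "#define HAVE_ARB_bindless_texture 1\n"
	                     + shaderBody;
	return HashShaderText( fullText );
}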
Master looks like this for me on the main menu. If I set r_glslCache 0 it works.
This looks more like the results I've gotten before due to GL errors.
What do you mean, what kind of errors?
Ones that would get caught by GL_CheckErrors(), though I don't remember which exact gl call was producing them.
It's the bitness. If you run 32-bit Daemon it poisons the cache for 64-bit Daemon and vice versa.
Perhaps this bug was already there but I never noticed it because my laptop used to default to the Intel GPU (before the change to add the magic symbol for Optimus). I may not have bothered to test the combination 32-bit + Nvidia.
This does produce a GL_INVALID_OPERATION error in GLShaderManager::BuildPermutation so we could potentially catch the failures to load a GL program from cache.
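As a rough sketch of that idea (the helper name and the fallback callback are hypothetical, and the calls shown are the standard GL program-binary API rather than how GLShaderManager is actually structured), loading could fall back to a fresh compile when the cached binary is rejected:

#include <vector>
// Assumes a GL loader header providing glCreateProgram, glProgramBinary, etc.

// Hypothetical helper: try to restore a program from a cached binary and
// recompile from source if the driver rejects it.
static GLuint LoadProgramFromCacheOrCompile( GLenum binaryFormat,
                                             const std::vector<char>& binary,
                                             GLuint ( *compileFromSource )() )
{
	GLuint program = glCreateProgram();
	glProgramBinary( program, binaryFormat, binary.data(),
	                 static_cast<GLsizei>( binary.size() ) );

	GLint linked = GL_FALSE;
	glGetProgramiv( program, GL_LINK_STATUS, &linked );

	// A binary written by a different build or driver typically leaves the
	// program unlinked and raises GL_INVALID_OPERATION at this point.
	if ( linked != GL_TRUE || glGetError() != GL_NO_ERROR )
	{
		glDeleteProgram( program );
		return compileFromSource(); // fall back instead of rendering garbage
	}
	return program;
}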
Hmm, sounds like this has to do with bindless textures, since they use 64-bit handles. It might also explain why everything is black without the material system: it puts the texture handles into a buffer, while the regular codepath just sets them as uniforms.
Perhaps there should just be a value in the shader cache header to indicate whether it's 32-bit or 64-bit?
It's the bitness. If you run 32-bit Daemon it poisons the cache for 64-bit Daemon and vice versa.
Wow, nice find!
Perhaps there should just be a value in the shader cache header to indicate whether it's 32-bit or 64-bit?
Sometimes the most naive way is the best way.
I also wonder if endianness could produce the same problem, though we have no different-endian targets so that's less of a concern.
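A minimal sketch of what such header fields could look like, covering both the bitness and the endianness concern (the struct and field names are invented for illustration, not the cache's actual format):

#include <cstdint>
#include <cstring>

// Hypothetical extra fields in the shader cache header.
struct ShaderCacheHeaderInfo
{
	uint8_t pointerSize;  // sizeof(void*) of the writer: 4 for 32-bit, 8 for 64-bit
	uint8_t littleEndian; // 1 if the writer was little-endian
};

// Reject a cache written by a build with different bitness or endianness.
static bool CacheWrittenByCompatibleBuild( const ShaderCacheHeaderInfo& header )
{
	const uint16_t probe = 1;
	uint8_t firstByte;
	std::memcpy( &firstByte, &probe, 1 ); // 1 on little-endian hosts, 0 otherwise

	return header.pointerSize == sizeof( void* )
	    && header.littleEndian == firstByte;
}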
Bitness is not the only way. I get the same thing with the following procedure on a Windows Optimus machine:
- Delete glsl cache from homepath
- Start Unvanquished 0.55.3 with Nvidia graphics
- Start Unvanquished 0.55.3 with Intel graphics
Didn't we use to have a hash of the driver string in the cache header? Did that get broken somehow?
The Nvidia vs. Intel issue, at least, is a regression in ab59550f23b41ea80b2443c72c9ce35fce1a7cb4. This commit comments out the driver sameness check.
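For reference, a driver sameness check along those lines could simply record the GL identification strings (or a hash of them) in the cache header and compare on load. The names here are illustrative, not the code that the commit removed:

#include <string>
// Assumes a GL loader header providing glGetString and the GL_* enums.

// Hypothetical: build a single identity string from the driver's
// identification strings, so a cache written with Nvidia graphics is never
// reused with Intel graphics on the same machine, and vice versa.
static std::string CurrentDriverIdentity()
{
	auto part = []( GLenum name ) {
		const GLubyte* s = glGetString( name );
		return s ? std::string( reinterpret_cast<const char*>( s ) ) : std::string();
	};
	return part( GL_VENDOR ) + "|" + part( GL_RENDERER ) + "|"
	     + part( GL_VERSION ) + "|" + part( GL_SHADING_LANGUAGE_VERSION );
}

// Compare against the identity stored when the cache was written.
static bool DriverMatchesCache( const std::string& identityFromCacheHeader )
{
	return identityFromCacheHeader == CurrentDriverIdentity();
}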