Supermodel
Driver timeout on AMD 5700U
Hi!
Using Windows 11, an AMD 5700U (=integrated graphics) and latest official AMD drivers, Supermodel runs into a driver timeout pretty consistently, e.g. when selecting the race car in Daytona2PE (or early during racing in Scud Race).
This has been the case at least since supersampling was introduced (note: it also happens with supersampling = 1), but it could have started some commits earlier; I still need to do a full binary search. It does not happen with older builds, even when playing for a long time.
Hi toxieainc, if you could figure out at which commit this started happening, that would help a lot. Unfortunately I don't even have an AMD card to test with. I know that in Windows there is a driver timeout value that you can edit in the registry:
https://answers.microsoft.com/en-us/windows/forum/all/increase-time-out-limit/e979e2ad-e15f-450b-9818-a148cbf01078
If you increase this time, does it allow the game to run at least? Maybe something in the shader is causing it to recompile, eating up time.
If you are savvy with the code there try uncommenting these
//glDebugMessageCallback(DebugCallback, NULL);
//glDebugMessageControl(GL_DONT_CARE,GL_DONT_CARE,GL_DONT_CARE, 0, 0, GL_TRUE);
//glEnable(GL_DEBUG_OUTPUT);
it should give some detailed driver output
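For reference, a minimal sketch of how the driver messages could be filtered once that callback is enabled. The severity values are the standard KHR_debug / OpenGL 4.3 constants, redefined locally so the snippet compiles without GL headers; `SeverityLabel` and `ShouldReport` are hypothetical helpers, not functions from the Supermodel source.

```cpp
#include <cstdint>

// Standard KHR_debug / OpenGL 4.3 severity values, redefined here only so the
// sketch builds without GL headers (a real build would use the GL_* macros).
constexpr std::uint32_t kSeverityHigh         = 0x9146; // GL_DEBUG_SEVERITY_HIGH
constexpr std::uint32_t kSeverityMedium       = 0x9147; // GL_DEBUG_SEVERITY_MEDIUM
constexpr std::uint32_t kSeverityLow          = 0x9148; // GL_DEBUG_SEVERITY_LOW
constexpr std::uint32_t kSeverityNotification = 0x826B; // GL_DEBUG_SEVERITY_NOTIFICATION

// Hypothetical helper: human-readable label for a severity value.
constexpr const char* SeverityLabel(std::uint32_t severity) {
    switch (severity) {
        case kSeverityHigh:         return "high";
        case kSeverityMedium:       return "medium";
        case kSeverityLow:          return "low";
        case kSeverityNotification: return "notification";
        default:                    return "unknown";
    }
}

// Hypothetical helper: skip pure notifications so real warnings and errors
// are not drowned out; everything else (including unknown values) is kept.
constexpr bool ShouldReport(std::uint32_t severity) {
    return severity != kSeverityNotification;
}
```

Inside the function passed to glDebugMessageCallback, the `severity` argument could be run through `ShouldReport` before printing the message text together with `SeverityLabel(severity)`.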
From my understanding, a recompile should be handled differently by the OS than the GPU not responding (as the latter prevents the screen from updating, etc., while the former 'just' blocks some CPU thread(s)).
Will debug/find the commit when i'm back at that system (will take some weeks though :/).
Well, OpenGL commands are essentially issued on a single thread (the one with the current context). If SwapBuffers takes too long (I think the limit is usually a second or two), Windows assumes it has died and kills it. I know this can happen if you render very large datasets and it simply takes too long. But really it shouldn't happen in normal rendering. Uniforms are essentially constants per draw call, and I know some vendors will optimise the shaders and essentially recompile them based upon different inputs. I don't know if this is happening here; it's just speculation.
Really the best option is to enable those debug options and the driver will hopefully tell us what the issue is.
quad rendering?
No, these changes/commits all worked fine, and also some months after that.
As said, will know more when i have access to the system again later-on.
Using Windows 10 22H2, an AMD 7840HS (=integrated graphics) and the latest official AMD drivers (whql-amd-software-adrenalin-edition-24.7.1), I tested https://github.com/trzy/Supermodel/commit/dd90d0e2e0ae2a8f05f16f1780d30d7d58f87f5d with Daytona2PE and Scud Race for about 2 hours. Everything works fine, no issues found.
My ini settings:
QuadRendering = true
WideScreen = true
Stretch = false
WideBackground = true
XResolution = 2560
YResolution = 1440
FullScreen = 1
RefreshRate = 57.524
LegacySoundDSP = false
Finally back at the setup, found the commit in question:
6f40953 (2023-12-04) works
33b84c8 (2023-12-22) doesn't
It might be the depth/stencil format: it's 32-bit float depth with an 8-bit stencil. That has 24 bits of padding, so it actually works out as a 64-bit type. I can't see what else would cause issues.
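To make the size argument concrete, here is a small sketch of the client-side texel layout when such a buffer is read back as GL_FLOAT_32_UNSIGNED_INT_24_8_REV (one float word plus one word holding the stencil in its low 8 bits). The struct and helper names are hypothetical, the 1920x1080 figure in the usage note is just an example resolution, and how the GPU stores the format internally is up to the driver.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical struct mirroring one texel as packed by
// GL_FLOAT_32_UNSIGNED_INT_24_8_REV on readback.
struct DepthStencilTexel {
    float         depth;          // 32-bit float depth
    std::uint32_t stencilAndPad;  // low 8 bits: stencil, upper 24 bits: unused
};

// 32 + 8 bits of payload, but 64 bits per texel once padded.
static_assert(sizeof(DepthStencilTexel) == 8, "expected a 64-bit texel");

// Extract the 8-bit stencil value from the second word.
constexpr std::uint8_t Stencil(const DepthStencilTexel& t) {
    return static_cast<std::uint8_t>(t.stencilAndPad & 0xFFu);
}

// Bytes needed for a width x height buffer with the given texel size.
constexpr std::size_t BufferBytes(std::size_t width, std::size_t height,
                                  std::size_t bytesPerTexel) {
    return width * height * bytesPerTexel;
}
```

For example, `BufferBytes(1920, 1080, sizeof(DepthStencilTexel))` is twice `BufferBytes(1920, 1080, 4)`, which is the doubling in depth/stencil traffic relative to a 32-bit format that is being suspected here.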
Try like halving the resolution, see if that makes any difference
I will experiment a bit. But so far it makes no sense, because it's not like it's running significantly slower up until the hang/timeout. So there is no indication why this should happen.
While staring at code, tiny observation: is a depth and stencil buffer still needed when creating the SDL window in Main.cpp (at least for the New3D renderer)? (EDIT: i now filed a PR for this)
Win11 is not friendly to AMD CPUs, which often causes the driver to stop responding. The currently known Win11 tweaks that need to be done when using AMD processors include: turning off High Precision Event Timer (formerly Multimedia Timer), turning off Fullscreen Optimizations (GameDVR_FSEBehaviorMode), and logging in with an administrator account to play games. In addition, the latest KB5041587 can bring about a 10% performance improvement to AMD Zen3/4/5.
oh my.. :) Thanks, i will give that a try, too..
kb5041587 i already had installed with my latest tests..
As for the 40 (so potentially 32+8 or 64bit) depth/stencil: changing it (and the readback) to 32 again did not improve anything. If anything, in my experience this makes the machine bluescreen instead of 'just' timing out. BUT I did these tests with the current master, so I will have to do the same with the old revision.
Could you try enabling glDebugMessageCallback(DebugCallback, NULL)? Just uncomment it in the code.
Some more updates here:
- Enabling GL debugging returned nothing suspicious whatsoever (both on current NV driver and the problematic AMD 5700U with current driver), just a harmless warning.
- Using my own builds (latest VS2019 or VS2022 versions) shows the exact same behavior, before the reversed z-buffer all fine, after reversed z-buffer hang/crash.
- Trying different SDL2 versions (including the latest) also did not change the behavior.
..but while debugging i found a fix for another weird behavior i saw on that machine (micro stutter), so will file a PR for that one later-on.
Small update: I backported a lot of the newer commits onto the state from before reverse-z, and everything is still fine. So it's somehow really linked to that specific change. Next step: try to split up that commit into smaller pieces and see what exactly makes the AMD driver/HW break.
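For readers unfamiliar with the technique, a rough sketch of what a reversed depth buffer changes, assuming the common setup (depth range [0, 1], depth cleared to 0, GREATER as the depth test). Whether the commit in question does exactly this is not confirmed here, and the function names are hypothetical.

```cpp
// Conventional mapping: view-space distance z in [zNear, zFar] -> depth,
// with the near plane at 0 and the far plane at 1.
constexpr double StandardDepth(double zNear, double zFar, double z) {
    return (zFar * (z - zNear)) / (z * (zFar - zNear));
}

// Reversed mapping: near plane at 1, far plane at 0 (equal to 1 - standard).
// Paired with a float depth buffer this spreads precision more evenly.
constexpr double ReversedDepth(double zNear, double zFar, double z) {
    return (zNear * (zFar - z)) / (z * (zFar - zNear));
}

// With reversed depth, a fragment passes when its depth is GREATER than the
// stored value, so the buffer has to be cleared to 0 instead of 1.
constexpr bool ReversedDepthPasses(double incoming, double stored) {
    return incoming > stored;
}
```

A change like this touches the projection, the clear value, the depth compare function and usually the depth format at once, which is why splitting the commit into smaller pieces is a reasonable way to isolate what the driver dislikes.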
If it's not the frame buffer, maybe it's related to the scene graph, because the culling code was rewritten. Maybe something bad is happening in the culling code that is leading to an abnormally large render load. Just speculation. I can't think what else it could be, honestly. I know that if the CPU clock is too low it can try to render incomplete frames.
Small update: newer AMD driver still shows the same timeout/hang (also using the dojo fork build(s)). I seem to notice some geometry glitches from time to time though now before the hang happens, so that might be some new trail (for me) to follow.
What sort of glitches? We haven't had anyone else report this issue. Normally people are pretty quick to report when stuff is broken. You could try increasing the driver timeout in windows. There is a reg key for this somewhere. Maybe set it to some crazy time if possible. Only other thing I can think of is some sort of threading bug causing the app to deadlock and maybe hang.
The glitches are only visible some ms before the hang up occurs, and look like some of the triangles (e.g. from the Daytona cars) are 'randomized', but not like all over the screen, more 'local', so rather weird. But i have yet to look into this some more.
The pre-reverse-z builds still work okay.
Try to record a video if possible.
Small update regarding the original issue: Updated to 08/25 AMD driver, Win11, and newest OpenGL components of Win11. Still the same lock up/hang (no more bluescreens though), although i couldn't see any graphical glitches again so far.
It's a very strange issue. Normally if something is broken (array out of bounds, etc.) it would crash the driver, not make it hang. If you can replicate it in debug mode, I'd attach a debugger and see exactly which functions the threads are blocking on. If it really is some driver thing it'll probably just say which DLL the problem is in, but if you look up the call stack you should hopefully see some OpenGL functions or maybe a swap buffers call.
But so far no one else has reported this issue so it's quite puzzling.