osu-framework
Introduce "reduce dropped frames" support
Main information
tl;dr: this issue describes the effects of glFinish, and how calling or not calling it after SwapBuffers affects certain drivers.
Given the UHD 620/630 driver bug, I suggest that instead of a binary "call glFinish" / "no need to call glFinish" choice, there should also be a third option named "force rendering". It would do platform-specific window redrawing, such as calling InvalidateRect and glFinish before SwapBuffers on Windows. The problem is that this requires platform-specific calls, which seem to give only marginal improvements over plain glFinish on some drivers. Given the amount of work involved, it may not be worth it for a "very small percentage" of the user base. Just calling glFinish is usually enough for most broken drivers.
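A minimal sketch of the three options described above. The names `FinishMode` and `frame_calls` are hypothetical, and the function only returns the names of the calls a frame would make instead of issuing real Win32/OpenGL calls. The issue only describes the Windows path for "force rendering"; falling back to plain glFinish after the swap on other platforms is an assumption.

```python
from enum import Enum


class FinishMode(Enum):
    # Hypothetical names for the three options suggested above.
    NO_FINISH = "no need to call glFinish"
    FINISH = "call glFinish"
    FORCE_RENDERING = "force rendering"


def frame_calls(mode, is_windows):
    """Return the names of the calls one frame presentation would make.

    Strings stand in for the real platform calls (InvalidateRect, glFinish,
    SwapBuffers); nothing is drawn or presented here.
    """
    if mode is FinishMode.FORCE_RENDERING and is_windows:
        # Windows-specific path: force a window redraw and drain the GPU
        # queue before presenting.
        return ["InvalidateRect", "glFinish", "SwapBuffers"]
    calls = ["SwapBuffers"]
    if mode is not FinishMode.NO_FINISH:
        # Plain glFinish goes after the swap. Assumption: "force rendering"
        # on non-Windows platforms falls back to this.
        calls.append("glFinish")
    return calls


if __name__ == "__main__":
    for mode in FinishMode:
        print(mode.value, frame_calls(mode, is_windows=True))
```

Keeping the platform-specific branch isolated like this makes it easy to drop later if the extra work turns out not to be worth it.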
Details
After gathering a large amount of intel (no pun intended), it seems like it's finally time to implement "reduce dropped frames" support into the framework.
Looking at GameHost.Swap(), glFinish is currently only called when VSync is enabled (except as a workaround on macOS, which should no longer be needed once this is implemented). Calling glFinish with VSync is unnecessary, since it is the driver's job to block on SwapBuffers and handle VSync accordingly.
However, drivers are weird, and they come with tiny bugs that add up to everyone's pain. The most notorious are Intel UHD 620/630 GPUs, whose peculiar driver issues require very simple, yet necessary, workarounds. The problem does not seem to be exclusive to Intel, but Intel GPUs are hit harder by it than those from other manufacturers.
There are probably a few more issues I couldn't find via search, but this is the most prominent one: [ppy/osu#7447]. It is a known problem with the UHD 620/630, although those affected in that issue are hit much harder by this bug than is typical (on Windows, for example). A notable mention is [ppy/osu#9851], which looks similar to the UHD 620/630 bug, except that it is probably caused simply by driver overload rather than a buggy driver. Since VSync got rid of the stutters, it is very likely that the driver is so overloaded that it freezes the entire system, which the UHD 620/630 bug also does quite regularly.
The performance issue on macOS seems to be caused by the driver trying to be smart and failing completely, resulting in a weird rubber-banding effect where the driver is either heavily overloaded or idle, depending on frame timing. The problem seems too complex to analyze fully, but the gist appears to be a logic oversight in the frame-scheduling pipeline's attempt to prevent screen tearing when double-buffered, which fails spectacularly.
Without glFinish:
With glFinish:

These images were captured on similar hardware and the same macOS version as in [ppy/osu#7447]. Without glFinish, it is clearly not displaying 60 frames per second and looks very laggy, despite the FPS counter saying otherwise. With glFinish, on the other hand, the image is very smooth, and latency also improves slightly (if glFinish is placed after SwapBuffers).
On Windows, this problem is much more severe than just inconsistent frame times. On top of inconsistent frametimes, there are not-so-rare freezes, and sometimes even driver hangs/crashes.
This video shows what the stutters look like. I managed to record the worst-case scenario: depending on the frame time, the driver may stutter a little or a lot (as here, most likely because the 360 FPS render rate is an integer multiple of the 60 Hz display refresh rate). This video shows what the hangs look like. It is not visible in the video, but the entire display updates in sync with the game hanging: if the game hangs for more than a few seconds and I move the cursor over the desktop, or do anything else, the image does not update until the game itself renders a new frame.
On Windows, the frametime graph looks the same with and without glFinish, but the effect is extremely noticeable. Without glFinish, the image constantly stutters, is not fluid, and freezes fairly regularly for an indeterminate amount of time; driver crashes are frequent, occurring anywhere from every 5 minutes to every 1.5 to 2 days. With glFinish, however, every problem disappears. Stutters only occur for natural reasons, such as actually overloading the GPU by rendering too much. Hangs no longer happen: I left my laptop running for a very long time and never got a driver hang or crash with the fix applied.
Edit: to be clear, stutters still happen without InvalidateRect, but glFinish alone already gets rid of them almost completely.
Why glFinish is called after SwapBuffers, and not before
Unrelated to any driver bugs, but an interesting tidbit I noticed on both Windows and macOS is that the order of glFinish and SwapBuffers matters. Not by much, but it makes a difference (230 FPS with glFinish before SwapBuffers vs 293 FPS with it after). The simple reason is that calling glFinish before SwapBuffers synchronizes with the GPU twice, whereas calling it after SwapBuffers synchronizes only once, so less time is lost to synchronization. This should be explained in a comment above the code that calls glFinish after SwapBuffers, so that a smartass like me doesn't try to "correct" it before realizing why it is ordered in a way that seems to defy logic.
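A minimal sketch of the documented call site suggested above. `present` is a hypothetical helper that takes the swap and finish operations as callables, so the ordering and its explanatory comment can be shown without a real GL context; the reasoning in the docstring is the one given in this issue.

```python
def present(swap_buffers, gl_finish):
    """Present a frame, then block until the GPU has finished.

    glFinish is called *after* SwapBuffers on purpose, even though calling it
    before looks more logical. Per the reasoning in this issue, finishing
    before the swap synchronizes with the GPU twice, while finishing after
    the swap synchronizes only once (measured here as 230 FPS vs 293 FPS).
    """
    swap_buffers()
    gl_finish()


if __name__ == "__main__":
    order = []
    # Stand-ins for the real SwapBuffers and glFinish calls.
    present(lambda: order.append("SwapBuffers"), lambda: order.append("glFinish"))
    print(order)  # ['SwapBuffers', 'glFinish']
```

In frame-time terms, 230 FPS is about 4.35 ms per frame and 293 FPS about 3.41 ms, so the reordering saves roughly 0.9 ms per frame.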
If the requirement can be inferred from the device name, it should be done that way. Adding such a setting back should be a last resort if there is no other path, and it should be called WorkaroundForXXXDevice, not "reduce dropped frames".
I like the idea of having an enum where OS- and device-specific fixes are applied to certain GL vendor strings.
I'd still give the user control over it, though, at least via framework.ini, so users can test whether certain fixes work better on their device than our own testing suggested, and we can adjust the code accordingly.
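A minimal sketch of the enum-plus-lookup idea, with a user override taking precedence. All names (`GLWorkaround`, `KNOWN_WORKAROUNDS`, `select_workaround`) are hypothetical. GL_VENDOR only reports the company, so this sketch matches on the GL_RENDERER string instead, assuming the affected GPUs contain substrings like "UHD Graphics 620"; the exact strings real drivers report would need to be checked.

```python
from enum import Enum


class GLWorkaround(Enum):
    # Hypothetical workaround identifiers, not existing osu-framework types.
    NONE = "none"
    FINISH_AFTER_SWAP = "glFinish after SwapBuffers"


# Hypothetical table of (OS or None for any OS, GL_RENDERER substring, workaround).
# The substrings are assumptions about how the affected GPUs identify themselves.
KNOWN_WORKAROUNDS = [
    (None, "UHD Graphics 620", GLWorkaround.FINISH_AFTER_SWAP),
    (None, "UHD Graphics 630", GLWorkaround.FINISH_AFTER_SWAP),
]


def select_workaround(os_name, gl_renderer, override=None):
    """Pick a workaround for this device; a user override always wins."""
    if override is not None:
        return override
    for required_os, renderer_part, workaround in KNOWN_WORKAROUNDS:
        if required_os is not None and required_os != os_name:
            continue  # entry only applies to a different OS
        if renderer_part.lower() in gl_renderer.lower():
            return workaround
    return GLWorkaround.NONE


if __name__ == "__main__":
    print(select_workaround("Windows", "Intel(R) UHD Graphics 620"))
```

An OS-specific fix would use "Windows" or "macOS" in the first column instead of None.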
Here is my idea: besides the workarounds, have some generic options, like Default, Auto, Force, and None.
Default is the current behavior: glFinish is called after SwapBuffers, but only with VSync. Auto would be the new default: a workaround is selected automatically if one is needed, otherwise it falls back to Default. Force always calls glFinish after SwapBuffers, while None never does.
I think this could be named FrameSyncMode in framework.ini?
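A minimal sketch of how the proposed modes could be read from framework.ini and resolved each frame. `FrameSyncMode` and its values come from this proposal, but the helper names are hypothetical, the simple `Key = Value` line format is an assumption, and Auto's workaround is assumed to be "glFinish after SwapBuffers".

```python
from enum import Enum


class FrameSyncMode(Enum):
    # Option names proposed in this issue; not an existing osu-framework setting.
    DEFAULT = "Default"  # current behavior: glFinish after SwapBuffers only with VSync
    AUTO = "Auto"        # apply a device workaround if one is needed, else act like Default
    FORCE = "Force"      # always glFinish after SwapBuffers
    NONE = "None"        # never glFinish


def read_frame_sync_mode(ini_text, fallback=FrameSyncMode.AUTO):
    """Read a `FrameSyncMode = <value>` line from framework.ini-style text.

    Assumes simple `Key = Value` lines. A missing or unrecognized value
    falls back to `fallback` (Auto, the proposed new default).
    """
    for line in ini_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "FrameSyncMode":
            try:
                return FrameSyncMode(value.strip())
            except ValueError:
                return fallback
    return fallback


def should_finish_after_swap(mode, vsync_enabled, workaround_needed):
    """Decide whether glFinish should be called right after SwapBuffers."""
    if mode is FrameSyncMode.FORCE:
        return True
    if mode is FrameSyncMode.NONE:
        return False
    if mode is FrameSyncMode.AUTO and workaround_needed:
        # Assumption: the workaround selected for the device is
        # "call glFinish after SwapBuffers".
        return True
    # Default, or Auto on a device with no known workaround.
    return vsync_enabled


if __name__ == "__main__":
    mode = read_frame_sync_mode("FrameSyncMode = Auto\n")
    # An affected device without VSync still gets glFinish under Auto.
    print(should_finish_after_swap(mode, vsync_enabled=False, workaround_needed=True))
```

Falling back to Auto on an unrecognized value keeps a typo in framework.ini from silently disabling a needed workaround.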
If this proposal sounds right, I'll implement it and make a PR.