SDL
SDL copied to clipboard
Stale wayland viewport leads to graphical corruption on fractional DPI scaling systems
https://github.com/libsdl-org/SDL/commit/4d1905c9b60489b58c486cee350356e156ce6856 has introduced some graphical corruption for me when GLES applications are rendering using a separate render thread and I press Super+Left/Right/Up to move the window around on GNOME with fractional DPI scaling is enabled.
To my eye, it seems as if the wayland viewport we end up committing corresponds to the previous size of the window rather than the new size. I'm not sure whether this behavior is really our fault or if it's an external bug that we just exposed, but we should find a workaround or mitigation.
To reproduce the bug:
- Apply my patch below to testgles2
- Run
SDL_VIDEODRIVER=wayland testgles2 --vsync
- Move the window around using Super+Left/Right/Up/Down and note how the viewport ends up screwed up (often rendering too large, too small, or at a strange scale).
- If necessary, compare with
SDL_VIDEODRIVER=x11
to see how the viewport is expected to behave.
- If necessary, compare with
cc: @Kontrabant
System: Intel Core i7 1280P Fedora 36 GNOME 42.3 125% DPI scaling
Here is a patch to testgles2
that can reproduce the bug:
diff --git a/test/testgles2.c b/test/testgles2.c
index 4b9e5d250..7724aed97 100644
--- a/test/testgles2.c
+++ b/test/testgles2.c
@@ -442,56 +442,41 @@ int done;
Uint32 frames;
shader_data *datas;
+static int RenderThread(void *ptr)
+{
+ int i, w, h, status;
+
+ while (!done) {
+ for (i = 0; i < state->num_windows; ++i) {
+ status = SDL_GL_MakeCurrent(state->windows[i], context[i]);
+ if (status) {
+ SDL_Log("SDL_GL_MakeCurrent(): %s\n", SDL_GetError());
+
+ /* Continue for next window */
+ continue;
+ }
+
+ SDL_GL_GetDrawableSize(state->windows[i], &w, &h);
+ ctx.glViewport(0, 0, w, h);
+
+ Render(w, h, &datas[i]);
+
+ SDL_GL_SwapWindow(state->windows[i]);
+ ++frames;
+ }
+ }
+
+ return 0;
+}
+
void loop()
{
SDL_Event event;
- int i;
- int status;
/* Check for events */
- ++frames;
while (SDL_PollEvent(&event) && !done) {
- switch (event.type) {
- case SDL_WINDOWEVENT:
- switch (event.window.event) {
- case SDL_WINDOWEVENT_SIZE_CHANGED:
- for (i = 0; i < state->num_windows; ++i) {
- if (event.window.windowID == SDL_GetWindowID(state->windows[i])) {
- int w, h;
- status = SDL_GL_MakeCurrent(state->windows[i], context[i]);
- if (status) {
- SDL_Log("SDL_GL_MakeCurrent(): %s\n", SDL_GetError());
- break;
- }
- /* Change view port to the new window dimensions */
- SDL_GL_GetDrawableSize(state->windows[i], &w, &h);
- ctx.glViewport(0, 0, w, h);
- state->window_w = event.window.data1;
- state->window_h = event.window.data2;
- /* Update window content */
- Render(event.window.data1, event.window.data2, &datas[i]);
- SDL_GL_SwapWindow(state->windows[i]);
- break;
- }
- }
- break;
- }
- }
SDLTest_CommonEvent(state, &event, &done);
}
- if (!done) {
- for (i = 0; i < state->num_windows; ++i) {
- status = SDL_GL_MakeCurrent(state->windows[i], context[i]);
- if (status) {
- SDL_Log("SDL_GL_MakeCurrent(): %s\n", SDL_GetError());
-
- /* Continue for next window */
- continue;
- }
- Render(state->window_w, state->window_h, &datas[i]);
- SDL_GL_SwapWindow(state->windows[i]);
- }
- }
#ifdef __EMSCRIPTEN__
else {
emscripten_cancel_main_loop();
@@ -551,7 +536,7 @@ main(int argc, char *argv[])
}
/* Set OpenGL parameters */
- state->window_flags |= SDL_WINDOW_OPENGL | SDL_WINDOW_RESIZABLE | SDL_WINDOW_BORDERLESS;
+ state->window_flags |= SDL_WINDOW_OPENGL | SDL_WINDOW_RESIZABLE | SDL_WINDOW_ALLOW_HIGHDPI;
state->gl_red_size = 5;
state->gl_green_size = 5;
state->gl_blue_size = 5;
@@ -724,6 +709,9 @@ main(int argc, char *argv[])
GL_CHECK(ctx.glEnable(GL_CULL_FACE));
GL_CHECK(ctx.glEnable(GL_DEPTH_TEST));
+
+ /* Detach the GL context so the render thread can attach it */
+ SDL_GL_MakeCurrent(state->windows[i], NULL);
}
/* Main render loop */
@@ -734,9 +722,14 @@ main(int argc, char *argv[])
#ifdef __EMSCRIPTEN__
emscripten_set_main_loop(loop, 0, 1);
#else
- while (!done) {
- loop();
+ {
+ SDL_Thread *thread = SDL_CreateThread(RenderThread, "RenderThread", NULL);
+ while (!done) {
+ loop();
+ }
+ SDL_WaitThread(thread, NULL);
}
+
#endif
/* Print out some timing information */
I can't even test this, as even with scaling off, when I try to move the window with super+arrow, it just hangs while two CPU cores are pegged at 100%, then eventually segfaults when I try to quit.
Edit: To note, I'm using an Nvidia card with the 515.65.01 drivers.
then eventually segfaults when I try to quit.
There is a UAF in the PoC due to the fact that the SDL test common code calls SDL_DestroyWindow()
on SDL_WINDOWEVENT_CLOSE
(before we join the render thread). It's just a bug in the PoC and not related to the issue here though.
Edit: To note, I'm using an Nvidia card with the 515.65.01 drivers.
Ah yeah, Nvidia has their own issues with rendering on a separate thread that @sulix was working on IIRC.
I can still repro with llvmpipe by using LIBGL_ALWAYS_SOFTWARE=true
, so you might be able to test it that way on your Nvidia system.
Yeah: nVidia's wayland implementation definitely has some issues, particularly with resizing. There's no complete fix yet — it's going to require an new EGL extension to fix properly.
In the meantime, it's worth noting:
- This has changed a few times so the nVidia behaviour is going to be very different depending on what libnvidia-egl-wayland version you're running.
- Versions < 1.10 will hang on any resize if the resize is done from a different thread.
- Version 1.10 will cause some small graphical corruption on resize, as the surface is replaced with an uninitialised version.
- Most distros reverted the change, so have a "1.10" which still hangs on resize.
- There's a fix in the latest git version which defers the resize (if it occurs on another thread) until immediately after swapbuffers, meaning the old size surface is committed for one frame. (This can cause problems with maximize / fullscreen transition / Super+Arrows, at least on KDE, and is "wrong", but is better than the previous version.)
Most of the important info about it is in https://github.com/NVIDIA/egl-wayland/pull/53, https://github.com/NVIDIA/egl-wayland/issues/57 and https://github.com/NVIDIA/egl-wayland/pull/60
LIBGL_ALWAYS_SOFTWARE=true
This didn't work either, so I dug out an old system with an AMD card and manged to reproduce this.
Honestly, I'm stumped. Everything in the debug logs looks correct, tracing through it looks fine, I even tried the brute force method of setting the viewport source and destination on the render thread before every flip and the results were always broken. I'm guessing the cause of this lies somewhere below SDL.
I tried it on KDE on the same system as well and it behaves correctly there.
I think this is another instance of the perennial issue of committing incorrectly sized buffers. Moving the call to ConfigureWindowGeometry()
onto the render thread after eglSwapBuffers()
"fixes" it.
My guess is that eglSwapBuffers()
is committing a buffer of the old size after we've already acknowledged the new one and that's what's causing Mutter to compute the viewport incorrectly. That would explain how we end up with a viewport that appears to match the old size, rather than the new one.
This is the hack I tried that works for me:
diff --git a/src/video/wayland/SDL_waylandopengles.c b/src/video/wayland/SDL_waylandopengles.c
index 26626ee51..b34e82fd2 100644
--- a/src/video/wayland/SDL_waylandopengles.c
+++ b/src/video/wayland/SDL_waylandopengles.c
@@ -107,6 +107,9 @@ Wayland_GLES_GetSwapInterval(_THIS)
return _this->egl_data->egl_swapinterval;
}
+void
+ConfigureWindowGeometry(SDL_Window *window);
+
int
Wayland_GLES_SwapWindow(_THIS, SDL_Window *window)
{
@@ -173,6 +176,8 @@ Wayland_GLES_SwapWindow(_THIS, SDL_Window *window)
return SDL_EGL_SetError("unable to show color buffer in an OS-native window", "eglSwapBuffers");
}
+ ConfigureWindowGeometry(window);
+
WAYLAND_wl_display_flush( data->waylandData->display );
return 0;
diff --git a/src/video/wayland/SDL_waylandwindow.c b/src/video/wayland/SDL_waylandwindow.c
index c63bb1b94..15028e014 100644
--- a/src/video/wayland/SDL_waylandwindow.c
+++ b/src/video/wayland/SDL_waylandwindow.c
@@ -235,7 +235,7 @@ UnsetDrawSurfaceViewport(SDL_Window *window)
}
}
-static void
+void
ConfigureWindowGeometry(SDL_Window *window)
{
SDL_WindowData *data = window->driverdata;
@@ -1993,7 +1993,7 @@ Wayland_HandleResize(SDL_Window *window, int width, int height, float scale)
window->w = width;
window->h = height;
data->scale_factor = scale;
- ConfigureWindowGeometry(window);
+ //ConfigureWindowGeometry(window);
if (data->needs_resize_event || old_w != width || old_h != height || !FloatEqual(data->scale_factor, old_scale)) {
/* We may have already updated window w/h (or only adjusted scale factor),
Obviously we can't ship that, but maybe it will inspire ideas for a proper fix/workaround.
Sorry for the commit spam. I forgot GitHub did that on forks.
Anyway, I cleaned up the PoC and added a proper --threaded
option to testgles2
which can reproduce this issue using: SDL_VIDEODRIVER=wayland ./testgles2 --allow-highdpi --vsync --threaded
If we do need to go to GNOME for a fix, this should at least give them an easy way to reproduce the issue.
We'd like to ship 2.24.0, is this a showstopper? I can't think of any fix that wouldn't be risky at this point and I don't think we can cleanly back out the offending change.
No, I don't think it is.
This is probably something that the GNOME devs have to look into, as there doesn't seem to be a clean fix and other window managers don't exhibit the issue. Given that it didn't manifest until now with a specific test case, there doesn't seem to be many (any?) real world applications affected.
Okay, bumping to 2.26.0, thanks!
Fedora pushed a new egl-wayland update and now the test not only doesn't crash on resize anymore, but the viewport issue doesn't manifest when using the Nvidia drivers.
edit: Ah, playing with it a bit more, it seems that sometimes it works correctly and sometimes the viewport is wrong, There doesn't seem to be any consistent pattern to when it works and when it doesn't though.
Has this been filed on GNOME's issue tracker by chance? Confirmation that this is an external issue will help move it along for #6362.
No I haven't filed one with GNOME yet. I'll do it this weekend if someone doesn't beat me to it.
Filed https://gitlab.gnome.org/GNOME/mutter/-/issues/2503
mutter has fixed it in mainline, so that should cover that - not sure if it will be backported but it should definitely be in the next major release.