SDL icon indicating copy to clipboard operation
SDL copied to clipboard

Stale wayland viewport leads to graphical corruption on fractional DPI scaling systems

Open cgutman opened this issue 2 years ago • 11 comments

https://github.com/libsdl-org/SDL/commit/4d1905c9b60489b58c486cee350356e156ce6856 has introduced some graphical corruption for me when GLES applications are rendering using a separate render thread and I press Super+Left/Right/Up to move the window around on GNOME with fractional DPI scaling is enabled.

To my eye, it seems as if the wayland viewport we end up committing corresponds to the previous size of the window rather than the new size. I'm not sure whether this behavior is really our fault or if it's an external bug that we just exposed, but we should find a workaround or mitigation.

To reproduce the bug:

  1. Apply my patch below to testgles2
  2. Run SDL_VIDEODRIVER=wayland testgles2 --vsync
  3. Move the window around using Super+Left/Right/Up/Down and note how the viewport ends up screwed up (often rendering too large, too small, or at a strange scale).
    • If necessary, compare with SDL_VIDEODRIVER=x11 to see how the viewport is expected to behave.

cc: @Kontrabant

WAYLAND_DEBUG=1 log

System: Intel Core i7 1280P Fedora 36 GNOME 42.3 125% DPI scaling

Here is a patch to testgles2 that can reproduce the bug:

diff --git a/test/testgles2.c b/test/testgles2.c
index 4b9e5d250..7724aed97 100644
--- a/test/testgles2.c
+++ b/test/testgles2.c
@@ -442,56 +442,41 @@ int done;
 Uint32 frames;
 shader_data *datas;
 
+static int RenderThread(void *ptr)
+{
+    int i, w, h, status;
+
+    while (!done) {
+        for (i = 0; i < state->num_windows; ++i) {
+            status = SDL_GL_MakeCurrent(state->windows[i], context[i]);
+            if (status) {
+                SDL_Log("SDL_GL_MakeCurrent(): %s\n", SDL_GetError());
+
+                /* Continue for next window */
+                continue;
+            }
+
+            SDL_GL_GetDrawableSize(state->windows[i], &w, &h);
+            ctx.glViewport(0, 0, w, h);
+
+            Render(w, h, &datas[i]);
+
+            SDL_GL_SwapWindow(state->windows[i]);
+            ++frames;
+        }
+    }
+
+    return 0;
+}
+
 void loop()
 {
     SDL_Event event;
-    int i;
-    int status;
 
     /* Check for events */
-    ++frames;
     while (SDL_PollEvent(&event) && !done) {
-        switch (event.type) {
-        case SDL_WINDOWEVENT:
-            switch (event.window.event) {
-                case SDL_WINDOWEVENT_SIZE_CHANGED:
-                    for (i = 0; i < state->num_windows; ++i) {
-                        if (event.window.windowID == SDL_GetWindowID(state->windows[i])) {
-                            int w, h;
-                            status = SDL_GL_MakeCurrent(state->windows[i], context[i]);
-                            if (status) {
-                                SDL_Log("SDL_GL_MakeCurrent(): %s\n", SDL_GetError());
-                                break;
-                            }
-                            /* Change view port to the new window dimensions */
-                            SDL_GL_GetDrawableSize(state->windows[i], &w, &h);
-                            ctx.glViewport(0, 0, w, h);
-                            state->window_w = event.window.data1;
-                            state->window_h = event.window.data2;
-                            /* Update window content */
-                            Render(event.window.data1, event.window.data2, &datas[i]);
-                            SDL_GL_SwapWindow(state->windows[i]);
-                            break;
-                        }
-                    }
-                    break;
-            }
-        }
         SDLTest_CommonEvent(state, &event, &done);
     }
-    if (!done) {
-      for (i = 0; i < state->num_windows; ++i) {
-          status = SDL_GL_MakeCurrent(state->windows[i], context[i]);
-          if (status) {
-              SDL_Log("SDL_GL_MakeCurrent(): %s\n", SDL_GetError());
-
-              /* Continue for next window */
-              continue;
-          }
-          Render(state->window_w, state->window_h, &datas[i]);
-          SDL_GL_SwapWindow(state->windows[i]);
-      }
-    }
 #ifdef __EMSCRIPTEN__
     else {
         emscripten_cancel_main_loop();
@@ -551,7 +536,7 @@ main(int argc, char *argv[])
     }
 
     /* Set OpenGL parameters */
-    state->window_flags |= SDL_WINDOW_OPENGL | SDL_WINDOW_RESIZABLE | SDL_WINDOW_BORDERLESS;
+    state->window_flags |= SDL_WINDOW_OPENGL | SDL_WINDOW_RESIZABLE | SDL_WINDOW_ALLOW_HIGHDPI;
     state->gl_red_size = 5;
     state->gl_green_size = 5;
     state->gl_blue_size = 5;
@@ -724,6 +709,9 @@ main(int argc, char *argv[])
 
         GL_CHECK(ctx.glEnable(GL_CULL_FACE));
         GL_CHECK(ctx.glEnable(GL_DEPTH_TEST));
+
+        /* Detach the GL context so the render thread can attach it */
+        SDL_GL_MakeCurrent(state->windows[i], NULL);
     }
 
     /* Main render loop */
@@ -734,9 +722,14 @@ main(int argc, char *argv[])
 #ifdef __EMSCRIPTEN__
     emscripten_set_main_loop(loop, 0, 1);
 #else
-    while (!done) {
-        loop();
+    {
+        SDL_Thread *thread = SDL_CreateThread(RenderThread, "RenderThread", NULL);
+        while (!done) {
+            loop();
+        }
+        SDL_WaitThread(thread, NULL);
     }
+
 #endif
 
     /* Print out some timing information */

cgutman avatar Aug 14 '22 18:08 cgutman

I can't even test this, as even with scaling off, when I try to move the window with super+arrow, it just hangs while two CPU cores are pegged at 100%, then eventually segfaults when I try to quit.

Edit: To note, I'm using an Nvidia card with the 515.65.01 drivers.

Kontrabant avatar Aug 14 '22 19:08 Kontrabant

then eventually segfaults when I try to quit.

There is a UAF in the PoC due to the fact that the SDL test common code calls SDL_DestroyWindow() on SDL_WINDOWEVENT_CLOSE (before we join the render thread). It's just a bug in the PoC and not related to the issue here though.

Edit: To note, I'm using an Nvidia card with the 515.65.01 drivers.

Ah yeah, Nvidia has their own issues with rendering on a separate thread that @sulix was working on IIRC.

I can still repro with llvmpipe by using LIBGL_ALWAYS_SOFTWARE=true, so you might be able to test it that way on your Nvidia system.

cgutman avatar Aug 14 '22 19:08 cgutman

Yeah: nVidia's wayland implementation definitely has some issues, particularly with resizing. There's no complete fix yet — it's going to require an new EGL extension to fix properly.

In the meantime, it's worth noting:

  • This has changed a few times so the nVidia behaviour is going to be very different depending on what libnvidia-egl-wayland version you're running.
  • Versions < 1.10 will hang on any resize if the resize is done from a different thread.
  • Version 1.10 will cause some small graphical corruption on resize, as the surface is replaced with an uninitialised version.
  • Most distros reverted the change, so have a "1.10" which still hangs on resize.
  • There's a fix in the latest git version which defers the resize (if it occurs on another thread) until immediately after swapbuffers, meaning the old size surface is committed for one frame. (This can cause problems with maximize / fullscreen transition / Super+Arrows, at least on KDE, and is "wrong", but is better than the previous version.)

Most of the important info about it is in https://github.com/NVIDIA/egl-wayland/pull/53, https://github.com/NVIDIA/egl-wayland/issues/57 and https://github.com/NVIDIA/egl-wayland/pull/60

sulix avatar Aug 15 '22 03:08 sulix

LIBGL_ALWAYS_SOFTWARE=true

This didn't work either, so I dug out an old system with an AMD card and manged to reproduce this.

Honestly, I'm stumped. Everything in the debug logs looks correct, tracing through it looks fine, I even tried the brute force method of setting the viewport source and destination on the render thread before every flip and the results were always broken. I'm guessing the cause of this lies somewhere below SDL.

I tried it on KDE on the same system as well and it behaves correctly there.

Kontrabant avatar Aug 15 '22 16:08 Kontrabant

I think this is another instance of the perennial issue of committing incorrectly sized buffers. Moving the call to ConfigureWindowGeometry() onto the render thread after eglSwapBuffers() "fixes" it.

My guess is that eglSwapBuffers() is committing a buffer of the old size after we've already acknowledged the new one and that's what's causing Mutter to compute the viewport incorrectly. That would explain how we end up with a viewport that appears to match the old size, rather than the new one.

This is the hack I tried that works for me:

diff --git a/src/video/wayland/SDL_waylandopengles.c b/src/video/wayland/SDL_waylandopengles.c
index 26626ee51..b34e82fd2 100644
--- a/src/video/wayland/SDL_waylandopengles.c
+++ b/src/video/wayland/SDL_waylandopengles.c
@@ -107,6 +107,9 @@ Wayland_GLES_GetSwapInterval(_THIS)
     return _this->egl_data->egl_swapinterval;
 }
 
+void
+ConfigureWindowGeometry(SDL_Window *window);
+
 int
 Wayland_GLES_SwapWindow(_THIS, SDL_Window *window)
 {
@@ -173,6 +176,8 @@ Wayland_GLES_SwapWindow(_THIS, SDL_Window *window)
         return SDL_EGL_SetError("unable to show color buffer in an OS-native window", "eglSwapBuffers");
     }
 
+    ConfigureWindowGeometry(window);
+
     WAYLAND_wl_display_flush( data->waylandData->display );
 
     return 0;
diff --git a/src/video/wayland/SDL_waylandwindow.c b/src/video/wayland/SDL_waylandwindow.c
index c63bb1b94..15028e014 100644
--- a/src/video/wayland/SDL_waylandwindow.c
+++ b/src/video/wayland/SDL_waylandwindow.c
@@ -235,7 +235,7 @@ UnsetDrawSurfaceViewport(SDL_Window *window)
     }
 }
 
-static void
+void
 ConfigureWindowGeometry(SDL_Window *window)
 {
     SDL_WindowData        *data    = window->driverdata;
@@ -1993,7 +1993,7 @@ Wayland_HandleResize(SDL_Window *window, int width, int height, float scale)
     window->w = width;
     window->h = height;
     data->scale_factor = scale;
-    ConfigureWindowGeometry(window);
+    //ConfigureWindowGeometry(window);
 
     if (data->needs_resize_event || old_w != width || old_h != height || !FloatEqual(data->scale_factor, old_scale)) {
         /* We may have already updated window w/h (or only adjusted scale factor),

Obviously we can't ship that, but maybe it will inspire ideas for a proper fix/workaround.

cgutman avatar Aug 16 '22 03:08 cgutman

Sorry for the commit spam. I forgot GitHub did that on forks.

Anyway, I cleaned up the PoC and added a proper --threaded option to testgles2 which can reproduce this issue using: SDL_VIDEODRIVER=wayland ./testgles2 --allow-highdpi --vsync --threaded

If we do need to go to GNOME for a fix, this should at least give them an easy way to reproduce the issue.

cgutman avatar Aug 17 '22 05:08 cgutman

We'd like to ship 2.24.0, is this a showstopper? I can't think of any fix that wouldn't be risky at this point and I don't think we can cleanly back out the offending change.

slouken avatar Aug 17 '22 21:08 slouken

No, I don't think it is.

cgutman avatar Aug 17 '22 22:08 cgutman

This is probably something that the GNOME devs have to look into, as there doesn't seem to be a clean fix and other window managers don't exhibit the issue. Given that it didn't manifest until now with a specific test case, there doesn't seem to be many (any?) real world applications affected.

Kontrabant avatar Aug 17 '22 22:08 Kontrabant

Okay, bumping to 2.26.0, thanks!

slouken avatar Aug 17 '22 22:08 slouken

Fedora pushed a new egl-wayland update and now the test not only doesn't crash on resize anymore, but the viewport issue doesn't manifest when using the Nvidia drivers.

edit: Ah, playing with it a bit more, it seems that sometimes it works correctly and sometimes the viewport is wrong, There doesn't seem to be any consistent pattern to when it works and when it doesn't though.

Kontrabant avatar Aug 19 '22 15:08 Kontrabant

Has this been filed on GNOME's issue tracker by chance? Confirmation that this is an external issue will help move it along for #6362.

flibitijibibo avatar Oct 21 '22 01:10 flibitijibibo

No I haven't filed one with GNOME yet. I'll do it this weekend if someone doesn't beat me to it.

cgutman avatar Oct 31 '22 04:10 cgutman

Filed https://gitlab.gnome.org/GNOME/mutter/-/issues/2503

cgutman avatar Nov 11 '22 20:11 cgutman

mutter has fixed it in mainline, so that should cover that - not sure if it will be backported but it should definitely be in the next major release.

flibitijibibo avatar Nov 13 '22 18:11 flibitijibibo