glfw
Resizing Wayland windows is either laggy or causes the application to freeze on GNOME + AMD
Here's a recording:
https://github.com/glfw/glfw/assets/20155479/5a096b57-586a-46c7-bb2d-666b66f83d61
This video shows the following tests:
- Rapidly resizing a non-GLFW Wayland window, in this case my terminal, to demonstrate expected behavior
- Rapidly resizing the triangle-vulkan test, which shows the window significantly lagging behind the mouse cursor while resizing
- Rapidly resizing the title test, which shows the window becoming increasingly laggy until the GNOME shell crashes
The larger the window size, the stronger the effect.
My GPU is the AMD Radeon RX 580.
I am running Arch Linux, although I also reproduced this on an OpenSUSE Tumbleweed GNOME live-boot.
I couldn't reproduce this on an OpenSUSE Tumbleweed KDE Plasma live-boot, or on my Fedora laptop running GNOME Wayland, which has Intel integrated graphics.
I saw something like point 3 before the recent changes but have been unable to reproduce it since. If possible, please run the title test with WAYLAND_DEBUG=1 and post the log of it below.
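The usual way to capture this log is to prefix the command with WAYLAND_DEBUG=1 and redirect stderr to a file, since libwayland prints the protocol trace to stderr. As a rough sketch, the same thing can be done from inside a test program by setting the variable before GLFW connects to the compositor. The helper name enable_wayland_debug is hypothetical, and this assumes libwayland reads the variable when the display connection is opened during glfwInit().

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical helper: sets WAYLAND_DEBUG=1 for the current process.
 * Call it before glfwInit(), on the assumption that libwayland checks the
 * variable when the connection to the compositor is opened.
 * Returns 0 on success and -1 on failure, like setenv(). */
int enable_wayland_debug(void) {
    return setenv("WAYLAND_DEBUG", "1", 1);
}
```

The trace still goes to stderr, so the program's stderr has to be redirected to a file to attach it here.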
Done! title-wayland-debug.txt
journalctl shows this:
Feb 23 14:46:31 arch gnome-shell[63235]: WL: error in client communication (pid 65121)
Feb 23 14:46:31 arch kernel: gnome-shell[63235]: segfault at 10 ip 0000791274a80c44 sp 00007ffe902d18c8 error 4 in libwayland-server.so.0.22.0[791274a7e000+8000] likely on CPU 15 (core 3, socket 0)
Feb 23 14:46:31 arch kernel: Code: 00 00 0f 1f 40 00 f3 0f 1e fa 66 48 0f 6e c7 0f 16 47 08 0f 11 06 48 89 77 08 48 8b 46 08 48 89 30 c3 0f 1f 40 00 f3 0f 1e fa <8b> 47 10 48 8b 57 40 3d ff ff ff fe 77 2e 48 83 c2 30 48 8b 4a 10
Feb 23 14:46:31 arch systemd[1]: Started Process Core Dump (PID 65144/UID 0).
Feb 23 14:46:32 arch systemd-coredump[65145]: [🡕] Process 63235 (gnome-shell) of user 1000 dumped core.
Oh wow, the libdecor demo application also shows this laggy resize behavior. And indeed, disabling libdecor makes this problem completely disappear. https://gitlab.freedesktop.org/libdecor/libdecor/-/issues/37
I wonder though, Blender also uses libdecor and does not have this problem. Why?
Whenever glfwWindowShouldClose(window) is called, it causes the entire DE to crash.
Example Source Code
#include <stdio.h>
#include <GLFW/glfw3.h>
int main() {
    if (!glfwInit()) {
        return -1;
    }
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 2);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_COMPAT_PROFILE);
    GLFWwindow* window = glfwCreateWindow(800, 600, "Hello, World!", NULL, NULL);
    if (!window) {
        printf("Failed to create a window\n");
        glfwTerminate();
        return -1;
    }
    glfwMakeContextCurrent(window);
    while (!glfwWindowShouldClose(window)) {
        glfwSwapBuffers(window);
        glfwPollEvents();
    }
    glfwTerminate();
    return 0;
}
Then build with
gcc examples/test_glfw_window.c -o examples/test_glfw_window -lglfw
Then run the binary
./examples/test_glfw_window
All I do is run the binary, then close the window, and the entire environment crashes. This happens every time I click the "x" button on the top right of the window without fail.
Checking journalctl reveals the following around the time of the crash.
journalctl crash output
Apr 07 21:03:51 spectra kernel: gnome-shell[18682]: segfault at 18 ip 00007162a4abd84c sp 00007ffc376b14d0 error 4 in libmutter-14.so.0.0.0[7162a4a3b000+19c000] likely on CPU 14 (core 6, socket 0)
Apr 07 21:03:51 spectra kernel: Code: 13 00 ff 15 66 13 1c 00 e9 93 dd ff ff 49 8b 44 24 28 48 89 85 f0 fe ff ff 48 85 c0 0f 84 e1 f9 ff ff 48 89 c7 e8 e4 2c 0b 00 <48> 8b 78 18 49 89 c4 e8 d8 ab 10 00 48 8b b5 38 ff ff ff 48 89 c7
Apr 07 21:03:51 spectra systemd[1]: Started Process Core Dump (PID 42452/UID 0).
The remaining stack trace depends on what I'm doing, even though it's rarely ever related or directly caused by it. The end of the stack trace depends on how it began.
journalctl end of stack trace
Stack trace of thread 19637:
#0 0x00007162a49190bf __poll (libc.so.6 + 0xfb0bf)
#1 0x000071620dba49b7 n/a (libpulse.so.0 + 0x339b7)
#2 0x000071620db8e45c pa_mainloop_poll (libpulse.so.0 + 0x1d45c)
#3 0x000071620db9861c pa_mainloop_iterate (libpulse.so.0 + 0x2761c)
#4 0x000071620db986d1 pa_mainloop_run (libpulse.so.0 + 0x276d1)
#5 0x000071620dba8bf2 n/a (libpulse.so.0 + 0x37bf2)
#6 0x000071620db462b7 n/a (libpulsecommon-17.0.so + 0x5c2b7)
#7 0x00007162a48a955a n/a (libc.so.6 + 0x8b55a)
#8 0x00007162a4926a3c n/a (libc.so.6 + 0x108a3c)
ELF object binary architecture: AMD x86-64
The last line is always the same.
ELF object binary architecture: AMD x86-64
Resizing the window causes artifacts and occasionally affects performance. I also get libdecor-related complaints.
libdecor warning
21:56:58 | ~/Local/learn-opengl
git:(main | Δ) λ ./examples/test_glfw_window
libdecor-gtk-WARNING: Failed to initialize GTK
Failed to load plugin 'libdecor-gtk.so': failed to init
^C
The only time I can exit the application safely is when I use ^C to signal an interrupt.
I think this issue may be related to the GPU driver. I'm not sure how any of this works, because it is completely outside my experience.
Hardware Specs
21:50:01 | ~
λ neofetch --color_blocks off --backend off
austin@spectra
--------------
OS: Arch Linux x86_64
Host: B650M AORUS ELITE AX
Kernel: 6.6.25-1-lts
Uptime: 3 hours, 49 mins
Packages: 1566 (pacman)
Shell: zsh 5.9
Resolution: 1920x1080
DE: GNOME 46.0
WM: Mutter
WM Theme: Adwaita
Theme: Adwaita [GTK2/3]
Icons: Adwaita [GTK2/3]
Terminal: gnome-terminal
CPU: AMD Ryzen 7 7700X (16) @ 5.573GHz
GPU: AMD ATI 10:00.0 Raphael
GPU: AMD ATI Radeon RX 470/480/570/570X/580/580X/590
Memory: 57877MiB / 127957MiB
Let me know if any other information might help. I'm also willing to try other steps to isolate and diagnose the issue.
glxinfo -B output
22:15:19 | ~
λ glxinfo -B
name of display: :0
display: :0 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
Vendor: AMD (0x1002)
Device: AMD Radeon RX 580 Series (radeonsi, polaris10, LLVM 17.0.6, DRM 3.54, 6.6.25-1-lts) (0x67df)
Version: 24.0.4
Accelerated: yes
Video memory: 8192MB
Unified memory: no
Preferred profile: core (0x1)
Max core profile version: 4.6
Max compat profile version: 4.6
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
VBO free memory - total: 7499 MB, largest block: 7499 MB
VBO free aux. memory - total: 63875 MB, largest block: 63875 MB
Texture free memory - total: 7499 MB, largest block: 7499 MB
Texture free aux. memory - total: 63875 MB, largest block: 63875 MB
Renderbuffer free memory - total: 7499 MB, largest block: 7499 MB
Renderbuffer free aux. memory - total: 63875 MB, largest block: 63875 MB
Memory info (GL_NVX_gpu_memory_info):
Dedicated video memory: 8192 MB
Total available memory: 72170 MB
Currently available dedicated video memory: 7499 MB
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon RX 580 Series (radeonsi, polaris10, LLVM 17.0.6, DRM 3.54, 6.6.25-1-lts)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 24.0.4-arch1.2
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 4.6 (Compatibility Profile) Mesa 24.0.4-arch1.2
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 24.0.4-arch1.2
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
@Friz64 @teleprint-me Can you test your issues with mutter-git from the AUR? This will be equivalent to upgrading to Mutter 46.1, which comes out at the end of this week. I think I ran into similar issues with GLFW and updating mutter fixed it for me.
It still behaves the same way, but instead of my desktop crashing, the GLFW window now just disappears and the application freezes.
@Geo25rey
Sorry for the delay. I've been busy.
λ pacman -Ss mutter
# ...omitting packages for brevity
extra/mutter 46.1-1 [installed]
Window manager and compositor for GNOME
I will test when I have some time.
After upgrading to GNOME and mutter 46.1, it still seems to be broken for me.
I am experiencing the same issue now. Arch Linux, GNOME (Wayland) 46.1-2, glfw 3.4-2. Dual-graphics laptop, Intel and Nvidia. When running on the Intel card it just stutters and loads the CPU to 100% (+50% from gnome-shell). But when I run it on Nvidia (prime-run or __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia) and try to resize the window, it freezes and performs the resize only after a considerable delay.
I use a two-screen setup. With only one screen things go much better, but the error is still present.
Test program:
#include <GLFW/glfw3.h>
int main()
{
    glfwInit();
    GLFWwindow *window = glfwCreateWindow(800, 600, "OpenGL", NULL, NULL);
    glfwMakeContextCurrent(window);
    while(!glfwWindowShouldClose(window))
    {
        glfwPollEvents();
        glfwSwapBuffers(window);
    }
    glfwTerminate();
    return 0;
}
Video reproduction (2 screens): Screencast from 2024-05-23 21-28-42.webm
And with 1 screen: Screencast from 2024-05-23 21-33-45.webm
With 1 screen and Intel graphics everything is smooth, but CPU load is 55% (+55% from gnome-shell).
My screen is 144 Hz; glfwSwapInterval(0) does not fix the problem.
If I call glfwSwapBuffers(window) inside framebufferSizeCallback, the window resizes more smoothly, but I can't control it. Screencast from 2024-05-23 22-09-10.webm
I have this problem too (on an Intel+Nvidia laptop). When I resize a window that is rendered on the discrete GPU, it is very laggy. On Intel everything works well.
Same on Intel UHD, and libdecor has nothing to do with it: I am able to reproduce it on triangle-vulkan with libdecor disabled. I suspect GLFW may be using Wayland protocols incorrectly or suboptimally somewhere.
On a different note, when I look at the perf dump while doing a bunch of resizing, it shows that most of the time is spent inside the i915 kernel driver, doing some kind of memory management while trying to submit a Vulkan command buffer from inside the GLFW refresh event. Note that the same kind of heavy memory management does not occur when drawing normally. Hence, this might also be a kernel driver bug. Or maybe yet another consequence of the lack of explicit sync on Linux?
Would love to see how perf flamegraphs look on other GPU vendors, though.
Finally, thinking about it some more, this might be an entire set of completely different bugs. libdecor is very slow, and when I was looking at flamegraphs with it enabled, it was eating up half the redraw time. With it disabled, the i915 driver eats up most of the time. Maybe every single report here is basically caused by the Wayland-characteristic client-side redraw being slow, but the reason for it being slow is different for every one of the reports?
P.S. I also observed triangle-vulkan crash once, so GLFW might be a little bit at fault here after all.