Vulkan applications get stuck after resizing

Open Kimiblock opened this issue 11 months ago • 9 comments

NVIDIA Open GPU Kernel Modules Version

565.77

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • [ ] I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Arch Linux

Kernel Release

6.12.9-arch1-1

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • [x] I am running on a stable kernel release.

Hardware: GPU

NVIDIA GeForce RTX 4060 Laptop GPU

Describe the bug

Applications on the internal screen (2560x1600 @ 240 Hz, VRR) get stuck after a resize. This does not seem to happen on the external screen (4K @ 60 Hz).

GTK, for instance, complains with Gsk-WARNING **: 22:48:51.639: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4).

To Reproduce

  • Run gtk4-demo
  • Resize it

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

No response

Kimiblock avatar Jan 16 '25 15:01 Kimiblock

This seems to happen after resuming from S3 sleep. Steam also crashes constantly, but that was fixed by removing the PAT parameters.

Kimiblock avatar Jan 22 '25 04:01 Kimiblock

@Kimiblock does it only happen in the following situation?

  1. You run a Vulkan application
  2. You put the system to S3 sleep
  3. You wake up the system
  4. You resize the application
  5. vkWaitForFences fails with VK_ERROR_DEVICE_LOST

Binary-Eater avatar Feb 01 '25 20:02 Binary-Eater

After some more testing, I found that, even without sleeping, GTK Demo gets stuck with vkQueueSubmit(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4).

Kimiblock avatar Feb 02 '25 03:02 Kimiblock

Here's the complete log:

➜  ~ env -u GSK_RENDERER gtk4-demo

(gtk4-demo:2983): Gtk-WARNING **: 11:25:24.627: Unknown key gtk-modules in /home/kimiblock/.config/gtk-4.0/settings.ini

(gtk4-demo:2983): Gtk-CRITICAL **: 11:25:24.946: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gtk-CRITICAL **: 11:25:24.946: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gtk-CRITICAL **: 11:25:24.986: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gtk-CRITICAL **: 11:25:24.986: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gtk-CRITICAL **: 11:25:38.040: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gtk-CRITICAL **: 11:25:38.040: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gtk-CRITICAL **: 11:25:43.153: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gtk-CRITICAL **: 11:25:43.153: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gsk-WARNING **: 11:25:43.445: vkQueueSubmit(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:2983): Gtk-CRITICAL **: 11:26:32.332: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gtk-CRITICAL **: 11:26:32.332: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gsk-WARNING **: 11:26:32.333: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:2983): Gsk-WARNING **: 11:26:33.333: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:2983): Gsk-WARNING **: 11:26:33.334: vkQueueSubmit(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:2983): Gtk-CRITICAL **: 11:26:38.414: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gtk-CRITICAL **: 11:26:38.414: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gtk-CRITICAL **: 11:26:38.417: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gtk-CRITICAL **: 11:26:38.417: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gsk-WARNING **: 11:26:38.417: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:2983): Gsk-WARNING **: 11:26:39.417: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:2983): Gsk-WARNING **: 11:26:39.418: vkQueueSubmit(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:2983): Gtk-CRITICAL **: 11:26:44.598: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gtk-CRITICAL **: 11:26:44.598: gtk_css_section_get_bytes: assertion 'section != NULL' failed

(gtk4-demo:2983): Gsk-WARNING **: 11:26:44.598: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:2983): Gsk-WARNING **: 11:26:45.598: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:2983): Gsk-WARNING **: 11:26:45.599: vkQueueSubmit(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)
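A log like the one above is easier to compare across runs when the VK_ERROR_DEVICE_LOST warnings are counted per Vulkan call. The following is only a minimal sketch in Python: the regular expression assumes the exact Gsk-WARNING line layout shown in this thread, and the function name `summarize_device_lost` is made up for illustration and is not part of GTK or the NVIDIA tooling.

```python
import re
from collections import Counter

# Assumed line layout (copied from the log in this thread):
# (gtk4-demo:2983): Gsk-WARNING **: 11:25:43.445: vkQueueSubmit(): ... (VK_ERROR_DEVICE_LOST) (-4)
WARNING_RE = re.compile(
    r"\((?P<prog>[^:()]+):(?P<pid>\d+)\): Gsk-WARNING \*\*: "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}): "
    r"(?P<call>vk\w+)\(\): .*\(VK_ERROR_DEVICE_LOST\)"
)


def summarize_device_lost(log_text):
    """Return (count per Vulkan call, (time, call) of the first device-lost warning)."""
    counts = Counter()
    first_seen = None
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match is None:
            continue  # Gtk-CRITICAL lines and blank lines are ignored
        counts[match.group("call")] += 1
        if first_seen is None:
            first_seen = (match.group("time"), match.group("call"))
    return counts, first_seen


# A few lines taken from the log above, used as sample input.
sample = """\
(gtk4-demo:2983): Gtk-CRITICAL **: 11:25:43.153: gtk_css_section_get_bytes: assertion 'section != NULL' failed
(gtk4-demo:2983): Gsk-WARNING **: 11:25:43.445: vkQueueSubmit(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)
(gtk4-demo:2983): Gsk-WARNING **: 11:26:32.333: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)
(gtk4-demo:2983): Gsk-WARNING **: 11:26:33.333: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)
"""

counts, first_seen = summarize_device_lost(sample)
print(dict(counts))
print(first_seen)
```

To use it on a real run, the stderr of gtk4-demo could be redirected to a file and the file contents passed to `summarize_device_lost`. The first matching entry shows which call reported the lost device first, which in the log above is vkQueueSubmit.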

Repro steps:

  1. Install linux, nvidia-open, nvidia-utils in Arch
  2. Reboot into GNOME
  3. Start GTK 4 Demo

As for the proprietary driver, I see no difference in my testing.

Kimiblock avatar Feb 02 '25 03:02 Kimiblock

Thanks for sharing a simplified set of steps. I will hopefully try this on Monday and see if I can reproduce it as well.

Binary-Eater avatar Feb 02 '25 04:02 Binary-Eater

I think I can reproduce this issue. I found it because I ran into the same Gsk-WARNING messages with easyeffects. Interestingly, the freezes and warnings first showed up when I opened the hamburger menu in the top right of easyeffects. Resizing the easyeffects window also freezes it and prints the messages. With gtk4-demo, the window freezes instantly once it is resized by a certain amount. In all cases the messages are the same, though without the Gtk-CRITICAL messages:

easyeffects
(easyeffects:11080): Gsk-WARNING **: 23:27:34.707: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(easyeffects:11080): Gsk-WARNING **: 23:27:35.707: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(easyeffects:11080): Gsk-WARNING **: 23:27:35.707: vkQueueSubmit(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(easyeffects:11080): Gsk-WARNING **: 23:27:35.709: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(easyeffects:11080): Gsk-WARNING **: 23:27:36.709: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(easyeffects:11080): Gsk-WARNING **: 23:27:36.710: vkQueueSubmit(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(easyeffects:11080): Gsk-WARNING **: 23:27:36.710: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(easyeffects:11080): Gsk-WARNING **: 23:27:37.710: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(easyeffects:11080): Gsk-WARNING **: 23:27:37.711: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(easyeffects:11080): Gsk-WARNING **: 23:27:38.711: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(easyeffects:11080): Gsk-WARNING **: 23:27:38.712: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(easyeffects:11080): Gsk-WARNING **: 23:27:39.712: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(easyeffects:11080): Gsk-WARNING **: 23:27:39.713: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(easyeffects:11080): Gsk-WARNING **: 23:27:40.713: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

gtk4-demo
(gtk4-demo:10935): Gsk-WARNING **: 23:26:30.197: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:31.197: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:31.198: vkQueueSubmit(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:33.106: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:34.106: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:34.107: vkQueueSubmit(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:36.348: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:37.348: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:37.348: vkQueueSubmit(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:38.457: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:39.457: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:39.459: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:40.459: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:40.462: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:41.462: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:41.463: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

(gtk4-demo:10935): Gsk-WARNING **: 23:26:42.463: vkWaitForFences(): The logical or physical device has been lost. (VK_ERROR_DEVICE_LOST) (-4)

journalctl (same with gtk4-demo)
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception on GPC 0: WIDTH CT Violation. Coordinates: (0x300, 0x68)
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception: ESR 0x500420=0x80000010 0x500434=0x680300 0x500438=0x1800 0x50043c=0x0
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception on GPC 1: WIDTH CT Violation. Coordinates: (0x300, 0x0)
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception: ESR 0x508420=0x80000010 0x508434=0x300 0x508438=0x1800 0x50843c=0x0
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception on GPC 2: WIDTH CT Violation. Coordinates: (0x300, 0x10)
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception: ESR 0x510420=0x80000010 0x510434=0x100300 0x510438=0x1800 0x51043c=0x0
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception on GPC 3: WIDTH CT Violation. Coordinates: (0x300, 0x60)
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception: ESR 0x518420=0x80000010 0x518434=0x600300 0x518438=0x1800 0x51843c=0x0
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception on GPC 4: WIDTH CT Violation. Coordinates: (0x300, 0x8)
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception: ESR 0x520420=0x80000010 0x520434=0x80300 0x520438=0x1800 0x52043c=0x0
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception on GPC 5: WIDTH CT Violation. Coordinates: (0x300, 0x18)
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception: ESR 0x528420=0x80000010 0x528434=0x180300 0x528438=0x1800 0x52843c=0x0
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=11949, name=easyeffects, Graphics Exception: ChID 002e, Class 0000c797, Offset 00000000, Data 00000000
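The kernel lines above carry the Xid number, the reporting process, and (for some lines) the GPC index. As a rough sketch, they could be pulled apart like this in Python to see which processes the driver attributes the exceptions to. The regular expressions assume the exact NVRM line layout shown here, and `parse_xid_lines` is a hypothetical helper name, not an NVIDIA tool.

```python
import re
from collections import Counter

# Assumed layout, taken from the journalctl output in this thread:
# ... kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, <detail>
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>\d+), name=(?P<name>[^,]+), (?P<detail>.*)"
)
# Only some detail strings name a GPC and a reason.
GPC_RE = re.compile(r"Graphics Exception on GPC (?P<gpc>\d+): (?P<reason>[^.]+)\.")


def parse_xid_lines(journal_text):
    """Return one dict per NVRM Xid line; gpc/reason stay None when absent."""
    events = []
    for line in journal_text.splitlines():
        match = XID_RE.search(line)
        if match is None:
            continue
        event = {
            "pci": match.group("pci"),
            "xid": int(match.group("xid")),
            "pid": int(match.group("pid")),
            "name": match.group("name"),
            "gpc": None,
            "reason": None,
        }
        gpc = GPC_RE.search(match.group("detail"))
        if gpc is not None:
            event["gpc"] = int(gpc.group("gpc"))
            event["reason"] = gpc.group("reason")
        events.append(event)
    return events


# Three lines copied from the journal output above.
sample = """\
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception on GPC 0: WIDTH CT Violation. Coordinates: (0x300, 0x68)
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=802, name=Hyprland, Graphics Exception: ESR 0x500420=0x80000010 0x500434=0x680300 0x500438=0x1800 0x50043c=0x0
Feb 05 23:42:44 DESKTOP-ARCH6UWU9 kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=11949, name=easyeffects, Graphics Exception: ChID 002e, Class 0000c797, Offset 00000000, Data 00000000
"""

events = parse_xid_lines(sample)
by_process = Counter(event["name"] for event in events)
print(dict(by_process))
```

In the journal output above, the per-GPC exceptions are logged against the compositor (Hyprland, pid 802), and only the final line names easyeffects. Grouping by process name makes that split visible at a glance.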

As soon as this freezing starts (~0.25 fps) it also affects my cursor until I close the application or move into another workspace.

My setup:

  • Arch Linux 6.13.1-arch1-1
  • Hyprland 0.47.2
  • Nvidia GeForce RTX 3080 (Desktop)
  • Single Monitor: 7680x2160 (x1.25) @ 120 Hz 57"

Driver:

  • libva-nvidia-driver 0.0.13-1
  • nvidia-open-dkms 570.86.16-2
  • nvidia-utils 570.86.16-2

Let me know if I can be of any further help.

Zhnz avatar Feb 05 '25 22:02 Zhnz

Working on reproducing this but have opened an internal bug, 5118425, for now, so I do not lose track of this.

Binary-Eater avatar Feb 18 '25 23:02 Binary-Eater

Somebody here seems to suggest fractional scaling may exacerbate the issue: "changing 145% to 150% on kde seems to make them a lot more stable"

maaarghk avatar Feb 19 '25 15:02 maaarghk

Hi everyone,

Just wanted to provide an update that we noticed an issue upstream with Gtk4. It seems other developers noticed this issue as well and beat us to a fix.

  • https://gitlab.gnome.org/GNOME/gtk/-/commit/5150b3910c7fdc18d56771a24abdab5c15ff198f
  • https://gitlab.gnome.org/GNOME/gtk/-/issues/7314

We are also noticing some potential issues with upstream gtk4 that we are looking into.

Binary-Eater avatar Mar 08 '25 00:03 Binary-Eater