Browser WebGL Creation Fails on NVIDIA

Open WatchMkr opened this issue 1 year ago • 12 comments

NVIDIA RTX A2000 12GB Driver 525.85.05

From about:support

WebGL 1 Driver Renderer WebGL creation failed:

  • WebglAllowWindowsNativeGl:false restricts context creation on this system. ()
  • Exhausted GL driver options. (FEATURE_FAILURE_WEBGL_EXHAUSTED_DRIVERS)

WebGL 2 Driver Renderer WebGL creation failed:

  • AllowWebgl2:false restricts context creation on this system. ()

Window Protocol wayland

Desktop Environment pop:cosmic

There are also quite a few blocklisting errors of the form: #BLOCKLIST_FEATURE_FAILURE_GLXTEST_FAILED Blocklisted by gfxInfo

Failure Log No GPUs detected via PCI glxtest: process failed (received signal 11)

WatchMkr avatar Mar 03 '23 21:03 WatchMkr

I see the same behavior if I run this on my gaze15 with cosmic-comp set to use the Nvidia GPU for rendering, with MOZ_ENABLE_WAYLAND=1.

With MOZ_ENABLE_WAYLAND=0 it seems to be falling back to LLVMPipe for some reason? Which "works", but not well.

glxtest: process failed (received signal 11)

Signal 11 is SIGSEGV, so it could be a segfault within the driver. But it also shouldn't be using GLX on Wayland (that shouldn't even be possible). So unless glxtest is inaccurately named, something is going wrong if it's using GLX rather than EGL.

ids1024 avatar Mar 03 '23 22:03 ids1024

Or looking at the Firefox source, maybe it's expected to be prone to segfaults. But anyway, it shouldn't be using glx.

  // bug 639842 - it's very important to fire this process BEFORE we set up
  // error handling. indeed, this process is expected to be crashy, and we
  // don't want the user to see its crashes. That's the whole reason for
  // doing this in a separate process.
  //  
  // This call will cause a fork and the fork will terminate itself separately
  // from the usual shutdown sequence
  fire_glxtest_process();   

ids1024 avatar Mar 03 '23 22:03 ids1024

Chrome is also dropping to LLVMPipe, though more gracefully. This appears to be related to NVIDIA in general. Issue title updated.

chrome://gpu/
WebGL: Software only, hardware acceleration unavailable
WebGL2: Software only, hardware acceleration unavailable
WebGPU: Disabled

WatchMkr avatar Mar 06 '23 22:03 WatchMkr

Hm, these errors from journalctl may be relevant. Or they're worth checking anyway.

I see the compositor is still advertising a bunch of formats via zwp_linux_dmabuf_v1, but unlike amdgpu on the Dev One, no formats are available with linear modifiers, only modifiers that I presume are vendor-specific. Maybe that's expected on this GPU though, and it isn't obviously something that should cause an issue for WebGL...

Mar 07 09:41:58 pop-os cosmic-comp[248483]: [EGL] 0x300c (BAD_PARAMETER) eglQueryDmaBufModifiersEXT: EGL_BAD_PARAMETER error: In eglQueryDmaBufModifiersEXT: Invalid format
Mar 07 09:41:58 pop-os cosmic-comp[248483]: [EGL] 0x300c (BAD_PARAMETER) eglQueryDmaBufModifiersEXT: EGL_BAD_PARAMETER error: In eglQueryDmaBufModifiersEXT: Invalid format
Mar 07 09:41:58 pop-os cosmic-comp[248483]: [EGL] 0x300c (BAD_PARAMETER) eglQueryDmaBufModifiersEXT: EGL_BAD_PARAMETER error: In eglQueryDmaBufModifiersEXT: Invalid format
Mar 07 09:41:58 pop-os cosmic-comp[248483]: [EGL] 0x300c (BAD_PARAMETER) eglQueryDmaBufModifiersEXT: EGL_BAD_PARAMETER error: In eglQueryDmaBufModifiersEXT: Invalid format

Edit: Actually, I see there's a comment in Smithay about this call failing being a known issue with Nvidia's driver. And it hopefully shouldn't prevent things from working...
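
To gauge how often the compositor logs this error, the journal lines can be tallied by EGL error code and entry point. This is a minimal sketch, assuming the line layout matches the excerpt above; the function name tally_egl_errors and the regular expression are illustrative and not part of cosmic-comp or Smithay.

```python
import re
from collections import Counter

# Matches the "[EGL] 0x300c (BAD_PARAMETER) eglQueryDmaBufModifiersEXT:" part
# of a journal line. The pattern is an assumption based on the excerpt above.
EGL_LINE = re.compile(r"\[EGL\] (0x[0-9a-fA-F]+) \((\w+)\) (\w+):")


def tally_egl_errors(lines):
    """Count EGL errors per (code, name, entry point) across journal lines."""
    counts = Counter()
    for line in lines:
        match = EGL_LINE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# Sample input: the journal line from the excerpt, repeated four times.
sample = [
    "Mar 07 09:41:58 pop-os cosmic-comp[248483]: [EGL] 0x300c (BAD_PARAMETER) "
    "eglQueryDmaBufModifiersEXT: EGL_BAD_PARAMETER error: "
    "In eglQueryDmaBufModifiersEXT: Invalid format",
] * 4

for (code, name, func), n in tally_egl_errors(sample).items():
    print(f"{func}: {name} ({code}) x{n}")
```

A single repeated entry point and error code, as here, would fit the known driver quirk rather than a broader format negotiation failure, though the tally alone can't confirm that.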

ids1024 avatar Mar 07 '23 18:03 ids1024

Okay, so looking at the Firefox code and the bug tracker:

  • Firefox forks a process called glxtest to test what is supported. It uses a separate process so it can handle crashes.
  • Despite the name, glxtest uses EGL on Wayland, and now attempts to use EGL before GLX on X.
    • So the segfault is indeed the problem and not some distraction, but does not actually indicate GLX is involved.
  • The process is segfaulting. Presumably somewhere in Nvidia's EGL implementation, but I'm having trouble debugging the subprocess.
    • For some reason, with detach-on-fork disabled in gdb, Firefox hangs and never reaches that point.
  • https://bugzilla.mozilla.org/show_bug.cgi?id=1787597 seems similar. https://bugzilla.mozilla.org/show_bug.cgi?id=1768260 mentions a crash that sounds like it could occur with Nvidia/wayland in egltest, but it looks like that shouldn't be a segfault?
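
The probe-process pattern from the first bullet can be sketched as follows. This is not Firefox's implementation (which forks from its own binary). It is a minimal illustration, using Python's subprocess module on POSIX, of how a parent can run a crash-prone check in a child and read back the signal that killed it. The names run_probe and crashing_probe are hypothetical, and the child here sends SIGSEGV to itself as a stand-in for a driver crash.

```python
import signal
import subprocess
import sys


def run_probe(cmd):
    """Run a crash-prone check in a child process.

    Returns ("ok", None) on a clean exit, ("failed", exit_code) on a
    non-zero exit, or ("crashed", signal_name) if a signal killed it.
    """
    result = subprocess.run(cmd, capture_output=True)
    if result.returncode < 0:
        # On POSIX, a return code of -N means the child died from signal N.
        return "crashed", signal.Signals(-result.returncode).name
    if result.returncode != 0:
        return "failed", result.returncode
    return "ok", None


# Hypothetical stand-in for a probe that segfaults inside a GPU driver:
# the child sends SIGSEGV to itself.
crashing_probe = [
    sys.executable, "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]

status, detail = run_probe(crashing_probe)
print(status, detail)
```

Because the crash is confined to the child, the parent survives and can report something like "process failed (received signal 11)" instead of going down with it.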

ids1024 avatar Mar 07 '23 19:03 ids1024

Okay, I can set a breakpoint on fork, then set follow-fork-mode child after the first fork to attach the debugger to the glxtest process, and get debug symbols for libraries through debuginfod.

#0  0x00007ffff59a1ee8 in queue_event (len=4096, display=0x7ffff779c430) at ../src/wayland-client.c:1499
#1  read_events (display=0x7fffffffc1a0) at ../src/wayland-client.c:1622
#2  wl_display_read_events (display=display@entry=0x7ffff779c430) at ../src/wayland-client.c:1705
#3  0x00007ffff59a2d59 in wl_display_dispatch_queue (queue=<optimized out>, display=<optimized out>)
    at ../src/wayland-client.c:1944
#4  wl_display_dispatch_queue (display=display@entry=0x7ffff779c430, queue=queue@entry=0x7ffff779c500)
    at ../src/wayland-client.c:1912
#5  0x00007ffff59a3d7f in wl_display_roundtrip_queue (display=0x7ffff779c430, queue=0x7ffff779c500)
    at ../src/wayland-client.c:1358
#6  0x00007fffeed0dcbb in  () at /usr/lib/firefox/libxul.so
#7  0x00007fffeed0e406 in  () at /usr/lib/firefox/libxul.so
#8  0x00007ffff145e81a in  () at /usr/lib/firefox/libxul.so
#9  0x00007fffeed0b544 in  () at /usr/lib/firefox/libxul.so
#10 0x00007fffeed0b9c3 in  () at /usr/lib/firefox/libxul.so
#11 0x00005555555c2c10 in _start ()

Kind of a strange place to segfault, but consistent with a comment in Firefox's glxtest.cpp:

  // This is enough to crash some broken NVIDIA prime + Wayland setups, see
  // https://github.com/NVIDIA/egl-wayland/issues/41 and bug 1768260.
  wl_display_roundtrip(dpy);

ids1024 avatar Mar 07 '23 20:03 ids1024

In particular, libwayland-client is failing at the line if (opcode >= proxy->object.interface->event_count) {, since proxy->object.interface is not a valid pointer.

ids1024 avatar Mar 07 '23 20:03 ids1024

glxtest also seems to segfault on Sway and GNOME Wayland with the Nvidia GPU, though for some reason the whole Firefox process fails too when I try it there.

So it doesn't seem to be an issue on our end. I guess something is wrong with the Nvidia Wayland EGL implementation.

ids1024 avatar Mar 08 '23 17:03 ids1024

https://github.com/NVIDIA/egl-wayland/issues/64 describes a segfault in Firefox in glxtest dereferencing the same thing in queue_event.

ids1024 avatar Mar 08 '23 20:03 ids1024

This appears to be fixed when I build and install https://github.com/NVIDIA/egl-wayland from git. So we should probably be able to package a newer release.

ids1024 avatar Mar 08 '23 20:03 ids1024

Actually it seems https://github.com/NVIDIA/egl-wayland/commit/7af7082eac2170da66990edda2ca73a221e2086e is the commit that fixes it, so it's not in the latest release.

ids1024 avatar Mar 08 '23 21:03 ids1024

Actually it seems NVIDIA/egl-wayland@7af7082 is the commit that fixes it, so it's not in the latest release.

I actually think it's NVIDIA/egl-wayland@c63bf73 that fixes it.

Regardless, the latest release of egl-wayland still has the issue, so a new release should be made. I will test soon with a git build of egl-wayland; I'm stuck on 1.1.11 because of a third-party driver installer.

xlacroixx avatar Apr 22 '23 22:04 xlacroixx