egl-wayland
egl needs an early out to prevent waking the dGPU unnecessarily
On hybrid laptops from the last two or three years, notably those with an NVIDIA RTX 20xx or newer, the machine tends to have a deeper suspend function which puts the dGPU into a very low power state when unused.
Combined with glvnd, this introduces a lag of 1-2 seconds while the dGPU wakes in response to queries, even if it remains unused and the iGPU is used instead. For example, opening the Nautilus file manager is delayed 1-2s while the dGPU wakes. For a lot of apps that use glvnd this ends up being bad UX.
A lot of folks are working around this with __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/50_mesa.json.
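For reference, a minimal sketch of applying that workaround to a single program rather than the whole session. The helper name run_with_mesa_only and the MESA_EGL_ICD variable are made up for illustration; the JSON path is the one quoted above and may differ between distributions.

```shell
# Path of the Mesa EGL vendor file, as quoted in the workaround above.
# MESA_EGL_ICD is a hypothetical override in case the path differs on your system.
MESA_EGL_ICD="${MESA_EGL_ICD:-/usr/share/glvnd/egl_vendor.d/50_mesa.json}"

# run_with_mesa_only is a hypothetical helper, not part of glvnd or egl-wayland.
# It runs one command with glvnd restricted to the Mesa vendor library,
# leaving the rest of the session untouched.
run_with_mesa_only() {
    env __EGL_VENDOR_LIBRARY_FILENAMES="$MESA_EGL_ICD" "$@"
}
```

Usage would be something like run_with_mesa_only nautilus. Note that this hides the NVIDIA EGL implementation from that program entirely, so anything that actually wants the dGPU will not get it.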
I reported this here some time ago
yeah, it hurts battery life too
having the GPU wake up and blast its fans every time an app is opened
This should be fixed by https://github.com/NVIDIA/egl-wayland/commit/ba6c38ad74cf0ef6ec4d7934f68c17a7a2d460ca
Seems a bit hit and miss, but this is likely due to how some apps (like Firefox, VS Code, Geary, Evolution) handle GPU stuff. These apps will still wake the GPU, but other apps like Nautilus no longer do this.
- Nautilus 45 still opens the GPU with the latest egl-wayland
I see the same behaviour as the first comment with applications such as VSCode (even when using the Wayland backend), but not the last: GTK4 apps that were previously problematic, such as Nautilus, now no longer start the GPU or have the noticeable delay spinning up. I also confirmed this by monitoring the dGPU state using watch cat /sys/class/drm/card*/device/power_state.
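A small sketch of that check as a reusable function, in case a labelled, per-card view is easier to read than raw cat output. The function name print_gpu_power_states and its optional root argument are assumptions for illustration; only the sysfs path comes from the comment above.

```shell
# print_gpu_power_states [ROOT] (hypothetical helper)
# Prints "<card>: <state>" for every DRM card that exposes a power_state file.
# ROOT defaults to /sys/class/drm; the argument exists only so the function
# can be pointed at a different directory.
print_gpu_power_states() {
    root="${1:-/sys/class/drm}"
    for f in "$root"/card*/device/power_state; do
        # Skip entries without a readable power_state (also covers an unmatched glob).
        [ -r "$f" ] || continue
        card=${f#"$root"/}   # strip the root prefix
        card=${card%%/*}     # keep only the cardN component
        printf '%s: %s\n' "$card" "$(cat "$f")"
    done
}
```

Running watch with a shell that has this function defined, or simply calling print_gpu_power_states in a loop while launching an app, should show whether the dGPU leaves its low power state.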
Might be worth mentioning for completeness that if the app in question is running in Flatpak, it's not yet fixed likely because the newest release of this library hasn't landed in the base runtimes yet.
- Nautilus 45 still opens the GPU with the latest egl-wayland
https://youtu.be/gKYoFEvtUJ4
Yeah, anything with Flatpak would need an update to its runtime environment to pick up an updated egl-wayland library.
It might be possible to work around that by using flatpak override --filesystem to map the host's copy of libnvidia-egl-wayland.so.1 through to the container, though at that point it's probably easier to just use the __EGL_VENDOR_LIBRARY_FILENAMES workaround instead.
For other applications, if the app itself (or some other library) tries to call eglQueryDevicesEXT on its own, then it would run into the same problem. Firefox might do that, but I couldn't say for sure -- I think the last time I looked at Firefox's GL code was before Wayland even existed. It would surprise me if something like Geary or Evolution did that, though.
Now that I think about it, if an application calls eglGetDisplay(NULL), or eglGetPlatformDisplay with EGL_PLATFORM_DEVICE_EXT or EGL_PLATFORM_SURFACELESS_MESA, then that would also cause the NVIDIA GPU to wake up.
All of those would produce a headless EGLDisplay, without a windowing system associated with it. And without a windowing system, the driver has no way to know which device is driving the desktop.
https://youtu.be/gKYoFEvtUJ4
That's indeed weird - for me it doesn't bring the dGPU out of the D3Cold state. Since I'm assuming Nautilus isn't the experimental Flatpak version, could it be that you have some kind of specific configuration in place that makes the NVIDIA GPU your primary (card0) one? I notice that for me the NVIDIA dGPU is card1 and the Intel iGPU is card0. Not sure if this has an impact anywhere.
For other applications, if the app itself (or some other library) tries to call eglQueryDevicesEXT on its own, then it would run into the same problem. ...
That indeed makes sense, I assume in these cases we'd need to create the relevant issue reports for those projects separately since this is out of egl-wayland's hands?
Firefox and Electron make some sense because IIRC they also handle some iGPU/dGPU 'placement' for things such as WebGL, so it wouldn't surprise me if the underlying code is also querying the available GPUs for that.
I'm also wondering, though, if these specific remaining issues are then also a problem for hybrid GPU setups with an AMD or even Intel dGPU? I have none to test currently, but it might be interesting to mention in upstream reports and make it more testable for developers.
That indeed makes sense, I assume in these cases we'd need to create the relevant issue reports for those projects separately since this is out of egl-wayland's hands?
Most likely, yes. If an app actually does just need to do offscreen rendering, though, then there isn't really a good way to do that without running into this. Either it calls something like eglGetDisplay(NULL) and lets the implementation pick a device (which would result in the NVIDIA driver waking up a GPU), or it would use EGL_EXT_platform_device or EGL_EXT_explicit_device, which would require calling eglQueryDevicesEXT anyway.
I'm also wondering, though, if these specific remaining issues are then also a problem for hybrid GPU setups with an AMD or even Intel dGPU? I have none to test currently, but it might be interesting to mention in upstream reports and make it more testable for developers.
Hard to say. If the driver for the dGPU is Mesa, then it would depend on how Mesa handles device enumeration and selection internally.
I wonder if the GPU offloading configuration proposal for libglvnd could help with this?
Most of the design for that would be about right, but I'll have to think about whether I could tweak that interface to avoid unnecessary internal eglQueryDevicesEXT calls.
https://youtu.be/gKYoFEvtUJ4
That's indeed weird - for me it doesn't bring the dGPU out of the D3Cold state. Since I'm assuming Nautilus isn't the experimental Flatpak version, could it be that you have some kind of specific configuration in place that makes the NVIDIA GPU your primary (card0) one? I notice that for me the NVIDIA dGPU is card1 and the Intel iGPU is card0. Not sure if this has an impact anywhere.
Yes, it's the native nautilus package from Arch. In my case, most of the time the NVIDIA dGPU is card0 and the AMD iGPU is card1, though sometimes the order is reversed. I haven't made any changes.
It just occurred to me that the NVIDIA GBM library has the same problem of calling eglQueryDevicesEXT right away to try to find a matching device, so anything that tries to use EGL_KHR_platform_gbm would run into this as well. I'd be surprised if any application actually used both EGL_KHR_platform_gbm and EGL_KHR_platform_wayland, though.
But, disabling one or both of the wayland and GBM platform libraries would be a way to determine if the application is doing something directly to access an NVIDIA device, or if that's still coming from one of the platform libraries.
The __EGL_EXTERNAL_PLATFORM_CONFIG_DIRS and __EGL_EXTERNAL_PLATFORM_CONFIG_FILENAMES environment variables can control which platform libraries get loaded, like so:
# Disable all platform libraries
__EGL_EXTERNAL_PLATFORM_CONFIG_DIRS=/some/nonexistent/path /path/to/program
# Only load the GBM platform library
__EGL_EXTERNAL_PLATFORM_CONFIG_FILENAMES=/usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json /path/to/program
I'd be surprised if any application actually used both EGL_KHR_platform_gbm and EGL_KHR_platform_wayland, though.
I believe recent versions of WebKit will do this. The web process uses GBM while the GUI process uses Wayland or X11.
Hi,
I've noticed that some apps are broken when applying ICD json file order workaround, either they are not opening:
Or partially broken with some UI elements not being displayed:
Temporarily removing WA makes everything work again (except for waking up NVIDIA GPU):
Is it something related to those apps/flatpak runtime? Or is it also a bug in EGL?
That depends -- what are the contents of that egl_vendor.d directory?
Right now it looks like this (those are copies from default directory on host):
ls ~/.local/usr/share/glvnd/egl_vendor.d/
50_mesa.json 60_nvidia.json
Basically there is no difference if I use "__EGL_VENDOR_LIBRARY_FILENAMES" and specify mesa ICD json file first, or use "__EGL_VENDOR_LIBRARY_DIRS" and point to another dir with changed filename for nvidia (10_nvidia.json -> 60_nvidia.json), the issue is the same.
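To make the second variant concrete, here is a rough sketch of how such a reordered directory could be built. The helper name make_reordered_vendor_dir is hypothetical; the file names are the ones mentioned above, and the approach assumes glvnd picks up vendor files in filename order, which is what the rename relies on.

```shell
# make_reordered_vendor_dir SRC DST (hypothetical helper)
# Copies the Mesa and NVIDIA EGL vendor files from SRC into DST, renaming
# 10_nvidia.json to 60_nvidia.json so that it sorts after 50_mesa.json.
make_reordered_vendor_dir() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    cp "$src/50_mesa.json" "$dst/50_mesa.json" || return 1
    cp "$src/10_nvidia.json" "$dst/60_nvidia.json" || return 1
}
```

With the directories from this comment, that would be make_reordered_vendor_dir /usr/share/glvnd/egl_vendor.d ~/.local/usr/share/glvnd/egl_vendor.d, followed by launching the program with __EGL_VENDOR_LIBRARY_DIRS pointing at the new directory.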
I'd need to know more about what the application is trying to do to be sure, but my best guess is that it's using an offscreen EGLDisplay, but there's something in Mesa that it can't cope with. Calling something like eglGetDisplay(NULL) will generally hand back an EGLDisplay from whatever vendor library is first.
If you use __EGL_VENDOR_LIBRARY_FILENAMES to limit it to only load Mesa, do you get the same problem?
Tried, unfortunately it is the same behaviour as using __EGL_VENDOR_LIBRARY_DIRS or __EGL_VENDOR_LIBRARY_FILENAMES "reversed".
I'd need to know more about what the application is trying to do to be sure
I can help with this if I know what you want to check. Any specific command output? My system is: Fedora Silverblue 39, kernel 6.5.6, NVIDIA driver 535.113.01, egl-wayland 1.1.12.
Tried, unfortunately it is the same behaviour as using __EGL_VENDOR_LIBRARY_DIRS or __EGL_VENDOR_LIBRARY_FILENAMES "reversed".
That's enough to confirm my guess: With Mesa as the first (or only) vendor library, the application ends up using Mesa, and something in Mesa is either failing, missing, or behaving in a way that the application can't cope with. It's probably either a simple app bug or some feature that the app needs which Mesa doesn't have.
Either way, though, that means the problem is outside egl-wayland or the nvidia driver.
Using the search functionality in gnome shell wakes the gpu up. I kid you not.
The sudden spikes in power consumption I kept experiencing might be explained by this...
Using the search functionality in gnome shell wakes the gpu up. I kid you not.
Is that with the current version of egl-wayland?
It wouldn't surprise me if the search function spawned a new wayland client process, and if that's all it is, then commit ba6c38a should fix it.
egl-wayland package is version 1.1.12-3.fc39. Is this the latest version?
No, 1.1.13 is the one that has the fix for this: https://github.com/NVIDIA/egl-wayland/releases/tag/1.1.13
I can attest to 1.1.13 not fixing GNOME shell (45) search waking up the dGPU for me, but, since GNOME uses search providers (GNOME characters, nautilus, ...), it seems likely that one or more of those providers are contributing to the problem by hitting one of the aforementioned paths (by accident or by underlying code being called indirectly).
Using the search functionality of gnome shell no longer wakes up the GPU for me on egl-wayland-1.1.13-1.fc39
Fix appears to work as advertised. @kbrenneman
I've reported the wake up issue on Flatpak programs to upstream: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/1683
I'm also wondering, though, if these specific remaining issues are then also a problem for hybrid GPU setups with an AMD or even Intel dGPU? I have none to test currently, but it might be interesting to mention in upstream reports and make it more testable for developers.
Hard to say. If the driver for the dGPU is Mesa, then it would depend on how Mesa handles device enumeration and selection internally.
For me, nouveau behaves the same as the NVIDIA proprietary driver here (I experience wakeups with Chromium and Chromium-based apps, neofetch, and the GNOME Settings -> About panel), so it's worth noting that it's an issue on that side of the fence as well.
https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/1683#note_1713305231
Freedesktop upstream says that they don't ship egl-wayland separately; the binary provided by the nvidia driver package is used, which is currently still at 1.1.12.
This is why flatpak programs continue to be affected by this bug.
@erik-kz Is egl-wayland 1.1.13 going to be included with the next nvidia driver major release? If not, is there any timeline to do so? Asking to see if it's worth the trouble for freedesktop's runtime to package it separately.