egl-wayland

Xwayland VRAM usage is still excessive when resizing X11 apps under wayland.

Open shelterx opened this issue 1 year ago • 66 comments

I'm not sure what the "Fix an issue causing KDE crashes, which also caused excessive VRAM usage when resizing." was supposed to fix. Resizing X11 apps like steam still makes Xwayland VRAM usage skyrocket but seems to stop at around 1.3GB. I'm not sure exactly what component causes this but I'll leave it here.

shelterx avatar Aug 11 '24 18:08 shelterx

For background, that was already reported against Xwayland here: https://gitlab.freedesktop.org/xorg/xserver/-/issues/1687

ofourdan avatar Aug 13 '24 07:08 ofourdan

Please fix: https://forums.developer.nvidia.com/t/560-release-feedback-discussion/300830/165?u=nicneme123

Bunnysword avatar Aug 13 '24 10:08 Bunnysword

This issue is not limited to Xwayland:

  • If you resize a Wayland window, the GPU memory usage of kwin_wayland, gnome-shell or any other Wayland WM will increase in the same way.
  • On the KDE desktop, if you hold the left mouse button to show a selection and move the mouse, the GPU memory usage of plasmashell will also increase in the same way.

thesword53 avatar Aug 15 '24 17:08 thesword53

@thesword53 indeed... you are correct, how did I miss that. I resized Konsole and here's the result: image

Good find!

Version used: Driver 560.31.02, egl-wayland https://github.com/NVIDIA/egl-wayland/commit/f30cb0e4c9a215e933dc8250f5dad4e96d4f2136

shelterx avatar Aug 15 '24 19:08 shelterx

This issue is not limited to Xwayland:

It's not limited to Wayland sessions either; kwin_x11 also eats VRAM when resizing. I don't recall having that issue before. So it's probably not an egl-wayland bug at all, but I'll leave the issue open until it's fixed.

However, kwin_x11 does release the memory after a while, but it does so slowly.

shelterx avatar Aug 15 '24 21:08 shelterx

I can confirm, I went back to X11. On Fedora Workstation 40 with NVIDIA 560.35.03 and a RTX 3090.

On Wayland my average desktop uses 11 GB / 24 GB VRAM (46%) with just a web browser open. It impairs my ability to run games or AI workloads, because basically half the card's memory is wasted. One time it even reached the point where all apps crashed because VRAM ran out.

On X11 my average desktop uses 3 GB / 24 GB VRAM (12.5%) for the same workload. Games and AI workloads run great.

The issue seems to be:

  • NVIDIA driver leaks VRAM. The amount wasted/hoarded always grows over time.
  • Xwayland is a persistent process on Wayland (it sits around forever), so the leaked VRAM from running X11 apps NEVER gets released.
  • Even when my GPU needs the hoarded/leaked VRAM, the Xwayland process doesn't release it.
  • The conclusion from Xwayland devs was that the problem is from NVIDIA driver because it doesn't leak with AMD: https://gitlab.freedesktop.org/xorg/xserver/-/issues/1687

This is in addition to Wayland's other issues, such as Chromium-based browsers frequently breaking when opening new windows, causing the windows to render in a glitched way and offset by about a titlebar's height from the top of the screen, and you have to click and drag the "invisible" (totally transparent) titlebar to resize the window to get it to render properly.

And Wayland's lack of basic features such as global keyboard shortcuts/keybinds.

It's not just NVIDIA that has problems on Wayland. Most things do.

I am going back to X11 for the next 12 months and will see if Wayland is better in 2026. At least X11 is usable. :D Wayland needs more time in the bakery. Fedora plans to remove X11 by default in Fedora 41, but I'll just install it manually since Wayland is totally unusable at the moment.

Arcitec avatar Sep 23 '24 19:09 Arcitec

I have tried hard to reproduce this issue, to no avail. If anyone has a 100% reliable way, please let me know. Tried on Arch, Fedora and PikaOS 4.

ryzendew avatar Sep 23 '24 21:09 ryzendew

I can reproduce this pretty readily on KDE on Arch, running nvidia 560.35.03. Just open a Konsole and resize it a bunch of times, then run nvidia-smi and notice that the VRAM of kwin_wayland goes up to about 10% of the total VRAM and more or less stops there.

Perhaps this does have to do with how nvidia is allocating, or garbage collecting, or perhaps even reporting VRAM.

kelvie avatar Sep 23 '24 23:09 kelvie

I have fully tried to reproduce this issue to no Avail

https://streamable.com/2ufy13

This sort of demonstrates the issue, watch kwin VRAM usage after the resizing.

  • This is under Xorg so it does free it (even faster here since OBS is running it seems).
  • If you do the same thing under Wayland, it never gets freed until you actually close the Konsole window. And if you resize an X11 app under Wayland, like Steam, Xwayland VRAM usage does not get freed at all.

shelterx avatar Sep 24 '24 08:09 shelterx

OK, confirmed it's a thing on GNOME as well. A friend on a 7800XT confirmed it happens on AMD as well.

adding a video https://streamable.com/ht9cu2

ryzendew avatar Sep 25 '24 02:09 ryzendew

This has always happened for me on Xwayland (I filed that bug on FDO), but only recently for kwin_wayland. Does anyone know if there's an NVIDIA driver/egl-wayland version combo that doesn't have this problem and has explicit sync?

kelvie avatar Sep 25 '24 04:09 kelvie

@kelvie are you sure? It's possible it's been like that for a while. But it's easy to miss that it happens with kwin_wayland, because if you close the window that caused the VRAM leak, kwin's VRAM usage goes back to normal.

Update: Here's the 550.40.71 dev driver, so yeah, it's in 550 too. Image

shelterx avatar Sep 25 '24 17:09 shelterx

@shelterx

https://gitlab.freedesktop.org/xorg/xserver/-/issues/1617

This has happened since 545.29.06 with Xwayland; I had to switch all my apps to Wayland-native apps to combat this if I wanted my VRAM.

Only with 560 it started happening with kwin_wayland for me (I also had a short stint with hyprland so I don't remember when I switched back), and currently with 560, as far as I can tell, kwin_wayland never gives back the memory even after I close the windows.

kelvie avatar Sep 25 '24 20:09 kelvie

https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/1704 let's test this

ryzendew avatar Sep 25 '24 22:09 ryzendew

After testing that PR, the issue is semi-fixed. Here is a video: https://gitlab.freedesktop.org/-/project/371/uploads/4af729a970faa28b667669bac1b8531f/Screencast_From_2024-09-25_20-48-02.mp4

ryzendew avatar Sep 25 '24 23:09 ryzendew

After testing that PR, the issue is semi-fixed. Here is a video: https://gitlab.freedesktop.org/-/project/371/uploads/4af729a970faa28b667669bac1b8531f/Screencast_From_2024-09-25_20-48-02.mp4

FYI: this is not a fix. It only helps to debug/track/trace.

gilvbp avatar Sep 26 '24 01:09 gilvbp

Yeah, I went back and tested with 555 and 550 as well, and it's still the same thing: kwin_wayland using 2.4 to 2.7GB of my 24GB VRAM after resizing windows, and not freeing it even after the windows are closed.

kelvie avatar Sep 26 '24 02:09 kelvie

I've started a new topic here: https://forums.developer.nvidia.com/t/multiple-wayland-compositors-not-freeing-vram-after-resizing-windows/307939

There are multiple issues here (Xwayland, compositors) and multiple components (xorg-server, multiple wayland compositors, this repo, nvidia drivers), so hopefully we get to the bottom of this.

In summary, I've reproduced this on:

nvidia versions:

  • 560.35.03
  • 555.58.02
  • 550.78

compositors:

  • kwin_wayland
  • sway
  • weston

egl-wayland versions:

  • 1.1.9
  • 1.1.13
  • 1.1.2
  • 1.1.16

Each time with the same test: open a terminal, resize it over and over again, close it, and check the compositor's VRAM usage using nvidia-smi.

Every time it's around 2.5GB on my 24GB 4090.

kelvie avatar Sep 26 '24 02:09 kelvie
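
The check described above can be scripted. A minimal sketch in Python, assuming that plain `nvidia-smi` prints a "Processes" table whose rows look like `| GPU GI CI PID Type Process name GPU Memory |` with memory reported in MiB. The regular expression and the helper names `parse_process_table` and `current_usage` are illustrative, not part of any NVIDIA tooling, and the table layout may differ between driver versions:

```python
import re
import subprocess

# Assumed row layout of the "Processes" table in plain `nvidia-smi` output:
# | GPU  GI  CI  PID  Type  Process name  GPU Memory |
ROW = re.compile(
    r"^\|\s+(?P<gpu>\d+)\s+\S+\s+\S+\s+(?P<pid>\d+)\s+(?P<type>[CG+]+)\s+"
    r"(?P<name>.+?)\s+(?P<mib>\d+)MiB\s+\|\s*$"
)


def parse_process_table(text):
    """Return {process basename: MiB used}, summed over matching rows."""
    usage = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            # Keep only the last path component, e.g. "kwin_wayland".
            name = match.group("name").rsplit("/", 1)[-1]
            usage[name] = usage.get(name, 0) + int(match.group("mib"))
    return usage


def current_usage():
    """Run nvidia-smi once and parse its process table."""
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    )
    return parse_process_table(result.stdout)
```

Calling `current_usage().get("kwin_wayland")` before resizing, after resizing, and after closing the window gives three numbers that can be compared directly.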

I had an old install with 525 and KDE 5, can't say I managed to reproduce it there but I had no Wayland session installed so I had to rely on the X11 test.

shelterx avatar Sep 26 '24 04:09 shelterx

So...

  • The Xwayland VRAM allocation release issue is not present in Vulkan dev drivers 550.40.71 and 550.40.75; usage stays at around 10-13MB.
  • And kwin_wayland does release the memory if you minimize the window you resized. That's probably not working as intended, but it's a quick fix; try that with 560. (I'm a bit tired of switching drivers now.) UPDATE: Actually, all windows that use kwin need to be minimized...

shelterx avatar Sep 27 '24 13:09 shelterx

@shelterx Wow, that's a trippy workaround (the minimizing one for kwin_wayland). It does seem to work; I wonder which (e)gl calls are at work here. plasmashell doesn't seem to free its VRAM, but maybe that's another issue.

kelvie avatar Sep 27 '24 15:09 kelvie

I'm testing this a bit more, and it seems just using the Alt+TAB switcher in kwin resets the VRAM -- very strange. Maybe something to do with how the window thumbnails are being created for that?

kelvie avatar Sep 27 '24 19:09 kelvie

Thank you for all the reports and attempts to narrow down the issue. I believe there are actually two separate issues tracked here:

  • Excessive memory consumption by Xwayland.
  • Excessive memory consumption by Wayland compositors, e.g., kwin_wayland.

I've looked into the latter issue, and at this point it is well understood. We do not need additional information or reports of reproductions for that issue. See below for more information.

We have not been able to reproduce the issues with Xwayland/X applications with the latest version of Xwayland and latest drivers. If you are still experiencing that particular issue, please share reproduction steps (ideally starting from a clean boot), the amount of persistent memory usage you are seeing and how you are measuring it, and your system details (Run nvidia-bug-report.sh, attach the log it generates, list your Xwayland and compositor version numbers and ideally distro package versions if you're using distro packages).

For the Wayland compositor memory usage issue, there isn't a leak per-se, but the heuristics that decide which memory to retain for performance reasons aren't working optimally when presented with the OpenGL API usage typical of a Wayland compositor. While we work to develop and deploy a driver fix, I can offer this workaround:

  • Download this JSON file: 50-limit-free-buffer-pool-in-wayland-compositors.txt.
  • Edit it to replace 'kwin_wayland' with the name of your Wayland compositor if necessary.
  • Create the directory '/etc/nvidia/nvidia-application-profiles-rc.d' if it doesn't already exist on your system, and place the file there.
  • Restart your compositor (Reboot or log out/log back in).

That should resolve this class of memory usage issues within the named application. You can also duplicate the entire rule in the JSON file if you regularly switch between multiple Wayland compositors, e.g.:

        {
            "pattern": {
                "feature": "procname",
                "matches": "kwin_wayland"
            },
            "profile": "Limit Free Buffer Pool On Wayland Compositors"
        },
        {
            "pattern": {
                "feature": "procname",
                "matches": "gnome-shell"
            },
            "profile": "Limit Free Buffer Pool On Wayland Compositors"
        }

cubanismo avatar Sep 27 '24 19:09 cubanismo
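
For anyone switching between several compositors, the rule duplication described above can be automated. A minimal sketch in Python, assuming the downloaded file is a JSON object whose top-level `rules` key holds a list of entries shaped like the ones shown. The helper name `add_compositor_rules` is hypothetical, and the profile definition itself is taken from the downloaded file unchanged:

```python
import json

PROFILE = "Limit Free Buffer Pool On Wayland Compositors"


def add_compositor_rules(config, procnames, profile=PROFILE):
    """Return a copy of config with one procname rule per name.

    Names that already have a rule for this profile are skipped.
    Assumes config is a dict with an optional top-level "rules" list.
    """
    result = json.loads(json.dumps(config))  # deep copy via JSON round-trip
    rules = result.setdefault("rules", [])

    # Collect process names that already point at this profile.
    present = set()
    for rule in rules:
        pattern = rule.get("pattern")
        if rule.get("profile") == profile and isinstance(pattern, dict):
            present.add(pattern.get("matches"))

    for name in procnames:
        if name not in present:
            rules.append({
                "pattern": {"feature": "procname", "matches": name},
                "profile": profile,
            })
            present.add(name)
    return result
```

To use it, load the downloaded file with `json.load`, call `add_compositor_rules(config, ["gnome-shell", "sway"])`, and write the result with `json.dump` to a file in `/etc/nvidia/nvidia-application-profiles-rc.d`, then restart the compositor as described above.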

Thank you for this! And to be clear, we are to create a .txt file filled with JSON, and not a .json file in that directory?

Edit: Just tested the instructions as is, copied that file and placed it in the directory with a .txt extension and it's fixed! thank you!

kelvie avatar Sep 27 '24 19:09 kelvie

We are to create a .txt file filled with JSON, and not a .json file in that directory?

The driver doesn't care what the file is called. I didn't have an extension on the file originally, but github only accepts certain file names (See https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/attaching-files), so I renamed it .txt. Name it whatever you like.

cubanismo avatar Sep 27 '24 19:09 cubanismo

@cubanismo Thank you for your reply, much appreciated. I will try the kwin/gnomeshell workaround. I think it happens with plasmashell too tho'.

We have not been able to reproduce the issues with Xwayland/X applications with the latest version of Xwayland and latest drivers.

Not sure which driver you are referring to as latest, but I can't reproduce it with the latest dev drivers. With 560.35.03, however, it's easy: just resize the Steam window, for example.

Edit: The workaround for kwin_wayland works here. Additional info: I see no VRAM spikes in KDE like the ones @mlhhqh experienced in GNOME, mentioned below.

shelterx avatar Sep 27 '24 20:09 shelterx

    {
        "pattern": {
            "feature": "procname",
            "matches": "kwin_wayland"
        },
        "profile": "Limit Free Buffer Pool On Wayland Compositors"
    },
    {
        "pattern": {
            "feature": "procname",
            "matches": "gnome-shell"
        },
        "profile": "Limit Free Buffer Pool On Wayland Compositors"
    }

Can confirm works on Gnome 46, Silverblue, 560.35.03

Still very subpar results. After starting a GNOME session, opening a terminal (Wayland) and resizing it a bit, usage spikes up to 1.4GB (from ~300MB). VRAM usage goes down very slowly (yet noticeably on user interaction, like moving a window).

mlhhqh avatar Sep 27 '24 20:09 mlhhqh

@cubanismo works for me! Gnome 47, CachyOS (Arch), 560.35.03. VRAM usage stays at ~400MB while resizing terminal window with nvtop open (before it was up to 1.4GB).

ppogorze avatar Sep 27 '24 21:09 ppogorze

FYI, you can also add kwin_x11 if you use Xorg, it makes kwin_x11 stay on sane levels and doesn't overallocate.

shelterx avatar Sep 27 '24 22:09 shelterx

FYI, you can also add kwin_x11 if you use Xorg, it makes kwin_x11 stay on sane levels and doesn't overallocate.

Yeah, and while at it, add plasmashell, too. plasmashell is happy with under 300MB now instead of climbing up above 700MB. I also added the Xorg process itself. Not sure if it helps, seems to be a little lower (maybe 100-200MB less).

kakra avatar Sep 28 '24 01:09 kakra