
Possible VRAM leak on AMD GPUs?

Status: Open. AcolyteI opened this issue 4 months ago • 26 comments

Select the version

25.0.0.7-1, 25.0.0.5

Describe your issue

On startup, a Plasma session with X11Libre takes about 500 MB of VRAM. Over time, however, VRAM usage increases drastically (with the same set of running applications) and reaches 4 GB+ in about 18 hours. I am currently unable to test whether this affects sessions other than Plasma.

Steps to reproduce

  1. Start a Plasma 6 session with X11Libre.
  2. Wait for some time, then come back and check VRAM usage.

What did you expect?

VRAM usage to stay more or less constant over time with the same set of applications running.

Additional Information

I am able to consistently reproduce this on an Artix box with an RX 7600 and Plasma 6. Update: I've also tested this on XLibre 25.0.0.5 (the issue reproduces) and Xorg 1.21.1.18 (the issue does not reproduce).


AcolyteI, Aug 03 '25 13:08

I'll try testing this in a bit (also Plasma 6 on Artix, with a Radeon 6600).

terrorbyte69420, Aug 03 '25 14:08

> X11Libre takes about 500mb of VRAM

BLOATWARE

vaguinerg, Aug 04 '25 21:08

> > X11Libre takes about 500mb of VRAM
>
> BLOATWARE

From what I can see in umr, Xorg's VRAM usage depends on the open applications, so this isn't its baseline VRAM usage. Anyway, I hope that was a joke.

AcolyteI, Aug 04 '25 22:08

I'm experiencing the same issue running Gentoo with the latest Trinity Desktop on a laptop with an Intel HD 630 iGPU using glamor. After several hours of work, memory consumption breaks the 10 GiB threshold. After quitting all running applications, 8 GiB is still allocated.

nkalkhof, Aug 05 '25 19:08

The last release https://github.com/X11Libre/xserver/releases/tag/xlibre-xserver-25.0.0.8 mentions a possible memory leak:

  • #565

You may try upgrading and see whether this fix removes your memory leaks.

alexislefebvre, Aug 05 '25 23:08

> The last release https://github.com/X11Libre/xserver/releases/tag/xlibre-xserver-25.0.0.8 mentions a possible memory leak:

> You may try to upgrade and see if this fix remove your memory leaks.

Tested on 25.0.0.8, and it still reproduces. By the way, for anyone who wants to test this: to check whether the leak persists, you can run "kquitapp6 plasmashell; kstart plasmashell", which causes an immediate spike of about 200 MB (as was pointed out in the Telegram chat), so there is no need to wait for hours. Interestingly, the spike also occurs with the Xorg X server, but there the allocated memory is freed after a very short while.
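The restart check described above can be scripted so the spike and its decay are recorded rather than eyeballed. Below is a minimal Python sketch that reads whole-device VRAM usage from the amdgpu sysfs attribute mem_info_vram_used, restarts plasmashell with the commands quoted in the comment, samples usage a few times afterwards, and reports whether it returned to baseline. The card0 path, the sampling interval and the 50 MiB tolerance are assumptions, not values from the thread.

```python
import subprocess
import time
from pathlib import Path

# Assumption: the amdgpu device is card0; adjust on multi-GPU systems.
VRAM_USED = Path("/sys/class/drm/card0/device/mem_info_vram_used")
MIB = 1024 * 1024


def read_vram_mib(path: Path = VRAM_USED) -> float:
    """Current whole-device VRAM usage in MiB (the sysfs file holds bytes)."""
    return int(path.read_text().strip()) / MIB


def spike_was_freed(baseline: float, samples: list[float], tolerance: float = 50.0) -> bool:
    """True if the last sample is back within `tolerance` MiB of the baseline."""
    if not samples:
        raise ValueError("need at least one sample taken after the restart")
    return samples[-1] - baseline <= tolerance


def measure_restart(interval_s: float = 5.0, polls: int = 6) -> tuple[float, list[float]]:
    """Restart plasmashell as described in the thread, then sample VRAM."""
    baseline = read_vram_mib()
    subprocess.run(["kquitapp6", "plasmashell"], check=False)
    subprocess.Popen(["kstart", "plasmashell"])  # Popen returns immediately
    samples = []
    for _ in range(polls):
        time.sleep(interval_s)
        samples.append(read_vram_mib())
    return baseline, samples


if __name__ == "__main__":
    base, after = measure_restart()
    print(f"baseline {base:.0f} MiB, after restart: {[round(s) for s in after]} MiB")
    print("spike freed" if spike_was_freed(base, after) else "usage stayed elevated (possible leak)")
```

It has to run inside the Plasma session, since kquitapp6 needs the session's D-Bus. Running it once under XLibre and once under Xorg gives the two verdicts to compare.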

AcolyteI, Aug 06 '25 12:08

How do you check VRAM usage? With special utilities from AMD/NVIDIA/Intel? Or is this about normal RAM, not video memory?

algrid, Aug 09 '25 19:08

I just closed all my applications and checked VRAM usage on my AMD card using amdgpu_top. It is about 600 MB; with only Firefox open it's 1377 MB. I am using StumpWM, which is a lightweight WM. My uptime is 10 days. I have 24 GB of VRAM, so I don't know whether that means VRAM is used more liberally.
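As an alternative to amdgpu_top, the amdgpu kernel driver exposes device-wide memory counters in sysfs: mem_info_vram_used and mem_info_vram_total for video memory, and mem_info_gtt_used and mem_info_gtt_total for GTT (system RAM mapped for GPU access), all in bytes. The following minimal Python sketch prints them for every DRM card that provides them. These are whole-device totals rather than per-process figures, and the helper names are illustrative.

```python
from pathlib import Path

MIB = 1024 * 1024
FIELDS = ("vram_used", "vram_total", "gtt_used", "gtt_total")


def read_mem_info(device_dir: Path) -> dict[str, int]:
    """Read the amdgpu mem_info_* counters (bytes) that exist under device_dir."""
    info = {}
    for field in FIELDS:
        path = device_dir / f"mem_info_{field}"
        if path.exists():
            info[field] = int(path.read_text().strip())
    return info


def format_mem_info(info: dict[str, int]) -> str:
    """Render used/total pairs in MiB, skipping pools whose counters are missing."""
    parts = []
    for pool in ("vram", "gtt"):
        used, total = info.get(f"{pool}_used"), info.get(f"{pool}_total")
        if used is not None and total is not None:
            parts.append(f"{pool.upper()} {used // MIB}/{total // MIB} MiB")
    return ", ".join(parts)


if __name__ == "__main__":
    drm = Path("/sys/class/drm")
    # cardN entries only; connector entries such as card0-DP-1 are skipped.
    for card in sorted(p for p in drm.glob("card*") if p.name[4:].isdigit()):
        info = read_mem_info(card / "device")
        if info:  # devices without these files (non-amdgpu drivers) are skipped
            print(card.name, format_mem_info(info))
```

Watching both pools matters here, since one report in this thread sees leaked pages accumulate in GTT rather than VRAM on newer kernels.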

K1D77A, Aug 11 '25 06:08

Just tested git master from Aug. 11. The issue is still there. After a day, total memory consumption is about 8-9 GiB even after I closed all applications. Strangely, the command ps aux | awk '{print $6/1024 " MB\t\t" $11}' | sort -n shows X allocating only around 110 MiB. After killing X, the memory is released. I have no idea what causes that bloat. I use an HD 630 iGPU and have no idea how to measure the amount of memory allocated to the GPU.
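The awk one-liner prints each process's resident set size (column 6 of ps aux, in KiB) divided by 1024, next to the first word of the command (column 11). Below is a minimal Python equivalent that also prints a sum. Note that RSS only counts pages mapped into a process's address space, so buffers the kernel holds on X's behalf without X mapping them would not necessarily show up here, which may explain the low figure for X.

```python
import subprocess


def rss_by_process(ps_aux_output: str) -> list[tuple[float, str]]:
    """Parse `ps aux` text into (RSS in MiB, command) pairs, sorted ascending.

    Uses the same fields as the awk one-liner: column 6 is RSS in KiB and
    column 11 is the first word of the command.
    """
    rows = []
    for line in ps_aux_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 10)  # the command itself may contain spaces
        if len(fields) < 11:
            continue
        rows.append((int(fields[5]) / 1024, fields[10].split()[0]))
    return sorted(rows)


if __name__ == "__main__":
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True).stdout
    rows = rss_by_process(out)
    for mib, cmd in rows:
        print(f"{mib:10.1f} MiB  {cmd}")
    # Summing RSS counts shared pages once per process, so this overstates mapped memory.
    print(f"sum of RSS: {sum(m for m, _ in rows):.0f} MiB")
```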

After switching back to Xorg, memory consumption is stable again.

Can anyone confirm that?

Btw.: I have a two-screen setup and disconnect/reconnect the laptop from the docking station several times a day to take it to presentations in the office.

nkalkhof, Aug 12 '25 12:08

> checked my vram usage on my amd card using amdgpu_top

> ps aux | awk '{print $6/1024 " MB\t\t" $11}' | sort -n

I want to make sure that we're talking about the same memory: ps prints values for RAM (system memory), whereas amdgpu_top prints video memory usage (physically located on the GPU). Is this correct?

Maybe we have two different issues here, or I'm just missing something?

algrid, Aug 12 '25 21:08

> I want to make sure that we're talking about the same memory. ps prints values for RAM (system memory) vs amdgpu_top printing Video memory usage (physically located in GPU) - is this correct?

Yes, I was talking about system memory! I have no discrete GPU, only an Intel HD 630, and amdgpu_top won't work there. And since the iGPU uses system memory and "ps aux" doesn't help, it's hard to determine where the memory is being wasted.
Freeing cache/buffers with "echo 3 > /proc/sys/vm/drop_caches" didn't help either.
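On an Intel iGPU, graphics buffers live in ordinary system RAM. With the i915 driver, GEM buffer objects are normally shmem-backed, so they would typically be counted in the Shmem line of /proc/meminfo rather than in any process's RSS; drop_caches only discards clean page cache and reclaimable slab, so it cannot free them. Below is a minimal Python sketch that logs a few /proc/meminfo fields once a minute. Watching Shmem is an assumption for narrowing things down, not a confirmed diagnosis of this leak.

```python
import time

# Shmem is where shmem-backed GPU buffers would be counted (it is also part of Cached).
WATCHED = ("MemAvailable", "Cached", "Shmem")


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; sized fields are in KiB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def snapshot() -> dict[str, int]:
    """Current values of the watched fields, in KiB."""
    with open("/proc/meminfo") as f:
        meminfo = parse_meminfo(f.read())
    return {k: meminfo[k] for k in WATCHED if k in meminfo}


if __name__ == "__main__":
    # One line per minute; a Shmem value that only ever grows is worth a closer look.
    while True:
        snap = snapshot()
        fields = "  ".join(f"{k}={v // 1024} MiB" for k, v in snap.items())
        print(time.strftime("%H:%M:%S"), fields)
        time.sleep(60)
```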

> Maybe we have two different issues here or I'm just missing something?

That's why I asked whether more people had run into that issue. Maybe I should open a new issue?

nkalkhof, Aug 13 '25 06:08

@nkalkhof yeah, if it's a system RAM leak then I think we should create a separate issue.

@AcolyteI are you checking Video RAM usage using amdgpu_top?

algrid, Aug 13 '25 22:08

@algrid yes, I used both amdgpu_top to check overall VRAM consumption and umr to check consumption for individual processes, which confirmed that Xorg was the process hogging VRAM, see attached screenshot.

[Screenshot attached]
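Besides umr, recent kernels report per-client GPU memory in /proc/<pid>/fdinfo/<fd> for DRM file descriptors, following the kernel's DRM client usage stats format (keys such as drm-client-id and drm-memory-<region>, with newer kernels adding drm-total-<region>). Below is a minimal Python sketch that sums these per process, deduplicating by drm-client-id because several fds can refer to the same client. Which keys a given kernel and driver emit is an assumption to verify locally, and reading the fdinfo of an Xorg running as root needs root.

```python
import sys
from collections import defaultdict
from pathlib import Path
from typing import Optional

UNITS = {"KiB": 1024, "MiB": 1024 * 1024}  # no unit means bytes


def parse_fdinfo(text: str) -> tuple[Optional[str], dict[str, int]]:
    """Return (drm-client-id, {region: bytes}) for one fdinfo file.

    Prefers drm-total-<region> keys and falls back to drm-memory-<region>.
    """
    client_id, total, legacy = None, {}, {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, parts = key.strip(), value.split()
        if not parts:
            continue
        if key == "drm-client-id":
            client_id = parts[0]
        elif key.startswith(("drm-total-", "drm-memory-")):
            size = int(parts[0]) * (UNITS.get(parts[1], 1) if len(parts) > 1 else 1)
            region = key.split("-", 2)[2]  # "drm-total-vram" -> "vram"
            (total if key.startswith("drm-total-") else legacy)[region] = size
    return client_id, (total or legacy)


def process_gpu_memory(pid: int) -> dict[str, int]:
    """Sum per-region GPU memory (bytes) over a process's DRM clients."""
    usage: dict[str, int] = defaultdict(int)
    seen = set()
    for fd in Path(f"/proc/{pid}/fdinfo").iterdir():
        try:
            client_id, regions = parse_fdinfo(fd.read_text())
        except OSError:
            continue  # fd closed meanwhile, or not readable
        if client_id is None or client_id in seen:
            continue  # not a DRM fd, or another fd for an already-counted client
        seen.add(client_id)
        for region, size in regions.items():
            usage[region] += size
    return dict(usage)


if __name__ == "__main__":
    pid = int(sys.argv[1])  # e.g. the output of `pidof Xorg`
    for region, size in sorted(process_gpu_memory(pid).items()):
        print(f"{region:>6}: {size / 2**20:.0f} MiB")
```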

AcolyteI, Aug 13 '25 22:08

@AcolyteI What video driver are you using for your AMD GPU?

callmetango, Aug 14 '25 16:08

For the record: The issue of @nkalkhof will be handled in #687.

callmetango, Aug 14 '25 16:08

@callmetango AMDGPU with kernel 6.15.8-zen and Mesa 25.1.7; not sure which one is relevant. I haven't installed any xf86 drivers for the GPU (Xorg works fine without them). On more recent kernel versions (the mainline Artix kernel), GTT memory seems to accumulate leaked pages instead of VRAM.

AcolyteI, Aug 14 '25 16:08

@AcolyteI

> I haven't installed any xf86 drivers for the GPU

Thank you! Then XLibre falls back to the built-in modesetting driver. I labeled the issue accordingly.

callmetango, Aug 14 '25 17:08

@AcolyteI could you please try the initial release?

It would be good to find a version without this issue.

algrid, Aug 14 '25 17:08

@algrid I tested 25.0.0.0 from the Artix archives (downgraded both xlibre-xserver and xlibre-xserver-common), and the issue is still there, although it seems to leak a bit less memory than newer versions when restarting plasmashell. I'm not quite sure about that, but either way, the leak is still there.

AcolyteI, Aug 14 '25 17:08

Hmm, interesting. I wonder whether the issue was already there at the point of the fork? @AcolyteI could you please test that too when you have a chance?

If it's still there, we would somehow need to check which commits differ from the Xorg 1.21.1.18 version.
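Finding the first affected version or commit is a binary search: with a known-good and a known-bad end point, each test halves the remaining range, which matters when a single test can take hours. git bisect automates this over git history; the following minimal Python sketch shows the same search over any ordered list, assuming a single regression (every entry after the first bad one is also bad). first_bad and is_bad are hypothetical names, with is_bad standing for whatever leak check is used.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(versions: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest entry for which is_bad() is True.

    Assumes versions are ordered oldest to newest, versions[0] is good,
    versions[-1] is bad, and everything after the first bad entry is bad.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]
```

For about 1000 commits between a good and a bad end point, this needs roughly ten tests (log2 1000 ≈ 10).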

algrid, Aug 14 '25 21:08

Can confirm the memory leak happens even with the amdgpu xf86 driver. [Screenshot attached]

Xorg.conf:

Section "ServerFlags"
    Option "AutoAddGPU" "off"
    Option "Debug" "dmabuf_capable"
EndSection

Section "Device"
    Identifier "AMDGPU"
    Driver "amdgpu"
    BusID "PCI:9:0:0"
    Option "TearFree" "true"
    Option "VariableRefresh" "true"
    Option "AsyncFlipSecondaries" "true"
    Option "ShadowFB" "false"
    Option "Atomic" "true"
EndSection

Sadly, I couldn't check the point of the fork, since it then just crashes with: (EE) Failed to load /usr/lib/xorg/modules/xlibre-25.0/drivers/amdgpu_drv.so: /usr/lib/xorg/modules/xlibre-25.0/drivers/amdgpu_drv.so: undefined symbol: glamor_egl_create_textured_pixmap_from_gbm_bo.

mintplague, Sep 01 '25 06:09

> @callmetango AMDGPU with kernel 6.15.8-zen and Mesa 25.1.7, not sure which one is relevant. I haven't installed any xf86 drivers for the GPU (Xorg works fine without them). On more recent kernel versions (mainline artix kernel) GTT memory seems to accumulate leaked pages instead of VRAM.

I believe the problem you are dealing with is due to your Mesa version. I recall a VRAM leak problem being fixed as of Mesa 25.2.0.

VJSLH, Sep 05 '25 05:09

Okay, after ~20 days of uptime it appears that the VRAM leak happens even on fd.o Xorg 21.1.18, though I think it still took longer to fill up than with XLibre. Notably, restarting KWin with kwin_x11 --replace doesn't free up the VRAM.

[Screenshot attached]

mintplague, Sep 30 '25 19:09

Apparently it's a DRM module bug, as there are reports of it happening even in Wayland sessions. Same symptoms: memory leak and "non-zero when fini" in dmesg. https://gitlab.freedesktop.org/drm/amd/-/issues/4187
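To check whether a system shows the same kernel-side symptom, one can search the kernel log for the "non-zero when fini" message quoted above. Below is a minimal Python sketch that filters dmesg output for that substring; nothing about the rest of the message's wording is assumed. Reading the kernel log usually needs root (or kernel.dmesg_restrict set to 0).

```python
import subprocess

NEEDLE = "non-zero when fini"  # substring quoted in the thread


def matching_lines(log_text: str, needle: str = NEEDLE) -> list[str]:
    """Return kernel log lines containing the needle (case-insensitive)."""
    return [line for line in log_text.splitlines() if needle.lower() in line.lower()]


if __name__ == "__main__":
    log = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
    hits = matching_lines(log)
    print(f"{len(hits)} matching line(s)")
    for line in hits:
        print(line)
```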

mintplague, Oct 09 '25 20:10

Did a bunch more testing and found a way to trigger it in a short timespan: repeatedly killing and restarting plasmashell (e.g. while true; do kquitapp6 plasmashell; plasmashell & sleep 5; done) while KWin compositing is enabled.
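The stress loop above can be turned into a measurement by sampling VRAM after each cycle and fitting a least-squares line through the samples: a slope near zero means usage is stable, while a consistently positive slope means it creeps up. Below is a minimal Python sketch, assuming an amdgpu card at /sys/class/drm/card0 and using the same kquitapp6/plasmashell commands as the shell loop; the cycle count and pause are arbitrary choices.

```python
import subprocess
import time
from pathlib import Path

VRAM_USED = Path("/sys/class/drm/card0/device/mem_info_vram_used")  # assumed card0


def slope(samples: list[float]) -> float:
    """Least-squares slope of samples against their index (units per cycle)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def stress(cycles: int = 50, pause_s: float = 5.0) -> list[float]:
    """Restart plasmashell repeatedly, recording VRAM (MiB) after each cycle."""
    samples = []
    for _ in range(cycles):
        subprocess.run(["kquitapp6", "plasmashell"], check=False)
        # Started in the background, like `plasmashell &` in the shell loop.
        subprocess.Popen(["plasmashell"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        time.sleep(pause_s)
        samples.append(int(VRAM_USED.read_text()) / 2**20)
    return samples


if __name__ == "__main__":
    vram = stress()
    print(f"VRAM {vram[0]:.0f} -> {vram[-1]:.0f} MiB, trend {slope(vram):+.1f} MiB/cycle")
```

Running it once with compositing enabled and once with it disabled (Alt+Shift+F12) gives two slopes to compare, which is less sensitive to single noisy samples than comparing the first and last readings.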

Artix, kernel 6.17.7-zen1-1-zen, xlibre 25.0.0.15-1, Plasma 6.5.3, kwin-x11-lite 6.5.3-1.1, same config as above. In a Wayland session, an fd.o Xorg 21.1.20-1 session, and an XLibre session with KWin compositing disabled (via Alt+Shift+F12), VRAM usage stays the same. In an XLibre session with KWin compositing enabled, VRAM usage slowly but surely creeps up. amdgpu_top shows Xorg (or elogind, if X is started from a tty) hogging the VRAM; killing the session frees it as expected.

Before that, I tried running sessions for a prolonged time with the modesetting driver, and the results were largely the same as in previous tests. I'm not even sure what to make of this result, as I've seen the leak happen on fd.o Xorg and there are third-party reports of it happening on Wayland, too. Whatever the reason, I suspect XLibre somehow amplifies the issue. I will try to see whether it happens on an older-generation AMD GPU; sadly, I don't have access to any other cards.

EDIT: it happens on the RX 5500 XT and RX 460 as well (I tested on the 7600 XT before), so either it's not a generation-specific bug or those two bugs are unrelated.

mintplague, Nov 27 '25 18:11

https://github.com/X11Libre/xserver/issues/687#issuecomment-3447775911 suggested building XLibre with -Db_sanitize=leak to see where the leak could be. I stress-tested it until about 3 GB of VRAM was filled; sadly, the leak isn't detected. Disabling the glamoregl module brought no change.

[Screenshot attached]

Pardon the screenshot; startx won't let me launch a session over SSH or wrapped in nohup, so I had to use fbgrab.

mintplague, Nov 27 '25 21:11