Possible VRAM leak on AMD GPUs?
Select the version
25.0.0.7-1, 25.0.0.5
Describe your issue
On startup, a Plasma session with X11Libre takes about 500mb of VRAM; however, over time VRAM usage increases drastically (with the same set of running applications) and reaches 4GB+ in about 18 hours. I am currently unable to test whether this affects sessions other than Plasma.
Steps to reproduce
- Start Plasma 6 session with X11Libre.
- Wait for some time, come back and check VRAM usage.
What did you expect?
VRAM usage to stay more or less consistent over time with the same set of applications running.
Additional Information
I am able to consistently reproduce this on an Artix box with an RX7600 and Plasma 6. UPD: I've also tested this on XLibre 25.0.0.5 (issue reproduces) and Xorg 1.21.1.18 (issue does not reproduce).
Extra fields
- [x] I have checked the existing issues
- [x] I have read the Contributing Guidelines
- [ ] I'd like to work on this issue
I'll try testing this in a bit (also plasma 6 on artix, with a radeon 6600)
X11Libre takes about 500mb of VRAM
BLOATWARE
From what I can see in umr, Xorg's VRAM usage depends on opened applications, so this isn't its baseline VRAM usage. Anyway, I hope this was a joke
I'm experiencing the same issue running Gentoo with the latest Trinity Desktop on a laptop with an Intel HD630 iGPU using glamor. After several hours of work, memory consumption breaks the 10GiB threshold. After quitting all running applications there is still 8GiB allocated.
The latest release https://github.com/X11Libre/xserver/releases/tag/xlibre-xserver-25.0.0.8 mentions a possible memory leak:
- #565
You may try upgrading to see if this fix removes your memory leaks.
Tested on 25.0.0.8, and it still reproduces. By the way, to anyone who will test this: to see if the leak persists you can run kquitapp6 plasmashell; kstart plasmashell, which causes an immediate spike of about 200 MB, as was pointed out in the telegram chat, so there is no need to wait for hours. Interestingly, the spike is there for the Xorg X server too, but the allocated memory is freed after a very short while.
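For anyone repeating this check on an amdgpu card, one rough way to put a number on the spike is to read the driver's VRAM counter before and after the restart. This is only a sketch under assumptions: it assumes the GPU is card0 and that the kernel exposes /sys/class/drm/card0/device/mem_info_vram_used (amdgpu reports it in bytes). The function names and the VRAM_FILE variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only. Assumes an amdgpu card whose sysfs node is card0; adjust the
# path if your GPU is card1 etc. Function names here are hypothetical.
VRAM_FILE="${VRAM_FILE:-/sys/class/drm/card0/device/mem_info_vram_used}"

# Print used VRAM in MiB (the counter in the file is in bytes).
vram_used_mib() {
    bytes=$(cat "${1:-$VRAM_FILE}") || return 1
    echo $(( bytes / 1024 / 1024 ))
}

# Print the difference between two MiB readings with an explicit sign.
vram_delta() {
    printf '%+d MiB\n' $(( $2 - $1 ))
}

# Usage inside a running Plasma session (commands taken from the comment above):
#   before=$(vram_used_mib)
#   kquitapp6 plasmashell; kstart plasmashell
#   sleep 60    # give the server some time to release the memory
#   after=$(vram_used_mib)
#   vram_delta "$before" "$after"
```

A delta that stays well above zero after the wait would match the leak described here; a delta near zero would match the Xorg behaviour where the spike is freed again.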
How do you check VRAM usage? Some special utilities from amd/nvidia/intel? Or it’s about normal RAM, not video memory?
Just closed all my applications and checked the VRAM usage on my AMD card using amdgpu_top. It is about 600mb; with only firefox open it's 1377mb. I am using Stumpwm, which is a lightweight WM. My uptime is 10 days. I have 24gb of VRAM, so I don't know if that means that VRAM is used more liberally.
Just tested the git master from Aug. 11. The issue is still there. After a day the total memory consumption is about 8-9GiB even after I closed all applications. Strangely, the command ps aux | awk '{print $6/1024 " MB\t\t" $11}' | sort -n shows X allocating only around 110MiB. After killing X the memory is released. No idea what causes that bloat. I use an HD630 iGPU and I have no idea how to measure the amount of memory allocated to the GPU.
After switching back to Xorg the memory consumption is stable again.
Can anyone confirm that?
Btw.: I have a two-screen setup and disconnect/reconnect the laptop from the docking station several times a day to take it to presentations in the office.
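With an iGPU the graphics buffers live in system RAM but are not necessarily counted in any process's RSS, which may be why ps shows X at only about 110MiB. A rough cross-check is to compare the sum of per-process RSS with the system-wide counters in /proc/meminfo. The sketch below is an assumption-laden illustration: the idea that leaked i915 buffers would show up under Shmem is a guess based on the driver commonly backing its buffer objects with shared memory, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only; function names are hypothetical.

# Print one field of a meminfo-style file in MiB (values in the file are in kB).
meminfo_mib() {
    awk -v key="$1:" '$1 == key { printf "%d\n", $2 / 1024 }' "${2:-/proc/meminfo}"
}

# Sum the RSS column (6th field of `ps aux`) in MiB, reading ps output from stdin.
# Shared pages are counted once per process, so this overestimates slightly.
rss_total_mib() {
    awk 'NR > 1 { sum += $6 } END { printf "%d\n", sum / 1024 }'
}

# Usage:
#   ps aux | rss_total_mib        # memory attributed to processes
#   meminfo_mib Shmem             # shared memory, where GPU buffers may be counted
#   meminfo_mib MemAvailable      # what the kernel thinks is still usable
```

If Shmem keeps growing while the RSS total stays flat, that would point at memory held outside ordinary process accounting rather than at the X server's own heap.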
checked my vram usage on my amd card using amdgpu_top
ps aux | awk '{print $6/1024 " MB\t\t" $11}' | sort -n
I want to make sure that we're talking about the same memory. ps prints values for RAM (system memory) vs amdgpu_top printing Video memory usage (physically located in GPU) - is this correct?
Maybe we have two different issues here or I'm just missing something?
I want to make sure that we're talking about the same memory.
ps prints values for RAM (system memory) vs amdgpu_top printing Video memory usage (physically located in GPU) - is this correct?
Yes, I was talking about system memory! I have no discrete GPU, only an Intel HD630, and amdgpu_top won't work there. And since the iGPU uses system memory and "ps aux" doesn't help, it's hard to determine where the memory is wasted.
Freeing Cache/Buffer with "echo 3 > /proc/sys/vm/drop_caches" didn't help either.
Maybe we have two different issues here or I'm just missing something?
That's why I've asked if more people ran into that issue. Maybe I should open a new issue?
@nkalkhof yeah, if it's a system RAM leak then I think we should create a separate issue.
@AcolyteI are you checking Video RAM usage using amdgpu_top?
@algrid yes, I used both amdgpu_top to check overall VRAM consumption and umr to check consumption for individual processes, which confirmed that Xorg was the process hogging VRAM, see attached screenshot.
@AcolyteI What video driver are you using for your AMD GPU?
For the record: The issue of @nkalkhof will be handled in #687.
@callmetango AMDGPU with kernel 6.15.8-zen and Mesa 25.1.7, not sure which one is relevant. I haven't installed any xf86 drivers for the GPU (Xorg works fine without them). On more recent kernel versions (mainline artix kernel) GTT memory seems to accumulate leaked pages instead of VRAM.
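Since the leak may land in GTT rather than VRAM depending on the kernel, it can help to log both counters side by side. The following is a minimal sketch, assuming an amdgpu device at /sys/class/drm/card0/device that exposes mem_info_vram_used and mem_info_gtt_used (both in bytes); the function name and the GPU_DIR variable are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumes amdgpu on card0; GPU_DIR and the function name are
# made up for this example.
GPU_DIR="${GPU_DIR:-/sys/class/drm/card0/device}"

# Print "vram_mib,gtt_mib" from the amdgpu byte counters in the given directory.
gpu_mem_sample() {
    dir="${1:-$GPU_DIR}"
    vram=$(cat "$dir/mem_info_vram_used") || return 1
    gtt=$(cat "$dir/mem_info_gtt_used") || return 1
    echo "$(( vram / 1048576 )),$(( gtt / 1048576 ))"
}

# Usage: append one timestamped sample per minute to a CSV file.
#   while sleep 60; do echo "$(date +%s),$(gpu_mem_sample)"; done >> gpu_mem.csv
```

Plotting the two columns over a day should show which pool is growing on a given kernel.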
@AcolyteI
I haven't installed any xf86 drivers for the GPU
Thank you! Then XLibre falls back to the built-in modesetting driver. I labeled the issue accordingly.
@AcolyteI could you please try the initial release
It would be good to find a version without this issue.
@algrid I tested 25.0.0.0 (downgraded both xlibre-xserver and xlibre-xserver-common) from the artix archives, and the issue is still there, although it seems to leak a bit less memory than newer versions when restarting plasmashell. Not quite sure about that, but either way, the leak is still there.
Hmm, it's interesting. I wonder if the issue was already there at the point of fork? @AcolyteI could you please test that too when you have a chance?
If it's still there, we would need to work out which commits differ from the Xorg 1.21.1.18 version.
Can confirm the memleak happens even with amdgpu xf86 driver.
Xorg.conf:
Section "ServerFlags"
    Option "AutoAddGPU" "off"
    Option "Debug" "dmabuf_capable"
EndSection

Section "Device"
    Identifier "AMDGPU"
    Driver "amdgpu"
    BusID "PCI:9:0:0"
    Option "TearFree" "true"
    Option "VariableRefresh" "true"
    Option "AsyncFlipSecondaries" "true"
    Option "ShadowFB" "false"
    Option "Atomic" "true"
EndSection
Sadly couldn't check point-of-fork since it then just crashes with (EE) Failed to load /usr/lib/xorg/modules/xlibre-25.0/drivers/amdgpu_drv.so: /usr/lib/xorg/modules/xlibre-25.0/drivers/amdgpu_drv.so: undefined symbol: glamor_egl_create_textured_pixmap_from_gbm_bo.
@callmetango AMDGPU with kernel 6.15.8-zen and Mesa 25.1.7, not sure which one is relevant. I haven't installed any xf86 drivers for the GPU (Xorg works fine without them). On more recent kernel versions (mainline artix kernel) GTT memory seems to accumulate leaked pages instead of VRAM.
I believe the problem you are dealing with is due to the Mesa version you have. I recall a VRAM leak getting fixed in Mesa 25.2.0.
Okay, after ~20 days of uptime it appears that VRAM leak happens even on fd.o Xorg 21.1.18, though I think it still took more time to fill up than with XLibre. Notably, restarting kwin with kwin_x11 --replace doesn't result in VRAM freeing up.
Apparently it's a drm module bug as there are reports of it happening even on wayland sessions. Same symptoms, memleak and "non-zero when fini" in dmesg. https://gitlab.freedesktop.org/drm/amd/-/issues/4187
Did a bunch more testing and found a way to trigger it in a short timespan: repeatedly kill and restart plasmashell (e.g. while true; do kquitapp6 plasmashell; plasmashell & sleep 5; done) while kwin compositing is enabled.
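A bounded variant of the kill-and-restart loop that also prints VRAM usage after each iteration makes the trend easier to compare between sessions. This is a sketch, not the procedure used for the results in this comment: it assumes an amdgpu card at card0 with the mem_info_vram_used sysfs counter, and restart_and_log, restart_plasmashell, VRAM_FILE and PAUSE are names invented here.

```shell
#!/bin/sh
# Sketch only. Assumes amdgpu on card0; all names below are hypothetical.
VRAM_FILE="${VRAM_FILE:-/sys/class/drm/card0/device/mem_info_vram_used}"
PAUSE="${PAUSE:-5}"    # seconds to wait after each restart

# Run the given command N times, pausing after each run, and print
# "iteration used_mib" so the growth per restart is visible.
restart_and_log() {
    n=$1; shift
    i=1
    while [ "$i" -le "$n" ]; do
        "$@"
        sleep "$PAUSE"
        echo "$i $(( $(cat "$VRAM_FILE") / 1048576 ))"
        i=$(( i + 1 ))
    done
}

# Restart plasmashell once, leaving the new instance in the background.
restart_plasmashell() {
    kquitapp6 plasmashell
    plasmashell >/dev/null 2>&1 &
}

# Usage: restart_and_log 50 restart_plasmashell
```

Running the same number of iterations in each session type gives directly comparable numbers instead of an open-ended loop.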
Artix, kernel 6.17.7-zen1-1-zen, xlibre 25.0.0.15-1, Plasma 6.5.3, kwin-x11-lite 6.5.3-1.1, same config as above. Wayland session, fd.o Xorg 21.1.20-1 session, Xlibre session with kwin compositing disabled (via Alt+Shift+F12): VRAM usage stays the same. Xlibre session with kwin compositing enabled: VRAM usage slowly but surely creeps up. amdgpu_top shows Xorg (or elogind if X is started from tty) hogging up the VRAM, killing the session frees it as expected.
Before that I tried running sessions over a prolonged time with the modesetting driver, and the results were largely the same as in previous tests. I'm not even sure what to make of the result, as I've seen the leak happen on fd.o and there are third-party reports of it happening on wayland, too. Whatever the reason is, I suspect Xlibre somehow amplifies the issue. Will try to see if it happens on an older-gen AMD GPU; sadly I don't have access to any other cards.
EDIT: happens on RX 5500 XT and RX 460 as well (tested on 7600 XT before), so either it's not a generational bug or those two bugs are unrelated.
https://github.com/X11Libre/xserver/issues/687#issuecomment-3447775911 suggested building xlibre with -Db_sanitize=leak to see where the leak could be. I've stress-tested it until about 3 gigs of VRAM were filled; sadly the leak isn't getting detected. Disabling the glamoregl module brought no change.
Pardon the screenshot, startx won't let me launch a session over ssh or wrapped in nohup so had to use fbgrab.