open-gpu-kernel-modules

[MAJOR] KDE Plasma Wayland & X11 poor performance & frame drops when opening apps

Open kodatarule opened this issue 2 years ago • 47 comments

NVIDIA Open GPU Kernel Modules Version

535.86.05

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • [x] I confirm that this does not happen with the proprietary driver package.

Operating System and Version

EndeavourOS Linux

Kernel Release

6.4.6-zen

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • [x] I am running on a stable kernel release.

Hardware: GPU

RTX 3090

Describe the bug

When opening apps, screen recording, or doing anything else that demands more from the GPU, the desktop starts dropping frames, hitching, and lagging. This doesn't occur with the proprietary driver.

To Reproduce

Log in to KDE Plasma Wayland and open any app (Dolphin, a browser, etc.).

Bug Incidence

Always

nvidia-bug-report.log.gz

More Info

No response

kodatarule avatar Jul 26 '23 16:07 kodatarule

Just a slight update: this also occurs on X11.

kodatarule avatar Sep 13 '23 19:09 kodatarule

With the news of driver 560 defaulting to the open kernel modules, I decided to give this a second try, and this issue is still present on both X11 and Wayland.

Operating System: EndeavourOS
KDE Plasma Version: 6.0.4
KDE Frameworks Version: 6.1.0
Qt Version: 6.7.0
Kernel Version: 6.8.9-zen1-1-zen (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 5800X3D 8-Core Processor
Memory: 31,2 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 3090/PCIe/SSE2

kodatarule avatar May 11 '24 08:05 kodatarule

Just to update for people that would come here:

EDIT: The solution was to add NVreg_EnableGpuFirmware=0 to the kernel boot parameters, and all issues were fixed!

kodatarule avatar May 21 '24 14:05 kodatarule

This won't work with the open modules, only the closed ones, since the open modules require gsp. Right?

edisionnano avatar May 21 '24 15:05 edisionnano

I believe that is the case; this seems to work only on the proprietary driver.

kodatarule avatar May 21 '24 19:05 kodatarule

It would be worth retesting this case with the new 555.42.02 driver: https://www.nvidia.com/Download/driverResults.aspx/224751/en-us/

We made several improvements to graphics performance that will help both the proprietary kernel modules with NVreg_EnableGpuFirmware=1, and the open kernel modules.

aritger avatar May 22 '24 09:05 aritger

Tracked internally as bug 4662986.

mtijanic avatar May 22 '24 09:05 mtijanic

Hi @kodatarule , can I trouble you for two experiments? With the 555.42 driver - Proprietary*, but without NVreg_EnableGpuFirmware=0 (or set it to 1), please try:

(1) Disable MangoHUD and any other background profiling apps you might have, and see if it gets any better.
(2) Wait until you see the issue, and then run nvidia-bug-report.sh as soon as you can.

This is so we can get the bug report snapshot soon after a bad state and we know where to look at it, timescale-wise.

Thanks in advance!

* Open is also fine, but then please run with NVreg_RmMsg=":" and also run dmesg -w > dmesg.txt on the side and attach that file too.
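Before collecting logs for either experiment, it may help to confirm which firmware setting the loaded driver actually picked up. The following is a minimal sketch, not something posted in this thread: the helper name `gpu_firmware_setting` is hypothetical, and it assumes the driver exposes its module parameters as `Name: value` lines in `/proc/driver/nvidia/params`.

```shell
# Print the EnableGpuFirmware value from an NVIDIA params file.
# Usage: gpu_firmware_setting [FILE]
# FILE defaults to /proc/driver/nvidia/params (assumed location and format).
gpu_firmware_setting() {
    params=${1:-/proc/driver/nvidia/params}
    # Lines are assumed to look like "Name: value"; match the key exactly so
    # that similarly named keys (e.g. EnableGpuFirmwareLogs) are not picked up.
    awk -F': *' '$1 == "EnableGpuFirmware" { print $2 }' "$params"
}
```

Running `gpu_firmware_setting` with no argument reads the live driver state. A value of 0 would correspond to the NVreg_EnableGpuFirmware=0 setting discussed above; other values are not explained in this thread.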

mtijanic avatar May 22 '24 14:05 mtijanic

Hi, I tried the proprietary beta 555.42 with the GPU firmware enabled and generated a log. I also tried with MangoHud both on and off globally, which made no difference at all. nvidia-bug-report.log.gz

kodatarule avatar May 23 '24 04:05 kodatarule

To help better isolate this, looking more carefully at your xorg.conf:

Option         "nvidiaXineramaInfoOrder" "DFP-3"
Option         "metamodes" "DP-2: 2560x1440_165 +0+0 {ForceFullCompositionPipeline=On}, DP-0: 2560x1440_165 +2560+0 {ForceFullCompositionPipeline=On}"
Option         "UseNvKmsCompositionPipeline" "false"

Do you see the same performance problems:
(a) if you remove the UseNvKmsCompositionPipeline option
(b) if you remove the {ForceFullCompositionPipeline=On} parts
(c) if you use slightly lower refresh rates? (I assume 2560x1440_165 is running at 165 Hz)

aritger avatar May 25 '24 01:05 aritger

Hello. Removing the UseNvKmsCompositionPipeline option creates even bigger stutters. Turning ForceCompositionPipeline/ForceFullCompositionPipeline on or off made no difference; as a side note, these options have no effect on Wayland, which is affected either way. Changing refresh rates had no impact either. It feels like the GPU clocks behave oddly with GSP firmware (driver 555 proprietary and all open-source drivers prior to this). I'm not sure what is causing this problem, but I've also noticed many reports on the NVIDIA forums from people whose systems show the same behavior with GSP.

kodatarule avatar May 26 '24 06:05 kodatarule

Update: We've found two possible causes of stutter. Or rather, we found two issues that definitely cause stutter on some configurations, but we still don't have a good idea of how widespread either of them is.

I have published patches that eliminate one and log the other here: https://github.com/NVIDIA/open-gpu-kernel-modules/pull/658

I'd love it if folks that are experiencing these issues would give it a try and report back. Getting a good idea of the impact would help us prioritize getting these in. Many thanks in advance!

mtijanic avatar Jun 06 '24 09:06 mtijanic

@mtijanic Thanks for the patchset! I have applied it to nvidia-open-dkms and pushed it to the testing repository on CachyOS. Users have been notified to test it. Sadly, I cannot reproduce this on 40xx GPUs.

ptr1337 avatar Jun 07 '24 08:06 ptr1337

Actually, when taking a screenshot with Spectacle, I sometimes see small FPS drops on the patched nvidia-open-dkms module. This was not present with the closed one, but I'm not sure whether it is related.

nvidia-bug-report.log.gz

ptr1337 avatar Jun 08 '24 10:06 ptr1337

What would be the proper process of building and installing this patchset? I'm facing these issues on the open-beta-dkms and I'd like to help troubleshoot with my logs

Virkkunen avatar Jun 11 '24 09:06 Virkkunen

First, make sure you have the regular 555.52.04 driver installed in whatever way you normally do it (distro package, .run file, etc.). Then clone my branch with:

 git clone --single-branch --branch 555-testing-patches https://github.com/mtijanic/open-gpu-kernel-modules.git 555-testing

Then, build it:

 cd 555-testing && make -j16

If successful, it will produce a file kernel-open/nvidia.ko (and many others not relevant here). Check if it exists. Now, you just need to switch to using this instead of your installed nvidia.ko. To find out where it is, you can run

$ modinfo nvidia | grep filename
filename:       /lib/modules/5.15.0-105-generic/kernel/drivers/video/nvidia.ko

Easiest would be to just back up the original file and replace it with the newly built one:

cd /lib/modules/5.15.0-105-generic/kernel/drivers/video/
sudo mv nvidia.ko nvidia.ko.backup
sudo cp /path/to/555-testing/kernel-open/nvidia.ko .

Or use symlinks.

You'll need to reload the driver for the change to take effect. A system reboot would do it, but killing X / your DE and then running rmmod works too. For example:

sudo service lightdm stop # or gdm, etc
sudo rmmod nvidia_uvm nvidia_vgpu_vfio nvidia_drm nvidia_modeset nvidia
sudo service lightdm start

To revert, just restore the original backed up file.
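The backup, replace, and revert steps above can be sketched as two small shell functions. This is only an illustration under stated assumptions: the names `swap_module` and `restore_module` are hypothetical, and the sketch assumes the installed module is a plain uncompressed `.ko` file like the one in the modinfo output above.

```shell
# Replace an installed kernel module with a freshly built one, keeping a backup.
# Usage: swap_module NEW_KO INSTALLED_KO
swap_module() {
    new=$1
    installed=$2
    # Refuse to continue if the build did not produce the module.
    [ -f "$new" ] || { echo "missing build output: $new" >&2; return 1; }
    # Create the backup only once, so a second run never overwrites the
    # original module with an already-patched copy.
    [ -e "$installed.backup" ] || cp -p "$installed" "$installed.backup" || return 1
    cp "$new" "$installed"
}

# Put the original module back from the backup made by swap_module.
# Usage: restore_module INSTALLED_KO
restore_module() {
    installed=$1
    mv "$installed.backup" "$installed"
}
```

From a root shell, a call such as `swap_module /path/to/555-testing/kernel-open/nvidia.ko "$(modinfo -n nvidia)"` would use the path that modinfo reports instead of a hard-coded kernel version. The driver still has to be reloaded afterwards, as described above.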

mtijanic avatar Jun 11 '24 09:06 mtijanic

@Virkkunen If you are on Arch Linux, you can also use the following PKGBUILD: https://github.com/CachyOS/CachyOS-PKGBUILDS/blob/master/nvidia/nvidia-open-dkms/PKGBUILD

@mtijanic I've tested this for around a week now and still get stutters here and there, mainly when taking screenshots or minimizing windows.

ptr1337 avatar Jun 11 '24 11:06 ptr1337

Using @ptr1337's PKGBUILD (on EndeavourOS) I was able to install this patch. So far it seems that the stutter while opening, closing and minimising apps, and while screen recording (with Spectacle), is gone.

However, when moving the cursor I can notice some stutters. Moving quickly in a circle makes it more apparent, with visible gaps in the circle, as if it's skipping some positions. I tried to record a slow-motion video of this, but it's quite a finicky thing to visualise in a recording.

https://github.com/NVIDIA/open-gpu-kernel-modules/assets/9111925/66c77e25-5154-4655-8885-bf89157e0757

nvidia-bug-report.log.gz

Virkkunen avatar Jun 11 '24 17:06 Virkkunen

OK, I built the open modules with the patches and so far it seems the stutter issues have been fixed! I reported this problem on the NVIDIA forums for the closed modules before. This is my first time using the open ones. RTX 3080, 555.52.04, 6.9.3-cachyos kernel, Arch Linux, MATE Desktop, X11

Edit: OK, there's still very minor input-related (mouse) stutter now, like a few periodic frametime spikes, which doesn't happen with the closed modules and GSP disabled.

Overall it seems to be a huge improvement, but not yet ideal.

xpander69 avatar Jun 17 '24 08:06 xpander69

After testing out the open modules with the patches, the situation has improved somewhat, but the hitches when opening apps or moving the cursor are still present. Attached is a bug report.

nvidia-bug-report.log.gz

kodatarule avatar Jun 17 '24 09:06 kodatarule

@mtijanic I have just updated to the stable 555.58 driver (the closed one) and enabled the GSP firmware, but these stutters are still present.

I noticed that your PR got merged. Here is a video; the stutter is mainly visible when taking a screenshot with Spectacle.

https://github.com/NVIDIA/open-gpu-kernel-modules/assets/70081076/7e33f71c-4b6c-4def-b020-85644d96646b nvidia-bug-report.log.gz

ptr1337 avatar Jun 27 '24 16:06 ptr1337

Follow-up on this:

Update: We've found two possible causes of stutter. Or rather, we found two issues that definitely cause stutter on some configurations, but we still don't have a good idea of how widespread either of them is.

In 555.58.02 (but not 555.58 from last week) we fixed the bigger of the two causes. Particularly those using kwin should give this a try and report back. 555.58.02 does not include https://github.com/NVIDIA/open-gpu-kernel-modules/pull/658/commits/674c009526b4a47c5dece5a7a2facc7e637bead7 which fixes a different, less frequent cause. You can still apply this commit manually if using the Open modules, and it will be included in 560.xx.

Please test and report back! :heart:

mtijanic avatar Jul 01 '24 15:07 mtijanic

@mtijanic The desktop generally runs fine. The only problem I'm still seeing (both with https://github.com/NVIDIA/open-gpu-kernel-modules/commit/674c009526b4a47c5dece5a7a2facc7e637bead7 and without it) is that Spectacle is sometimes "laggy" and just jumps, as you can see above.

I made you a fresh video and an nvidia-bug-report.sh log; see below.

https://github.com/NVIDIA/open-gpu-kernel-modules/assets/70081076/da9d2f81-3381-4724-95eb-5a07b37b17ed

nvidia-bug-report.log.gz

Edit:

I will test further with the closed source driver + GSP enabled.

ptr1337 avatar Jul 01 '24 15:07 ptr1337

Please! Closed source and GSP ON vs OFF will give us the best info to triage further.

Thanks a ton, for all the reports you've sent in so far! We might not get a chance to meaningfully reply to them all, but we do really appreciate it.

mtijanic avatar Jul 01 '24 16:07 mtijanic

Retested with the closed-source driver with GSP on and off. The issue also appears with the closed driver when the GSP firmware is enabled.

Here is a comparison:

GSP ON:

https://github.com/NVIDIA/open-gpu-kernel-modules/assets/70081076/a1be2c07-2b24-48c1-b8db-fa9214f68f9e

nvidia-bug-report.log.gz

GSP Off:

https://github.com/NVIDIA/open-gpu-kernel-modules/assets/70081076/7d95163f-d2a5-41f5-8c4d-16d8bdb997a8

nvidia-bug-report.log.gz

Edit: It definitely improved compared to without the patches, but I still see these hiccups, mainly in Spectacle.

ptr1337 avatar Jul 01 '24 18:07 ptr1337

With 555.58.02 it has definitely improved a lot, however I still notice a few hiccups here and there.

nvidia-bug-report.log.gz

kodatarule avatar Jul 02 '24 07:07 kodatarule

I just tested with 555.58.02 with GSP off and on and I am still seeing weird judders and hitches simply dragging KDE's Dolphin file manager around on the desktop whenever GSP is enabled. When it is off, the window motion is very smooth.

The issue seems to come and go. With GSP, the first few window moves will be smooth, but continuously moving the window around will cause hitching. Without GSP, it is smooth the entire time.

urbenlegend avatar Jul 03 '24 17:07 urbenlegend

Where exactly do I add NVreg_EnableGpuFirmware=0 to the kernel boot parameters on Arch Linux?

omnigenous avatar Jul 09 '24 21:07 omnigenous

Add nvidia.NVreg_EnableGpuFirmware=0 to the variable GRUB_CMDLINE_LINUX_DEFAULT in the /etc/default/grub file, and run grub-mkconfig -o /boot/grub/grub.cfg

MishaProductions avatar Jul 09 '24 21:07 MishaProductions

@omnigenous In addition to what MishaProductions said, make sure to prepend the module name to that option, so like nvidia.NVreg_EnableGpuFirmware=0
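For anyone who prefers to script the GRUB edit, here is a minimal sketch of the change described above. The function name `add_grub_param` is hypothetical, and the sketch assumes GNU sed and a `GRUB_CMDLINE_LINUX_DEFAULT` value written in double quotes on a single line; check your own /etc/default/grub before relying on it.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a grub defaults file.
# Usage: add_grub_param FILE PARAM
add_grub_param() {
    file=$1
    param=$2
    # Do nothing if the parameter is already on the line (keeps reruns harmless).
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote of the value.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file"
}
```

As root, `add_grub_param /etc/default/grub nvidia.NVreg_EnableGpuFirmware=0` followed by the grub-mkconfig command mentioned above would apply the change on the next boot. After rebooting, `cat /proc/cmdline` should show the parameter if it took effect.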

urbenlegend avatar Jul 09 '24 21:07 urbenlegend