Low fps on external monitor connected to nvidia hdmi port

Open tm4ig opened this issue 1 year ago • 91 comments

NVIDIA Open GPU Kernel Modules Version

555.42.02

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • [x] I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Arch Linux

Kernel Release

6.9.3

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • [X] I am running on a stable kernel release.

Hardware: GPU

AMD Radeon 780M IGPU + NVIDIA GeForce RTX 4060 Laptop DGPU (UUID: GPU-36f796c8-ee38-5be5-08a5-c7d8635be2d6)

Describe the bug

I have an ASUS laptop with an AMD Radeon 780M iGPU and an NVIDIA GeForce RTX 4060 Mobile Max-Q dGPU, running KDE 6 in a Wayland session. The laptop panel is connected to the AMD GPU and the external monitor is connected to the NVIDIA GPU (HDMI port). When I run the glxgears benchmark in a KDE 6 Wayland session on the 555.42.02 nvidia-open driver, or on the 555.42.02 NVIDIA proprietary driver without the nvidia.NVreg_EnableGpuFirmware=0 kernel option, the frame rate on the external monitor connected to the NVIDIA HDMI port is half the screen refresh rate (in my case only ~37-38 fps at a 75 Hz refresh rate). This looks like the bug https://bugs.kde.org/show_bug.cgi?id=452219 but it is an NVIDIA driver regression, because with the nvidia-open 550.xx driver, or with the NVIDIA proprietary 555.42 driver and the nvidia.NVreg_EnableGpuFirmware=0 kernel option, I get a normal frame rate on the external monitor. I cannot use the NVIDIA proprietary driver 550.xx or 555.42 because it causes kernel panics (https://forums.developer.nvidia.com/t/series-550-freezes-laptop/284772/135) and NVIDIA has not fixed this problem in more than 3 months. I do not want to use the nvidia-open 550.xx driver because with that driver and an external monitor I get very high CPU utilization for the kwin_wayland process.

To Reproduce

  1. Connect an external monitor to the NVIDIA HDMI port (Wayland, KWin 6, nvidia-open driver 555.42.02)
  2. Run glxgears on the external monitor
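A minimal sketch of how the "half the refresh rate" symptom could be checked from glxgears output, rather than read off by eye. It relies on the usual glxgears report line format (`N frames in S seconds = F FPS`). The helper names, the 10% tolerance, and the sample readings are illustrative assumptions, not part of the original report.

```python
import re

# glxgears periodically prints lines such as:
#   188 frames in 5.0 seconds = 37.512 FPS
FPS_LINE = re.compile(r"(\d+) frames in ([\d.]+) seconds = ([\d.]+) FPS")


def parse_glxgears_fps(output):
    """Return every FPS reading found in captured glxgears output."""
    return [float(m.group(3)) for m in FPS_LINE.finditer(output)]


def looks_half_rate(fps_readings, refresh_hz, tolerance=0.1):
    """True if the mean FPS is within `tolerance` (a fraction) of refresh_hz / 2.

    The tolerance is an arbitrary choice for illustration.
    """
    if not fps_readings:
        raise ValueError("no FPS readings to evaluate")
    mean_fps = sum(fps_readings) / len(fps_readings)
    half = refresh_hz / 2
    return abs(mean_fps - half) <= tolerance * half


# Hypothetical capture, shaped like the ~37-38 fps described in the report.
sample = (
    "188 frames in 5.0 seconds = 37.512 FPS\n"
    "190 frames in 5.0 seconds = 37.900 FPS\n"
)
readings = parse_glxgears_fps(sample)
print(readings, looks_half_rate(readings, refresh_hz=75))
```

In practice the input would be the captured standard output of glxgears running on the external monitor, with `refresh_hz` set to that monitor's configured refresh rate.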

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

No response

tm4ig avatar Jun 02 '24 15:06 tm4ig

Similar problem: https://forums.developer.nvidia.com/t/wayland-external-monitor-refresh-rate-issue/290752. But in my case the problem occurs with the nvidia-open 555 driver (or the NVIDIA 555 closed driver with GSP firmware). With the nvidia-open 550 driver (or the NVIDIA proprietary driver 555 with NVreg_EnableGpuFirmware=0) I get a normal frame rate on the external monitor.

tm4ig avatar Jun 06 '24 07:06 tm4ig

OGL_DEDICATED_HW_STATE_PER_CONTEXT=ENABLE_ROBUST does not help me

tm4ig avatar Jun 07 '24 07:06 tm4ig

With the NVIDIA proprietary Linux driver 555, the nvidia.NVreg_EnableGpuFirmware=0 kernel option, Wayland and KDE Plasma 6.1, I have low CPU usage (around 5-20% of one core for kwin_wayland) and a high frame rate (around 70-75 fps at a 75 Hz monitor refresh rate) on the external monitor connected to the NVIDIA HDMI port, but I get kernel panics as in https://forums.developer.nvidia.com/t/series-550-freezes-laptop/284772/210

With the NVIDIA open Linux driver 555, Wayland and KDE Plasma 6.1, I have high CPU usage (around 20-80% of one core for kwin_wayland) and a lower frame rate (around 65-70 fps at a 75 Hz monitor refresh rate) on the external monitor connected to the NVIDIA HDMI port, but I do not get the kernel panics described in https://forums.developer.nvidia.com/t/series-550-freezes-laptop/284772/210

With the NVIDIA proprietary Linux driver 555, the nvidia.NVreg_EnableGpuFirmware=1 kernel option, Wayland and KDE Plasma 6.1, I have high CPU usage (around 20-80% of one core for kwin_wayland) and a lower frame rate (around 65-70 fps at a 75 Hz monitor refresh rate) on the external monitor connected to the NVIDIA HDMI port, and I get kernel panics as in https://forums.developer.nvidia.com/t/series-550-freezes-laptop/284772/210

So with the NVIDIA proprietary driver (fully closed mode) I get the best performance, but also kernel panics. With the NVIDIA open driver I get lower performance, but no kernel panics. With the NVIDIA proprietary driver with GPU firmware enabled (the default for 555) I get lower performance and kernel panics.

NVIDIA has not been able to fix the kernel panics with hybrid graphics for five months.
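When comparing configurations like these, it can help to confirm which value of the option the running kernel was actually booted with. The sketch below only parses a kernel command line string for `nvidia.NVreg_EnableGpuFirmware`. The function name is hypothetical, and keeping the last occurrence when the option is repeated is an assumption. A result of `None` does not prove the driver default is in effect, since the option may also be set elsewhere (for example in a modprobe configuration file).

```python
OPTION = "nvidia.NVreg_EnableGpuFirmware"


def gsp_firmware_option(cmdline):
    """Return the value given for nvidia.NVreg_EnableGpuFirmware, or None.

    `cmdline` is the kernel command line as a single string.
    Assumption: if the option appears more than once, the last one is kept.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == OPTION:
            value = val
    return value


# Hypothetical command lines for illustration.
print(gsp_firmware_option("root=/dev/nvme0n1p2 rw nvidia.NVreg_EnableGpuFirmware=0 quiet"))
print(gsp_firmware_option("root=/dev/nvme0n1p2 rw quiet"))
```

On a running Linux system the input string would come from reading `/proc/cmdline`.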

tm4ig avatar Jul 10 '24 10:07 tm4ig

Reverse PRIME does get tricky on my machine. It has been a known issue for a long time.

Kimiblock avatar Jul 31 '24 16:07 Kimiblock

Are there any updates on this issue? I'm having the same problem with an external monitor connected via DisplayPort (USB-C). The specific configuration is a Ryzen 7000 laptop with an RTX 4080, and the external display is basically unusable.

Enverbalalic avatar Aug 07 '24 19:08 Enverbalalic

Had to use only the dedicated GPU for now. Power consumption is insane.

Kimiblock avatar Sep 09 '24 08:09 Kimiblock

nvidia-bug-report.log.gz

Relevant discussions:

  1. https://gitlab.gnome.org/GNOME/mutter/-/issues/3461
  2. https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4027#note_2231341
  3. https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441#note_2236368
  4. https://gitlab.gnome.org/GNOME/mutter/-/issues/3664
  5. https://gitlab.gnome.org/GNOME/mutter/-/issues/3713

NGStaph avatar Oct 01 '24 12:10 NGStaph

Hi all, we also have a thread in the nvidia forums related to the same issue: https://forums.developer.nvidia.com/t/nvidia-please-get-it-together-with-external-monitors-on-wayland/301684/30

lucasslima avatar Oct 08 '24 09:10 lucasslima

This is being tracked as NV bug 4830125

mtijanic avatar Oct 09 '24 11:10 mtijanic

I have the same issue without using HDMI, using a USB->DP cable

kasvtv avatar Oct 17 '24 10:10 kasvtv

same issue. 75Hz external monitor -> 37FPS

Using KDE Plasma 6.2.1.1 Wayland, NVIDIA prop. 560.35.03-17, NVIDIA GTX 1050Ti

Using nvidia.NVreg_EnableGpuFirmware=0 or OGL_DEDICATED_HW_STATE_PER_CONTEXT=ENABLE_ROBUST doesn't help

moiSentineL avatar Oct 28 '24 12:10 moiSentineL

I use USB4 DisplayPort/Thunderbolt and, funnily enough, hit the advertised target refresh rates on both monitors in KDE Plasma, but not on GNOME.

NGStaph avatar Oct 28 '24 14:10 NGStaph

This is being tracked as NV bug 4830125

Good day! Is this an internal bug tracker or can we get some updates on this publicly?

virusapex avatar Nov 25 '24 09:11 virusapex

Hey there! Sorry, the NV bug is private, but we can provide public updates here. We have a machine with local repro and are actively working on it, but we don't have a root cause yet. The issue seems to be related to the power savings feature of the GSP or one of the display-specific components. Going to a lower (more power) pstate makes the issue go away.

That's not a solution though, and we're working on understanding the exact cause and how to fix it without consuming excess power. Will update here when we have more to share. And if the fix involves only kernel-side changes, we can post the patches as well.

Thanks for the patience!

mtijanic avatar Nov 25 '24 11:11 mtijanic

Hey there! Sorry, the NV bug is private, but we can provide public updates here. We have a machine with local repro and are actively working on it, but we don't have a root cause yet. The issue seems to be related to the power savings feature of the GSP or one of the display-specific components. Going to a lower (more power) pstate makes the issue go away.

That's not a solution though, and we're working on understanding the exact cause and how to fix it without consuming excess power. Will update here when we have more to share. And if the fix involves only kernel-side changes, we can post the patches as well.

Thanks for the patience!

I find this very unlikely. I'm using the closed source drivers with the GSP disabled so I don't think it's related to that, while using the open module does make the framerate on the external monitor more jittery. Setting the clocks to max also makes no difference, and I can't find how to set the power limit using nvidia-settings in Wayland/nvidia-smi.

lucasslima avatar Nov 26 '24 00:11 lucasslima

I find this very unlikely. I'm using the closed source drivers with the GSP disabled so I don't think it's related to that, while using the open module does make the framerate on the external monitor more jittery.

Hi there. Am I understanding correctly that you are also seeing the "half-FPS on external monitor" issue with GSP disabled too? That doesn't match our experiments.

While debugging this we did find some causes of jitter where individual frames would take longer, but that wasn't the core issue. Eventually we got to the point where, on GSP only, running with <=P4 pstate a monitor runs at 60.0fps, and with >=P5 at 30.000fps. This is the issue we are debugging and that is tracked here and in NV bug 4830125.

Setting the clocks to max also makes no difference, and I can't find how to set the power limit using nvidia-settings in Wayland/nvidia-smi.

You can poll your pstate with

nvidia-smi --query-gpu="pstate" --format=csv --loop-ms=1000

and any of the following should cause it to change:

  • Running a graphics intensive app, such as __GL_SYNC_TO_VBLANK=0 glxgears (disabling vsync makes it render at max fps and warms up the GPU) should set it to P0
  • Running any CUDA app, such as mpv --hwdec=nvdec-copy video.mp4 will set it to P2
  • This little app should set it to P0: https://gist.github.com/mtijanic/9c129900bfba774b39914ad11b0041f6
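As a rough illustration of how the polling output could be combined with the observation above (on GSP, P4 or a lower-numbered pstate gives the full rate, P5 or a higher-numbered one gives half), here is a small sketch. It assumes the CSV output consists of a `pstate` header followed by values such as `P8`. The function names are hypothetical, and the P4/P5 threshold comes only from the repro described in this thread, so it may not hold on other systems.

```python
def parse_pstates(csv_output):
    """Extract pstate numbers from nvidia-smi CSV output for the pstate query.

    Lines that are not of the form P<number> (such as the header) are skipped.
    """
    states = []
    for line in csv_output.splitlines():
        line = line.strip()
        if line.startswith("P") and line[1:].isdigit():
            states.append(int(line[1:]))
    return states


def expected_rate(pstate, refresh_hz):
    """Frame rate predicted by the thread's observation, not a driver guarantee.

    Assumption: pstates P0..P4 present at the full refresh rate and
    P5 and above at half of it, as reported for the GSP repro machine.
    """
    return refresh_hz if pstate <= 4 else refresh_hz / 2


# Hypothetical polling output: idle at P8, then a load pushes the GPU to P0.
sample = "pstate\nP8\nP8\nP0\n"
for p in parse_pstates(sample):
    print(f"P{p}: expect about {expected_rate(p, 60)} fps on a 60 Hz monitor")
```

Comparing these predictions with the frame rate actually measured on the external monitor is one way to tell whether a given system matches the pstate-dependent behaviour or shows something different.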

mtijanic avatar Nov 26 '24 09:11 mtijanic

Thank you for the information, it helped me get more details on this.

I tried to set the GPU power state using what you mentioned and got some interesting results. When running

__GL_SYNC_TO_VBLANK=0 prime-run glxgears

The power state indeed goes to P0 and the framerate on the desktop improves. However, if I switch to a Firefox window open at https://testufo.com/, the framerate goes back to half the refresh rate regardless of the GPU power state.

I've made a small recording showing what happens: https://youtu.be/FY-LxShijdk

lucasslima avatar Nov 27 '24 12:11 lucasslima

@lucasslima What happens when you move the Firefox window to the same monitor as glxgears?

ngoquang2708 avatar Nov 27 '24 12:11 ngoquang2708

@lucasslima thanks for that video. I don't think this is the same issue that we're talking about here. From first glance, it looks like a weird interaction between firefox, kwin and the NVIDIA usermode drivers (related to explicit sync?).

~~Could you please send this video and the output of your nvidia-bug-report.sh to [email protected] and then that will get routed internally to where it needs to be, since this repo doesn't seem to be right place for it. Oh, and please mention the make&model of the external monitor, I don't know if it's caught in the bug report log.~~

EDIT: I see on the forums that NV bug 4824813 was filed for this already and it has the needed info. This bug will be revisited once 4830125 is root caused so we know if it is the same issue or not.

Thanks!

mtijanic avatar Nov 27 '24 13:11 mtijanic

Glad to be helpful.

@lucasslima What happens when you move the Firefox window to the same monitor as glxgears?

Both are running on the same screen, it just happens to be ultra-wide.

lucasslima avatar Nov 27 '24 13:11 lucasslima

Using the program above to force P0 fixes the problem for me, although just like @lucasslima on a KDE wayland session with firefox I also get said UFO problem, but without firefox both monitors run smoothly to my eyes.

Simpuis avatar Nov 28 '24 00:11 Simpuis

I have also experienced similar issues on both proprietary and open NVIDIA modules, but only on high refresh rate monitors - I do experience subtle performance issues on my 1080p60 monitor very occasionally, but ALWAYS on high refresh rate ones - 1440p ultrawide 180hz and 1440p 16/9 144hz.

Both GNOME and KDE have this issue, couldn't test on Hyprland as it straight up doesn't work with NVIDIA.

busybox11 avatar Dec 01 '24 04:12 busybox11

Just want to mention for the devs that version 550.76 of the driver does not have this issue. I found it to be the only driver version that doesn't.

clone-888 avatar Jan 06 '25 05:01 clone-888

@mtijanic Good day! Sorry to bother again, but was there any progress on this?

virusapex avatar Jan 20 '25 04:01 virusapex

Hi @virusapex, sorry, I was out of office over the holidays, for an unexpectedly long period of time. Other people have since taken over this issue; I see progress but it still hasn't been fully closed down. I'll follow up and get back here a bit later.

mtijanic avatar Jan 27 '25 11:01 mtijanic

Hi @virusapex, sorry, I was out of office over the holidays, for an unexpectedly long period of time. Other people have since taken over this issue; I see progress but it still hasn't been fully closed down. I'll follow up and get back here a bit later.

Thank you for notifying us!

virusapex avatar Jan 27 '25 11:01 virusapex

~570.86.16 seems to mostly fix the issue for me, which is to say that performance (with the beta drivers) is almost as good as 565 (closed) with nvidia.NVreg_EnableGpuFirmware=0.~

wishful thinking.

NGStaph avatar Feb 03 '25 10:02 NGStaph

570.86.16 seems to mostly fix the issue for me, which is to say that performance (with the beta drivers) is almost as good as 565 with nvidia.NVreg_EnableGpuFirmware=0.

others still seem to be facing difficulties (https://forums.developer.nvidia.com/t/570-release-feedback/321956/13) but this has not been my case

As far as I know, you can't disable GSP firmware on the open kernel module, so nvidia.NVreg_EnableGpuFirmware=0 can't be used in the current situation. As for 570.86.16, I tried it but still got significant lag when moving windows around the screen.

virusapex avatar Feb 05 '25 01:02 virusapex

Agree with @virusapex, the low frame rate lag is not fixed in 570.86.

uentity avatar Feb 09 '25 02:02 uentity

Using the program above to force P0 fixes the problem for me, although just like @lucasslima on a KDE wayland session with firefox I also get said UFO problem, but without firefox both monitors run smoothly to my eyes.

It's probably because #743 also plays a role here (and it's more of a userspace issue maybe?)

x0wllaar avatar Feb 19 '25 04:02 x0wllaar