open-gpu-kernel-modules

nvidia-powerd.service needs a reboot to change the power limit state on AMD Lenovo gaming laptops

Open ghost opened this issue 2 years ago • 36 comments

NVIDIA Open GPU Kernel Modules Version

530.41.03

Does this happen with the proprietary driver (of the same version) as well?

Yes

Operating System and Version

Fedora release 37 (Thirty Seven)

Kernel Release

Linux DESKTOP-MPKTHBJ 6.2.10-200.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 6 23:30:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Hardware: GPU

GPU 0: NVIDIA GeForce RTX 3050 Laptop GPU (UUID: GPU-b2f720a1-2b4e-9b17-4383-76f9361248a2)

Describe the bug

nvidia-powerd.service needs a full system reboot to change the power limit of the GPU from the 64-66W of the laptop's balanced mode to the 85W that the laptop's performance mode should use.

If I press Fn+Q and change to performance mode without restarting, the laptop will still use 64W. After a system restart the power limit changes to 85W, but if I change back to balanced/power saver mode the power limit stays at 85W, which is very bad for the GPU temperature and may cause hardware damage.

The problem here is that you have to reboot each time you need to change the power limit, while on Windows it changes dynamically just by pressing Fn+Q.

To Reproduce

Change your Lenovo Legion/IdeaPad laptop from balanced to performance mode without a system restart and check the GPU power draw.
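One way to check the power draw in the step above is to query `nvidia-smi`. The following is a minimal sketch, not something the reporter posted: it assumes `nvidia-smi` is on the PATH, and the function names `parse_power_line` and `read_gpu_power` are made up for illustration. On some laptop GPUs the limit field may be reported as `[N/A]`, which the parser maps to `None`.

```python
import subprocess

# nvidia-smi query for current draw and enforced limit, in watts, without headers or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=power.draw,power.limit",
    "--format=csv,noheader,nounits",
]


def parse_power_line(line):
    """Parse one 'draw, limit' CSV line; non-numeric fields such as [N/A] become None."""
    def to_watts(field):
        try:
            return float(field.strip())
        except ValueError:
            return None

    draw, limit = line.split(",")
    return to_watts(draw), to_watts(limit)


def read_gpu_power():
    """Run nvidia-smi and return (draw_watts, limit_watts) for the first GPU listed."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_power_line(out.strip().splitlines()[0])
```

Calling `read_gpu_power()` under load before and after pressing Fn+Q should show whether the draw is still capped at the balanced-mode value.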

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

Laptop specs:

Lenovo Ideapad Gaming 3 15ACH6 AMD Ryzen 5 5600H
RTX 3050 Mobile 85W 16 GB RAM & 1 TB SSD M.2

ghost avatar Apr 15 '23 15:04 ghost

I would guess your problem is the same as or similar to mine: https://github.com/NVIDIA/open-gpu-kernel-modules/issues/491. Can you check whether the driver even notices the power source change? Most likely it's the same bug internally lol

bohanubis avatar Apr 15 '23 16:04 bohanubis

> I would guess your problem is the same as or similar to mine (#491). Can you check whether the driver even notices the power source change? Most likely it's the same bug internally lol

it says 0 even when the laptop draws 85W

ghost avatar Apr 15 '23 16:04 ghost

@kleidiss btw 0 is plugged in and 1 is unplugged. Just watch "nvidia-settings -q GPUPowerSource": plug in your laptop, give it 10 seconds to make sure, then unplug it and give it another 10 seconds. If the value doesn't switch with the open driver, you basically have the same problem as me.
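The plug/unplug check above can be scripted. This is a rough sketch under a few assumptions: `nvidia-settings` is installed and can reach the display server, the `-t` (terse) flag prints only the attribute value, and the 0/1 meaning is the one given in this comment. The helper names are hypothetical.

```python
import subprocess
import time

# Meaning of the values as described in this thread: 0 = plugged in, 1 = unplugged.
POWER_SOURCE_NAMES = {0: "AC (plugged in)", 1: "battery (unplugged)"}


def parse_power_source(terse_output):
    """Turn terse nvidia-settings output (first line '0' or '1') into a label."""
    value = int(terse_output.strip().splitlines()[0])
    return POWER_SOURCE_NAMES.get(value, f"unknown ({value})")


def query_power_source():
    """Ask nvidia-settings for GPUPowerSource and return a readable label."""
    out = subprocess.run(
        ["nvidia-settings", "-q", "GPUPowerSource", "-t"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_power_source(out)


def watch(interval_s=2.0, samples=10, query=query_power_source):
    """Print the power source every interval_s seconds and flag any change."""
    previous = None
    for _ in range(samples):
        current = query()
        marker = " <- changed" if previous is not None and current != previous else ""
        print(f"GPUPowerSource: {current}{marker}")
        previous = current
        time.sleep(interval_s)
```

Run `watch()` and plug or unplug the charger while it prints; if no line is ever flagged as changed, the driver is not seeing the power source switch.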

bohanubis avatar Apr 15 '23 16:04 bohanubis

> @kleidiss btw 0 is plugged in and 1 is unplugged. Just watch "nvidia-settings -q GPUPowerSource": plug in your laptop, give it 10 seconds to make sure, then unplug it and give it another 10 seconds. If the value doesn't switch with the open driver, you basically have the same problem as me.

OK, so I unplugged it and it went to 1, then I plugged it back in and it went to 0 again.

ghost avatar Apr 15 '23 16:04 ghost

@kleidiss you used Xorg for that test, right?

  • yeah, there is a difference, because the test I told you to do actually works under Wayland for some reason
  • if you are using GNOME or KDE, I guess you can switch in the login manager

bohanubis avatar Apr 15 '23 17:04 bohanubis

[update] Since you gave me the idea to just check the power usage: plugged in, FPS in Minecraft with shaders is 120, power draw is 50W and the clock is 2000 MHz. After unplugging, FPS drops to 10, power draw to 15W (why is that the case, for god's sake, just give the GPU 30W at least) and the clock to 300 MHz, sometimes 340. The clock and power draw stay exactly the same even after replugging the laptop.

bohanubis avatar Apr 15 '23 17:04 bohanubis

> yeah, there is a difference, because the test I told you to do actually works under Wayland for some reason

Oh, I use Wayland on Fedora. Yeah, after switching to Xorg I see it now, but I don't think it is related to my issue: I also tested the proprietary drivers and there was no such issue there, even on Xorg, whereas my specific issue also happens on the proprietary drivers.

ghost avatar Apr 15 '23 17:04 ghost

> [update] Since you gave me the idea to just check the power usage: plugged in, FPS in Minecraft with shaders is 120, power draw is 50W and the clock is 2000 MHz. After unplugging, FPS drops to 10, power draw to 15W (why is that the case, for god's sake, just give the GPU 30W at least) and the clock to 300 MHz, sometimes 340. The clock and power draw stay exactly the same even after replugging the laptop.

15W lmao, might as well turn off the GPU at that point

ghost avatar Apr 15 '23 17:04 ghost

> 15W lmao, might as well turn off the GPU at that point

exactly

> Yeah switching to Xorg i see now

at least the problem is not just for me, so yeah, I guess your response confirms https://github.com/NVIDIA/open-gpu-kernel-modules/issues/491

> my specific issue also happens on proprietary drivers

two issues related to power in two days, that's hilarious

bohanubis avatar Apr 15 '23 19:04 bohanubis

> two issues related to power in two days, that's hilarious

Not to mention that on the Unigine Heaven OGL benchmark I get 80-100 points less on Linux while using the same power draw and clocks as on Windows (I checked with MangoHud and the GeForce overlay thing on Windows). These crappy drivers are also burning power for no reason.

I notice an FPS loss on DXVK as well while using these drivers on Linux, but that's more acceptable than in a native benchmark.

ghost avatar Apr 15 '23 23:04 ghost

Correction: You just need to reload the service for it to update the power and clock speed limit

Still annoying having to reload it each time tho

ghost avatar Apr 17 '23 01:04 ghost

@kleidiss Wait, there is a service for that? What is it? That would probably solve my problem as well.

bohanubis avatar Apr 17 '23 06:04 bohanubis

> @kleidiss Wait, there is a service for that? What is it? That would probably solve my problem as well.

nvidia-powerd.service. It manages Dynamic Boost, meaning it takes power from the CPU when the CPU isn't using it and gives it to the GPU. I don't think that is related to battery tho.

ghost avatar Apr 17 '23 13:04 ghost

Just reload the service when unplugged and see if the power draw changes. If it does, then it's confirmed we have basically the same problem.

systemctl stop nvidia-powerd.service and then just start it again
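The stop/start cycle described above could be wrapped in a small script. This is only a sketch of the workaround from this comment, not an official procedure: it assumes the script runs with enough privileges to manage system services, and `powerd_commands` and `restart_powerd` are illustrative names.

```python
import subprocess

SERVICE = "nvidia-powerd.service"


def powerd_commands(service=SERVICE):
    """Return the systemctl invocations for a stop/start cycle plus a status check."""
    return [
        ["systemctl", "stop", service],
        ["systemctl", "start", service],
        ["systemctl", "is-active", service],
    ]


def restart_powerd(run=subprocess.run):
    """Stop and start the service, then return True if systemd reports it active."""
    stop, start, is_active = powerd_commands()
    run(stop, check=True)
    run(start, check=True)
    # is-active prints "active" on success; no check=True so a failed state is returned, not raised.
    result = run(is_active, capture_output=True, text=True)
    return result.stdout.strip() == "active"
```

After `restart_powerd()` returns `True`, check the GPU power draw again to see whether the new limit was picked up.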

ghost avatar Apr 17 '23 18:04 ghost

Well, it seems that my system doesn't support that in the first place, so yeah, I guess we don't have the same problem. Good for you for finding a temporary solution.

bohanubis avatar Apr 17 '23 19:04 bohanubis

Restarting the nvidia-powerd service without rebooting leaves the dGPU unable to enter the D3cold state, so it drains power even when it's not being used. So it's still not an ideal solution. My Lenovo laptop is a 15ARH7, Ryzen 6600H with NVIDIA RTX 3050.

EDIT: Need more testing.
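To help with that testing, the dGPU's power state can be read from sysfs before and after restarting the service. A minimal sketch with several assumptions: the PCI address passed in (for example `0000:01:00.0`) is a placeholder that has to be looked up on the machine, the `power_state` attribute only exists on fairly recent kernels, and `read_pci_power_info` is a made-up helper name. Missing attributes are returned as `None`.

```python
from pathlib import Path


def read_pci_power_info(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the PCI power state and runtime PM status of a device, or None where missing."""
    device = Path(sysfs_root) / pci_address
    attributes = {
        "power_state": "power_state",              # e.g. D0, D3hot, D3cold (newer kernels)
        "runtime_status": "power/runtime_status",  # e.g. active, suspended
    }
    info = {}
    for key, relative in attributes.items():
        path = device / relative
        info[key] = path.read_text().strip() if path.exists() else None
    return info
```

If `read_pci_power_info("0000:01:00.0")` shows the idle GPU reaching `D3cold` after a fresh boot but staying in `D0` after a service restart, that would support the report above.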

saltyming avatar Apr 20 '23 11:04 saltyming

Hi there. I have the same laptop and I have problems with suspend. Steps to reproduce the problem:

  • Plug in the laptop
  • Suspend
  • Unplug the laptop while it's suspended
  • Resume the laptop, then try to turn it off: it won't turn off.

Also, if you plug in HDMI and suspend the laptop, it won't suspend. It's weird.

P.S.: Sorry for my English, I'm learning. And thank you if you can help me, I am a beginner in the Linux world.

NiiSV811 avatar Apr 20 '23 17:04 NiiSV811

> P.S.: Sorry for my English, I'm learning. And thank you if you can help me, I am a beginner in the Linux world.

I usually have my laptop plugged in so idk

Make a new issue for this too so more people see it

ghost avatar Apr 20 '23 18:04 ghost

Has anything changed on your end in the past 2 weeks? For me, even though I've switched to the open beta drivers, there's no difference.

bohanubis avatar May 02 '23 21:05 bohanubis

New drivers same bug

ghost avatar May 30 '23 19:05 ghost

lol I will wait for these drivers to drop in the AUR on my side and update my issue as well. This repo is pathetic, at least tell us that you have put the bug in your internal bug tracker.

bohanubis avatar May 30 '23 20:05 bohanubis

I have a similar problem. On an HP Victus 16-d1xxx laptop with this patch (https://lore.kernel.org/platform-driver-x86/[email protected]/), nvidia-powerd.service needs to be restarted after changing the platform profile for the TGP to change.
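Until a driver fix is available, the restart could be automated by watching the platform profile. This is a rough sketch and not a recommended fix: it assumes the kernel exposes `/sys/firmware/acpi/platform_profile`, that the script runs with privileges to restart services, and the function names are hypothetical. An earlier comment in this thread reports that restarting the service may keep the dGPU out of D3cold, so treat it as a stopgap.

```python
import subprocess
import time
from pathlib import Path

PROFILE_PATH = Path("/sys/firmware/acpi/platform_profile")


def read_profile(path=PROFILE_PATH):
    """Return the current platform profile, e.g. 'balanced' or 'performance'."""
    return path.read_text().strip()


def restart_powerd():
    """Restart nvidia-powerd so it re-reads the power limit for the new profile."""
    subprocess.run(["systemctl", "restart", "nvidia-powerd.service"], check=True)


def restart_on_profile_change(read=read_profile, restart=restart_powerd,
                              interval_s=2.0, max_polls=None):
    """Poll the profile and call restart() on every change; return the restart count."""
    last = read()
    restarts = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval_s)
        current = read()
        if current != last:
            restart()
            restarts += 1
            last = current
        polls += 1
    return restarts
```

Polling keeps the sketch simple; calling `restart_on_profile_change()` with the defaults loops until interrupted, so it would normally be run in the background.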

onenowy avatar Jun 04 '23 04:06 onenowy

@onenowy Seems like this issue thread isn't being seen by the NV team. Better go explain this bug on this issue instead:

https://github.com/NVIDIA/open-gpu-kernel-modules/issues/392

And yes, we have the same bug. Changing the platform profile doesn't do shit unless you restart.

ghost avatar Jun 04 '23 12:06 ghost

Dear All, We have already filed a bug 4142071 for this issue and it has been also root caused. We will integrate the fix in future release drivers and shall update accordingly.

amrit1711 avatar Jun 05 '23 09:06 amrit1711

Dear All, Fix is available in latest release driver, please verify and share test results.

amrit1711 avatar Oct 10 '23 11:10 amrit1711

It's working with 3060 mobile on hp victus 16 laptop.

onenowy avatar Oct 10 '23 11:10 onenowy

> Dear All, Fix is available in latest release driver, please verify and share test results.

Great! Works for me (Acer AN515-45) with a 3070.

xAlpharax avatar Oct 10 '23 12:10 xAlpharax

The fix is on the driver from 3 weeks ago right?

ghost avatar Oct 10 '23 12:10 ghost

Still not fixed on 535.113.01 for me. After switching to the performance power profile, the GPU is still stuck at 64 watts and not going up to 85 watts like it's supposed to.

This was tested on closed source driver so maybe the fix is not merged into that yet?

ghost avatar Oct 10 '23 14:10 ghost

@kleidiss Could you please help to share nvidia bug report from repro state. Thanks in advance.

amrit1711 avatar Oct 11 '23 07:10 amrit1711