open-gpu-kernel-modules

NVIDIA crashes after resume (from sleep or hibernation)

Open jotkauser opened this issue 6 months ago • 11 comments

NVIDIA Open GPU Kernel Modules Version

575.64-2 (Latest one from Arch package: nvidia-open-lts)

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • [ ] I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Description: Arch Linux

Kernel Release

Linux asus-main 6.12.34-1-lts #1 SMP PREEMPT_DYNAMIC Thu, 19 Jun 2025 15:05:14 +0000 x86_64 GNU/Linux

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • [x] I am running on a stable kernel release.

Hardware: GPU

NVIDIA GeForce RTX 3050 Laptop GPU

Describe the bug

I'm using Arch on my ASUS laptop. Sometimes, after resuming from sleep or hibernation, the NVIDIA driver no longer works. When I check dmesg, I see this: [ 4331.789906] nvidia 0000:01:00.0: can't suspend (nv_pmops_runtime_suspend [nvidia] returned -5). After a reboot everything is okay. Sometimes this error also prevents the kernel from suspending, which leaves the machine in a broken state. The issue is intermittent: sometimes it shows up, sometimes everything works fine.
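For anyone triaging similar logs, here is a minimal sketch that pulls the PCI address, the failing callback and the return code out of a dmesg line shaped like the one quoted above. The regular expression and the helper name `parse_suspend_failure` are illustrative assumptions, not part of the driver or any NVIDIA tool; the only thing taken from the report is the log line itself. A return value of -5 corresponds to errno 5, which is `EIO` (I/O error).

```python
import errno
import re

# Pattern for the runtime-PM failure line quoted in the report.
# The exact wording is assumed to match that single sample line.
SUSPEND_FAIL = re.compile(
    r"nvidia (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"can't suspend \((?P<callback>\w+) \[nvidia\] returned (?P<code>-?\d+)\)"
)


def parse_suspend_failure(line):
    """Return (pci_address, callback, errno_name), or None if the line does not match."""
    match = SUSPEND_FAIL.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    # Kernel callbacks return negative errno values, so look up the absolute value.
    name = errno.errorcode.get(abs(code), "unknown")
    return match.group("bdf"), match.group("callback"), name


sample = (
    "[ 4331.789906] nvidia 0000:01:00.0: can't suspend "
    "(nv_pmops_runtime_suspend [nvidia] returned -5)"
)
print(parse_suspend_failure(sample))
```

Feeding the output of `dmesg` through this line by line would list every failed runtime suspend, which may help correlate failures with resume times.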

To Reproduce

  • Install the latest Arch Linux with the LTS kernel.
  • Install the nvidia-open-lts package.
  • Configure hibernation.
  • Put the machine to sleep for some time (there is a chance it won't enter the suspend state, in which case dmesg will show the NVIDIA error).
  • After resuming from hibernation, check whether the driver still works (i.e. whether nvidia-smi runs without returning an unknown error).

Bug Incidence

Sometimes

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

Exact laptop: ASUS ROG Strix G15 (G15IC). I'm using Wayland; I don't know whether anything would change on Xorg. The NVIDIA GPU is still a discrete GPU and doesn't render the image. This problem occurs on both the proprietary driver and the open driver.

jotkauser avatar Jun 22 '25 19:06 jotkauser

Sorry for not telling you my observations.

Before 575.64, I saw someone say that RTD3 contains bugs. I believe NVIDIA's employees tried to fix that issue and introduced new bugs in the process.

As a test, you could try the setup described at https://download.nvidia.com/XFree86/Linux-x86_64/435.17/README/dynamicpowermanagement.html. I use the proposed 80-nvidia-pm.rules, but set options nvidia "NVreg_DynamicPowerManagement=0x01" rather than the proposed options nvidia "NVreg_DynamicPowerManagement=0x02".

At least for me, the GPU no longer hangs with this setting.
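To confirm that the module option actually took effect after a reboot, a sketch like the following can help. It assumes the loaded driver exposes its registry keys in /proc/driver/nvidia/params as `Name: value` lines with a `DynamicPowerManagement` entry; that path and format are assumptions here, so check them on your own system. The helper names are hypothetical.

```python
from pathlib import Path


def dynamic_pm_mode(params_text):
    """Return the DynamicPowerManagement value as an int, or None if it is absent."""
    for raw in params_text.splitlines():
        key, sep, value = raw.partition(":")
        if sep and key.strip() == "DynamicPowerManagement":
            value = value.strip()
            # Accept both hexadecimal ("0x01") and plain decimal ("1") spellings.
            base = 16 if value.lower().startswith("0x") else 10
            return int(value, base)
    return None


def read_dynamic_pm_mode(path="/proc/driver/nvidia/params"):
    """Read the mode from the (assumed) params file; None if the file is missing."""
    params = Path(path)
    if not params.exists():
        return None
    return dynamic_pm_mode(params.read_text())


print(read_dynamic_pm_mode())
```

With the 0x01 setting described above, the printed value would be expected to be 1; `None` means the file or the key was not found.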

Neutron3529 avatar Jun 27 '25 16:06 Neutron3529

Hi @jotkauser, thanks for reporting the issue. Could you please apply the patch below and see if it fixes the issue?

https://github.com/NVIDIA/open-gpu-kernel-modules/commit/c7e72135da83ff027755b4a61a3ff09a32fe00c3

amrit1711 avatar Jul 03 '25 06:07 amrit1711

So should I build the kernel module with this patch?

jotkauser avatar Jul 03 '25 12:07 jotkauser

Yes, you can build the open kernel with that patch.

amrit1711 avatar Jul 03 '25 12:07 amrit1711

I built the patched driver and installed it, and also added the udev rule from Neutron's post. I'll use it for a while and check whether the issue persists.

jotkauser avatar Jul 03 '25 13:07 jotkauser

The NVreg_DynamicPowerManagement=0x01 setting also fixes the issue on my machine, but it has a drawback: the GPU is no longer hybrid. It stays powered even in Eco Mode, so an important power-saving feature is lost. Sometimes it runs at full performance (100W) regardless of the selected Power Mode. So it's a temporary fix, not suitable for portable use.

But I can confirm that I'm now able to suspend my machine without encountering the GPU hang or disable issue, which previously persisted until a reboot.

Before applying the fix, the issue also prevented the laptop from shutting down. I had an infinite loop of stop jobs for systemd, sddm and, of course, nvidia-powerd.
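One way to check whether the GPU really stays powered is to read the kernel's runtime power management attributes for the device. The sketch below assumes the GPU sits at PCI address 0000:01:00.0 (the address from the dmesg line in the original report; yours may differ) and uses the standard sysfs `power/control` and `power/runtime_status` files. The function name is hypothetical.

```python
from pathlib import Path


def runtime_pm_state(bdf="0000:01:00.0", sysfs_root="/sys/bus/pci/devices"):
    """Return the runtime-PM 'control' and 'runtime_status' values for a PCI device.

    A value is None when the corresponding sysfs file does not exist.
    """
    power_dir = Path(sysfs_root) / bdf / "power"
    state = {}
    for name in ("control", "runtime_status"):
        attr = power_dir / name
        state[name] = attr.read_text().strip() if attr.exists() else None
    return state


print(runtime_pm_state())
```

A `runtime_status` of `active` while the GPU is idle would be consistent with the behaviour described above, whereas `suspended` would indicate the device has been powered down.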

tendan avatar Jul 05 '25 12:07 tendan

I'm using Fedora Silverblue, so I can't write the rule file to its path, which is read-only. I only added the nvidia kernel module .conf file in /etc/modprobe.d/, and so far so good; I hope it keeps working.

lumingzh avatar Jul 05 '25 12:07 lumingzh

nvidia-bug-report.log.gz

Driver stopped working without any errors related to NVIDIA in dmesg

jotkauser avatar Jul 07 '25 11:07 jotkauser

According to my tests so far, the rule file is not needed. Adding the nvidia.conf kernel module file and changing the value to 0x01 is enough to fix the suspend issue. However, NVRM still shows errors like "out of memory", so the bug is not actually fixed.

lumingzh avatar Jul 09 '25 23:07 lumingzh

Is this patch going to land in one of the next releases?

alexmo1997 avatar Jul 24 '25 19:07 alexmo1997

The patch has been applied to NVIDIA's 580 driver. As soon as it lands in rolling-release distros, I will test the driver and report back whether the bug persists, and I advise others to do the same.

MAHBOD-85 avatar Aug 11 '25 12:08 MAHBOD-85