open-gpu-kernel-modules

Suspend not working with driver 570.124.06

Open Caian opened this issue 8 months ago • 2 comments

NVIDIA Open GPU Kernel Modules Version

570.124.06

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • [ ] I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Linux Mint 21.3

Kernel Release

5.15.0-138-generic

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • [x] I am running on a stable kernel release.

Hardware: GPU

NVIDIA GeForce RTX 5080 Founders

Describe the bug

Since the latest kernel update, suspend has stopped working.

dmesg reads:

[  111.929180] NVRM: GPU 0000:06:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.

NVIDIA unit files on systemd:

systemctl list-unit-files | grep nvidia
nvidia-hibernate.service                                                  masked          enabled
nvidia-persistenced.service                                               enabled         enabled
nvidia-powerd.service                                                     enabled         enabled
nvidia-resume.service                                                     masked          enabled
nvidia-suspend-then-hibernate.service                                     enabled         enabled
nvidia-suspend.service                                                    masked          enabled
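
The masked units can be picked out of a listing like the one above by filtering on the second (state) column. This is only a minimal sketch, assuming the column layout shown (unit name, state, vendor preset); the function name `masked_nvidia_units` is hypothetical and the listing is read from standard input.

```shell
# Hypothetical helper: print the names of nvidia-* units whose state is "masked".
# Expects `systemctl list-unit-files` style output on stdin:
#   <unit name> <state> <vendor preset>
masked_nvidia_units() {
    awk '$1 ~ /^nvidia-/ && $2 == "masked" { print $1 }'
}

# Possible usage on a live system (requires systemd):
#   systemctl list-unit-files --no-pager | masked_nvidia_units
```

Reading from stdin keeps the filter usable on saved logs as well as on live `systemctl` output.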

To Reproduce

  • Suspend the system using: sudo systemctl suspend
  • Observe that the system does not suspend
  • Check dmesg

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

No response

Caian avatar Apr 20 '25 02:04 Caian

This seems to be an older version of Linux Mint?

Could you try the latest one? I also couldn't suspend; using the latest kernel and Mesa fixed the issue for me.

francoism90 avatar Apr 20 '25 10:04 francoism90

Can reproduce on 570, 575, etc. on Linux with Sway on Wayland (wlroots), on CachyOS with the Bore, Zen, LTS, and stable kernels.

ninetailedtori avatar May 11 '25 14:05 ninetailedtori

When will NVIDIA fix this sleep stuff? DPMS is not working, sleep is not working... it's been so long!

skarpinis avatar May 31 '25 09:05 skarpinis

[ 111.929180] NVRM: GPU 0000:06:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.

I'm also affected by this with driver version 575.57.08. Been talking about it over at Fedora Discussion.

Enabling user session suspending also didn't help:

cat /usr/lib/systemd/system/systemd-suspend.service.d/disable_freeze_user_session.conf
[Service]
Environment="SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=true"

With this enabled, my system tries to go to sleep (monitor turns off) but it doesn't actually turn itself off - CPU, GPU and case fans keep spinning and LEDs blinking. It's impossible to recover ("wake") from this state, requiring a hard reboot. I've verified that I have enough RAM / swap / /var/tmp space available for video memory (RTX 2060 6GB). Here's my system's fpaste

TarsiSurdi avatar Jun 01 '25 23:06 TarsiSurdi

Hi all, we have filed bug 5344831 internally for tracking purposes. We will keep you updated on its status.

amrit1711 avatar Jun 16 '25 09:06 amrit1711

From your bug report log, it looks like the required systemd units are masked:

/usr/bin/systemctl status nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service nvidia-powerd.service nvidia-persistenced.service
○ nvidia-suspend.service
     Loaded: masked (Reason: Unit nvidia-suspend.service is masked.)
     Active: inactive (dead)

○ nvidia-hibernate.service
     Loaded: masked (Reason: Unit nvidia-hibernate.service is masked.)
     Active: inactive (dead)

○ nvidia-resume.service
     Loaded: masked (Reason: Unit nvidia-resume.service is masked.)
     Active: inactive (dead)

Can you please unmask those, try to suspend through systemd again, and then generate another bug report log if it still fails?
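
As a rough sketch of that step, the masked unit names can be pulled out of `systemctl status` output like the one above and then passed to `systemctl unmask`. The helper name `masked_units_from_status` is hypothetical, and the pattern assumes the exact "Loaded: masked (Reason: Unit ... is masked.)" wording shown in the log.

```shell
# Hypothetical helper: extract unit names from "Loaded: masked (...)" lines.
# Reads `systemctl status` style output on stdin and prints one unit per line.
masked_units_from_status() {
    sed -n 's/.*Loaded: masked (Reason: Unit \(.*\) is masked\.).*/\1/p'
}

# Possible usage on a live system (requires systemd and root privileges):
#   systemctl status --no-pager nvidia-suspend.service nvidia-hibernate.service \
#       nvidia-resume.service | masked_units_from_status \
#       | xargs -r sudo systemctl unmask
```

Reviewing the extracted names before piping them to `systemctl unmask` is a sensible precaution.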

aaronp24 avatar Jun 16 '25 17:06 aaronp24

Indeed, this solved the issue, thanks. What's curious is that I haven't masked those services myself.

Caian avatar Jun 16 '25 17:06 Caian

I'm glad to hear that worked. How did you install the driver? Is it possible that the Mint packages install them as masked?

aaronp24 avatar Jun 16 '25 17:06 aaronp24

I always install the driver alongside CUDA using the .deb (network):

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network

The working version was installed this way and I've been using apt install upgrade to update it ever since. Unfortunately, I wouldn't know if a Mint package masked those services.

Feel free to ask for any information about my system or about any package that you think may be causing the problem, and also to resolve the issue.

Caian avatar Jun 16 '25 18:06 Caian

Hi @aaronp24, an update on the matter. I did a clean install of Linux Mint Xia and took the following steps to reinstall my NVIDIA stuff:

  • Install nvidia doca-ofed (using official nvidia online repo)
  • Install nvidia-open and cuda-toolkit-12-9 (using official nvidia online repo)
  • Install nvidia-gds
  • Upgrade the system packages

And now the only unit that shows up from nvidia when doing systemctl list-units | grep nvidia is nvidia-persistenced.service. The other services in my first post, that were initially masked, are no longer present.

My guess is they were deprecated on newer nvidia drivers, so they were masked by newer drivers. The problem is there is no functioning replacement for handling the power states.

Can you confirm this, or is there a bug in the nvidia repo that causes it not to distribute the -suspend, -resume, and -hibernate services with the installation?

Caian avatar Jun 26 '25 10:06 Caian

No, they're still required if you're using NVreg_PreserveVideoMemoryAllocations=1. I'll ask the CUDA folks who package these to take a look.
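
To check whether that parameter is in effect, the driver's parameter listing can be inspected. The sketch below assumes the listing is available at `/proc/driver/nvidia/params` and uses `Name: value` lines; both the path and the format are assumptions here, and the function name `preserve_vidmem_enabled` is hypothetical.

```shell
# Hypothetical helper: read "Name: value" lines on stdin and succeed (exit 0)
# only if PreserveVideoMemoryAllocations is set to 1.
preserve_vidmem_enabled() {
    awk -F': *' '
        $1 == "PreserveVideoMemoryAllocations" { v = $2 }
        END { exit (v == 1) ? 0 : 1 }'
}

# Possible usage (the path is an assumption; adjust for your system):
#   preserve_vidmem_enabled < /proc/driver/nvidia/params && echo "suspend units required"
```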

aaronp24 avatar Jun 27 '25 15:06 aaronp24

It looks like these are provided by the nvidia-kernel-common-* packages. I tried following the CUDA driver install instructions, and it's the apt install nvidia-open step that pulls in nvidia-kernel-common-575, which provides them. However, it doesn't enable the services by default. I think that's why they don't show up in systemctl list-units. You need systemctl list-unit-files | grep nvidia to see them.

You should be able to run

systemctl enable nvidia-suspend nvidia-suspend-then-hibernate nvidia-hibernate nvidia-resume

to set them up.
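
To confirm the result afterwards, the state column of `systemctl list-unit-files` can be checked for each required unit. This is a minimal sketch; the function name `units_not_enabled` is hypothetical, and unit names must be given with their `.service` suffix so that they match the first column of the listing.

```shell
# Hypothetical helper: given `systemctl list-unit-files` output on stdin and
# required unit names as arguments, print every required unit whose state is
# not "enabled", followed by that state ("missing" if it is not listed).
units_not_enabled() {
    awk -v want="$*" '
        BEGIN { n = split(want, req, " ") }
        { state[$1] = $2 }
        END {
            for (i = 1; i <= n; i++) {
                s = (req[i] in state) ? state[req[i]] : "missing"
                if (s != "enabled") print req[i], s
            }
        }'
}

# Possible usage on a live system (requires systemd):
#   systemctl list-unit-files --no-pager | units_not_enabled \
#       nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service
```

No output means every listed unit is enabled. For a single unit, `systemctl is-enabled nvidia-suspend.service` reports the same state directly.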

aaronp24 avatar Jun 27 '25 21:06 aaronp24

You are absolutely right, I used list-units instead of list-unit-files this last time. Services enabled and working as expected again. Thank you.

Caian avatar Jun 28 '25 00:06 Caian