Nvidia RTX 5070TI | 575.64.03 drivers | Suspend Issue
NVIDIA Open GPU Kernel Modules Version
575.64.03
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- [x] I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Arch Linux
Kernel Release
6.15.4-zen2-1-zen
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- [x] I am running on a stable kernel release.
Hardware: GPU
GPU 0: NVIDIA GeForce RTX 5070 Ti Laptop GPU (UUID: GPU-846a94a0-8570-27c4-9943-057ea9ee7cea)
Describe the bug
After suspending the laptop, an error appears in dmesg indicating that it cannot be suspended and then an exception related to nv_pmops_runtime_suspend is generated.
To Reproduce
Turn on the laptop. Hit suspend. Check dmesg.
Bug Incidence
Sometimes
nvidia-bug-report.log.gz
Generating the bug report gets stuck; here is the image:
I will attach the journal output from my system so you can read the logs.
More Info
No response
❯ cat journal.log | grep -i nvrm
jul 03 20:02:06 msi-arch kernel: NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 575.64.03 Release Build (root@)
jul 03 20:04:13 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 03 20:04:17 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 03 20:05:27 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 03 20:05:29 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 03 20:06:05 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 03 20:06:08 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 03 20:27:20 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 04 07:37:42 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 04 11:54:53 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 04 11:54:55 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 04 11:54:55 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 04 11:54:56 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 04 11:55:00 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 04 11:55:03 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
jul 04 11:55:11 msi-arch kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1353
jul 04 11:55:11 msi-arch kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from status @ kernel_gsp.c:4615
jul 04 11:55:11 msi-arch kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from kgspCreateRadix3(pGpu, pKernelGsp, &pKernelGsp->pSRRadix3Descriptor, NULL, NULL, gspfwSRMeta.sizeOfSuspendResumeData) @ kernel_gsp_tu102.c:1303
jul 04 11:55:28 msi-arch kernel: NVRM: Error in service of callback
jul 04 11:59:54 msi-arch kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
jul 04 11:59:55 msi-arch kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
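Most of the NVRM lines above are the same repeated testIfDsmSubFunctionEnabled message, which makes the actual failures easy to miss. A minimal Python sketch for collapsing such a log into distinct messages with counts is shown here. The function name `summarize_nvrm`, the regular expression, and the choice to ignore the testIfDsmSubFunctionEnabled lines are assumptions made for illustration; they are not part of the driver or of the original report. The sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches journal lines shaped like:
#   "<mon> <dd> <hh:mm:ss> <host> kernel: NVRM: <message>"
NVRM_LINE = re.compile(r"^\S+ \d+ [\d:]+ \S+ kernel: NVRM: (?P<msg>.*)$")


def summarize_nvrm(lines, ignore=("testIfDsmSubFunctionEnabled",)):
    """Count distinct NVRM messages, skipping messages that start with an ignored prefix."""
    counts = Counter()
    for line in lines:
        match = NVRM_LINE.match(line.strip())
        if not match:
            continue  # not an NVRM kernel line
        msg = match.group("msg")
        if msg.startswith(tuple(ignore)):
            continue  # assumed to be noise for this analysis
        counts[msg] += 1
    return counts


# Sample lines taken from the journal excerpt in this report.
SAMPLE = [
    "jul 04 11:55:03 msi-arch kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.",
    "jul 04 11:55:11 msi-arch kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1353",
    "jul 04 11:55:28 msi-arch kernel: NVRM: Error in service of callback",
]

for message, count in summarize_nvrm(SAMPLE).most_common():
    print(f"{count}x {message}")
```

To run it against a full log, pass an open file object instead of `SAMPLE`, for example `summarize_nvrm(open("journal.log", encoding="utf-8"))`.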
It happened again. This time I managed to capture the bug report, and I attached it to the original post.
see https://github.com/NVIDIA/open-gpu-kernel-modules/issues/887#issuecomment-3054482747
Got this as well. I think one of the newer Linux kernels released last week broke suspend (on Arch, starting with 6.15.5, I believe), or else the linux-firmware update that arrived around the same time. As of 6.15.6 the issue still persists.
The laptop now suspends for ~10s before waking itself up again. Hibernate works okay.
[ 751.267398] PM: suspend entry (s2idle)
[ 751.653756] Filesystems sync: 0.386 seconds
[ 751.749605] Bluetooth: hci0: Invalid exception type 04
[ 751.754904] Freezing user space processes
[ 751.756938] Freezing user space processes completed (elapsed 0.002 seconds)
[ 751.756944] OOM killer disabled.
[ 751.756945] Freezing remaining freezable tasks
[ 751.757963] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 751.757970] printk: Suspending console(s) (use no_console_suspend to debug)
[ 751.985523] ACPI: EC: interrupt blocked
[ 764.820176] ACPI: EC: interrupt unblocked
[ 764.838179] ACPI Warning: Time parameter 250 us > 100 us violating ACPI spec, please fix the firmware. (20240827/exsystem-142)
[ 765.225486] nvidia 0000:02:00.0: Enabling HDA controller
[ 765.360392] pci 0000:00:08.0: Setting to D3hot
[ 765.377666] spd5118 5-0051: Failed to write b = 0: -6
[ 765.377672] spd5118 5-0051: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
[ 765.377685] spd5118 5-0051: PM: failed to resume async: error -6
[ 765.378005] spd5118 5-0053: Failed to write b = 0: -6
[ 765.378012] spd5118 5-0053: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
[ 765.378021] spd5118 5-0053: PM: failed to resume async: error -6
[ 765.386423] nvme nvme1: D3 entry latency set to 10 seconds
[ 765.391382] nvme nvme0: D3 entry latency set to 10 seconds
[ 765.399188] nvme nvme0: 24/0/0 default/read/poll queues
[ 765.438618] Bluetooth: hci0: Invalid exception type 04
[ 765.483940] nvme nvme1: 15/0/0 default/read/poll queues
[ 765.525282] OOM killer enabled.
[ 765.525284] Restarting tasks ... done.
[ 765.528107] random: crng reseeded on system resumption
[ 765.646798] PM: suspend exit
(Didn't find much in the logs though)
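The "~10s" figure can be checked against the dmesg timestamps above. The sketch below is a rough estimate only: it assumes that the "ACPI: EC: interrupt blocked" and "ACPI: EC: interrupt unblocked" lines bracket the time the machine actually spent in s2idle, which is an approximation rather than something the kernel guarantees. The helper name `find_time` is hypothetical. For the excerpt above it gives about 12.8 seconds asleep.

```python
import re

# dmesg lines start with "[ seconds.micros] message".
STAMP = re.compile(r"^\[\s*(?P<t>\d+\.\d+)\]\s+(?P<msg>.*)$")


def find_time(lines, needle):
    """Return the timestamp in seconds of the first line whose message contains needle."""
    for line in lines:
        match = STAMP.match(line)
        if match and needle in match.group("msg"):
            return float(match.group("t"))
    return None


# Lines copied from the dmesg excerpt in this comment.
DMESG = [
    "[ 751.267398] PM: suspend entry (s2idle)",
    "[ 751.985523] ACPI: EC: interrupt blocked",
    "[ 764.820176] ACPI: EC: interrupt unblocked",
    "[ 765.646798] PM: suspend exit",
]

asleep = find_time(DMESG, "interrupt unblocked") - find_time(DMESG, "interrupt blocked")
total = find_time(DMESG, "PM: suspend exit") - find_time(DMESG, "PM: suspend entry")
print(f"approx. time asleep: {asleep:.1f} s, whole suspend cycle: {total:.1f} s")
```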
Additionally it seems DP-out (via USBC / TB5 on my laptop) was also broken by the new kernel. Right now only HDMI-out works.
Hi all, thanks for reporting the issue. Could you please apply the patch below and see if it fixes the problem? https://github.com/NVIDIA/open-gpu-kernel-modules/commit/c7e72135da83ff027755b4a61a3ff09a32fe00c3
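For anyone unsure how to try such a patch: GitHub serves any commit as a plain patch when `.patch` is appended to the commit URL, and `git apply --check` tests whether a patch applies without modifying the tree. The following is a minimal sketch under stated assumptions, not an official procedure: it assumes a local git checkout of open-gpu-kernel-modules at the source version you build from, plus network access. The function names `patch_url` and `try_patch` are hypothetical.

```python
import subprocess
import urllib.request


def patch_url(commit_url):
    """Build the patch download URL by appending '.patch' to a GitHub commit URL."""
    return commit_url.rstrip("/") + ".patch"


def try_patch(commit_url, repo_dir):
    """Download the commit as a patch and apply it only if the dry run succeeds.

    repo_dir is assumed to be a git checkout of the driver sources.
    Returns True if the patch was applied, False if the dry run failed.
    """
    with urllib.request.urlopen(patch_url(commit_url)) as response:
        patch = response.read()
    # With no file argument, git apply reads the patch from standard input.
    check = subprocess.run(["git", "apply", "--check"], input=patch, cwd=repo_dir)
    if check.returncode != 0:
        return False  # tree does not match, or the change may already be present
    subprocess.run(["git", "apply"], input=patch, cwd=repo_dir, check=True)
    return True


COMMIT = "https://github.com/NVIDIA/open-gpu-kernel-modules/commit/c7e72135da83ff027755b4a61a3ff09a32fe00c3"
print(patch_url(COMMIT))
```

A failed dry run does not say why the patch was rejected. One possible reason is that the release you checked out already contains the change, in which case there is nothing left to apply.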
I don't know how to try that patch, but I think the bug has been fixed as of 575.64.05, because the errors no longer appear.
All clear for me too - I believe an intermediate kernel update might also have played a part.
@sk0rabu The patch @amrit1711 proposed seemed to make it into a release before .05 and likely fixed it, either that or the patch isn't applying fully anymore on .05 because it gave an error saying there's nothing to patch(?!) when I tried applying it on Gentoo.
@amrit1711 Well, after some time I got an error after suspend again:
This happened when I suspended the laptop while a game was using the GPU. After that, RTD3 no longer worked and the GPU stayed powered on until I rebooted or powered off the laptop.
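Whether the GPU is stuck on can be checked from the kernel's runtime power management files in sysfs. The sketch below reads `power/control` and `power/runtime_status` for a PCI device; those two files are standard kernel sysfs attributes. The PCI address `0000:01:00.0` is taken from the journal output earlier in this thread and may differ on other machines, and the function name `runtime_pm_state` is hypothetical. This only reports what the kernel believes; it is a quick check, not a diagnosis.

```python
from pathlib import Path


def runtime_pm_state(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the runtime PM fields for a PCI device; a missing file yields None."""
    power_dir = Path(sysfs_root) / pci_addr / "power"
    state = {}
    for name in ("control", "runtime_status"):
        attr = power_dir / name
        state[name] = attr.read_text().strip() if attr.is_file() else None
    return state


# Address assumed from the "GPU 0000:01:00.0" lines in the journal output.
print(runtime_pm_state("0000:01:00.0"))
```

With runtime power management allowed, `control` reads `auto`, and an idle GPU that has powered down reports `suspended` in `runtime_status`. A GPU that stays `active` while nothing is using it would be consistent with the behavior described above.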
I will attach the nvidia bug report zip.
[ 9.098158] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 575.64.05 Release Build (root@)
[ 11.078195] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
[ 13.258310] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
[ 8885.158530] NVRM: _kdispHandleAwakenChnMask: seeing an awaken in channel 0 without an associated awaken event
[ 8894.997915] NVRM: Error in service of callback
I have the same issue on 575.64.05 - same behavior as http://github.com/NVIDIA/open-gpu-kernel-modules/issues/896#issuecomment-3064558041