Xid 120: GSP load access page fault during driver init (575.51.02)
NVIDIA Open GPU Kernel Modules Version
575.51.02
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- [x] I confirm that this does not happen with the proprietary driver package.
Operating System and Version
CachyOS
Kernel Release
6.14.2
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- [x] I am running on a stable kernel release.
Hardware: GPU
RTX 5080 mobile Max Q
Describe the bug
Hi,
A user reports that the NVIDIA driver fails to load when using the 575 driver. The logs show the following:
Apr 19 13:16:41 Kyrios kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 240
Apr 19 13:16:41 Kyrios kernel: NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 575.51.02 Release Build (root@Kyrios)
Apr 19 13:16:41 Kyrios kernel: nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 575.51.02 Release Build (root@Kyrios)
Apr 19 13:16:41 Kyrios kernel: [drm] [nvidia-drm] [GPU ID 0x00000200] Loading driver
Apr 19 13:16:41 Kyrios kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
Apr 19 13:16:41 Kyrios kernel: NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report *******************************
Apr 19 13:16:41 Kyrios kernel: NVRM: GPU at PCI:0000:02:00: GPU-751acba2-df95-76f6-5914-58abf0c3cba3
Apr 19 13:16:41 Kyrios kernel: NVRM: Xid (PCI:0000:02:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3
Apr 19 13:16:41 Kyrios kernel: NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2
Apr 19 13:16:41 Kyrios kernel: NVRM: RISC-V CSR State:
Apr 19 13:16:41 Kyrios kernel: NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000
Apr 19 13:16:41 Kyrios kernel: NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d
Apr 19 13:16:41 Kyrios kernel: NVRM: RISC-V GPR State:
Apr 19 13:16:41 Kyrios kernel: NVRM: ra:0x000000000140d0f6 sp:0x00000047f240f5b0 gp:0x0000000000000000 tp:0x0000000000000000
Apr 19 13:16:41 Kyrios kernel: NVRM: a0:0x0000000000000000 a1:0x00000047eb220530 a2:0x0000000000000004 a3:0x00000047f2a41000
Apr 19 13:16:41 Kyrios kernel: NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004
Apr 19 13:16:41 Kyrios kernel: NVRM: s0:0x00000047f240f740 s1:0x00000047eb4442d0 s2:0x0000000000000002 s3:0x00000000017d7c26
Apr 19 13:16:41 Kyrios kernel: NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047eb3805f0 s7:0x0000000000001500
Apr 19 13:16:41 Kyrios kernel: NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047eb37e5f0
Apr 19 13:16:41 Kyrios kernel: NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020
Apr 19 13:16:41 Kyrios kernel: NVRM: t4:0x0000000000000000 t5:0x00000047f240f3c1 t6:0x0000000000000020
Apr 19 13:16:41 Kyrios kernel: NVRM: Stack Trace:
Apr 19 13:16:41 Kyrios kernel: NVRM: 0x000000000140ca4a
Apr 19 13:16:41 Kyrios kernel: NVRM: 0x00000000017d7c26
Apr 19 13:16:41 Kyrios kernel: NVRM: 0x00000000017de386
Apr 19 13:16:41 Kyrios kernel: NVRM: 0x00000000017dfca8
Apr 19 13:16:41 Kyrios kernel: NVRM: 0x00000000017d66b2
Apr 19 13:16:41 Kyrios kernel: NVRM: 0x00000000014164f2
Apr 19 13:16:41 Kyrios kernel: NVRM: 0x0000000001a259ee
Apr 19 13:16:41 Kyrios kernel: NVRM: 0x0000000001a483f8
Apr 19 13:16:41 Kyrios kernel: NVRM: 0x0000000001b8486c
Apr 19 13:16:41 Kyrios kernel: NVRM: 0x0000000001a2a74e
Apr 19 13:16:41 Kyrios kernel: NVRM: Local I/O Register State:
Apr 19 13:16:41 Kyrios kernel: NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000
Apr 19 13:16:41 Kyrios kernel: NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040
Apr 19 13:16:41 Kyrios kernel: NVRM: ------------[ end crash report ]------------
Apr 19 13:16:41 Kyrios kernel: NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26.
Apr 19 13:16:41 Kyrios kernel: NVRM: GPU0 RPC history (CPU -> GSP):
Apr 19 13:16:41 Kyrios kernel: NVRM: entry function data0 data1 ts_start ts_end duration actively_polling
Apr 19 13:16:41 Kyrios kernel: NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x000633274fe976ff 0x0000000000000000 y
Apr 19 13:16:41 Kyrios kernel: NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x000633274fe976fc 0x0000000000000000
Apr 19 13:16:41 Kyrios kernel: NVRM: GPU0 RPC event history (CPU <- GSP):
Apr 19 13:16:41 Kyrios kernel: NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc
Apr 19 13:16:41 Kyrios kernel: NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x000633274ff1351b 0x000633274ff1351d 2us y
Apr 19 13:16:41 Kyrios kernel: NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120.
Apr 19 13:16:41 Kyrios kernel: NVRM: kgspHealthCheck_TU102: **********************************************************************************
Apr 19 13:16:41 Kyrios kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4878
Apr 19 13:16:41 Kyrios kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952
Apr 19 13:16:41 Kyrios kernel: NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP
Apr 19 13:16:41 Kyrios kernel: NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset)
Apr 19 13:16:41 Kyrios kernel: NVRM: RmInitAdapter: Cannot initialize GSP firmware RM
Apr 19 13:16:41 Kyrios kernel: NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x200
Apr 19 13:16:41 Kyrios kernel: NVRM: GPU 0000:02:00.0: RmInitAdapter failed! (0x62:0x40:1941)
Apr 19 13:16:41 Kyrios kernel: NVRM: GPU 0000:02:00.0: rm_init_adapter failed, device minor number 0
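For triaging logs like the one above, the Xid line carries the key fields (PCI address, Xid number, cause code, faulting pc). Below is a minimal Python sketch for pulling them out of a journal line. The function and table names are hypothetical, and the cause-code table assumes the GSP RISC-V core uses the standard `scause` exception encoding from the RISC-V privileged specification; the driver's own wording ("load access page fault") for `0xd` is consistent with that, but this is an assumption and not something the driver documents here.

```python
import re

# Standard RISC-V synchronous exception codes (scause, interrupt bit clear).
# Assumption: the GSP core reports causes using this standard encoding.
RISCV_EXCEPTIONS = {
    0x0: "instruction address misaligned",
    0x1: "instruction access fault",
    0x2: "illegal instruction",
    0x3: "breakpoint",
    0x4: "load address misaligned",
    0x5: "load access fault",
    0x6: "store/AMO address misaligned",
    0x7: "store/AMO access fault",
    0x8: "environment call from U-mode",
    0x9: "environment call from S-mode",
    0xC: "instruction page fault",
    0xD: "load page fault",
    0xF: "store/AMO page fault",
}

# Matches e.g. "NVRM: Xid (PCI:0000:02:00): 120, GSP task exception: ..."
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), (?P<message>.*)"
)
CAUSE_RE = re.compile(r"cause:(0x[0-9a-fA-F]+)")
PC_RE = re.compile(r"pc:(0x[0-9a-fA-F]+)")


def parse_xid_line(line):
    """Return a dict describing the Xid event in `line`, or None if absent."""
    match = XID_RE.search(line)
    if match is None:
        return None
    info = {
        "pci": match["pci"],
        "xid": int(match["xid"]),
        "message": match["message"],
    }
    cause = CAUSE_RE.search(match["message"])
    if cause:
        code = int(cause.group(1), 16)
        info["cause"] = code
        info["cause_name"] = RISCV_EXCEPTIONS.get(code, "unknown")
    pc = PC_RE.search(match["message"])
    if pc:
        info["pc"] = int(pc.group(1), 16)
    return info


# Xid line copied from the log in this report.
SAMPLE = (
    "Apr 19 13:16:41 Kyrios kernel: NVRM: Xid (PCI:0000:02:00): 120, "
    "GSP task exception: load access page fault (cause:0xd) "
    "@ pc:0x140ca4a, partition:4#0, task:3"
)

if __name__ == "__main__":
    print(parse_xid_line(SAMPLE))
```

The parsed pc should match both `sepc` and the first stack trace entry in the crash report, which is a quick sanity check that the right line was picked up.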
I also see that the logs report a version mismatch across several NVIDIA libraries:
Apr 19 12:34:42 Kyrios kernel: NVRM: API mismatch: the client 'nvidia-powerd' (pid 849)
NVRM: has the version 570.133.07, but this kernel module has
NVRM: the version 575.51.02. Please make sure that this
NVRM: kernel module and all NVIDIA driver components
NVRM: have the same version.
I have verified with the user that all packages are correctly installed, and I also checked the checksums, which seem to be fine. I'm not sure why nvidia-powerd reports the 570.133.07 driver; maybe it is due to the GSP crash above?
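To make the mismatch easier to spot across many log files, the "API mismatch" message can be parsed into its client name, pid, and the two version strings. This is only a sketch with hypothetical names; it assumes the message keeps the multi-line wording shown above and does not query the driver itself.

```python
import re

# Matches the multi-line "API mismatch" message; DOTALL lets ".*?" span the
# line breaks and the repeated "NVRM:" prefixes.
MISMATCH_RE = re.compile(
    r"API mismatch: the client '(?P<client>[^']+)' \(pid (?P<pid>\d+)\)"
    r".*?has the version (?P<client_version>\d+(?:\.\d+)+), "
    r"but this kernel module has"
    r".*?the version (?P<module_version>\d+(?:\.\d+)+)",
    re.DOTALL,
)


def parse_api_mismatch(text):
    """Return details of the first API mismatch in `text`, or None."""
    match = MISMATCH_RE.search(text)
    if match is None:
        return None
    return {
        "client": match["client"],
        "pid": int(match["pid"]),
        "client_version": match["client_version"],
        "module_version": match["module_version"],
    }


# Message copied from the log in this report.
SAMPLE_MISMATCH = """\
Apr 19 12:34:42 Kyrios kernel: NVRM: API mismatch: the client 'nvidia-powerd' (pid 849)
NVRM: has the version 570.133.07, but this kernel module has
NVRM: the version 575.51.02. Please make sure that this
NVRM: kernel module and all NVIDIA driver components
NVRM: have the same version.
"""

if __name__ == "__main__":
    print(parse_api_mismatch(SAMPLE_MISMATCH))
```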
To Reproduce
- Install Arch Linux
- Install the nvidia-beta driver from https://archive.cachyos.org/nvidia/575/
- Boot into the system and run "nvidia-smi"
- Check the kernel logs
System Specs: https://www.lenovo.com/us/en/p/laptops/legion-laptops/legion-pro-series/legion-pro-7i-gen-10-16-inch-intel/len101g0039 (Core Ultra 9 275HX, RTX 5080 mobile)
Bug Incidence
Always
nvidia-bug-report.log.gz
nvidia-bug-report-gsp-crash-575.log.gz
More Info
No response