open-gpu-kernel-modules
Periodic stutters and “NVRM: RmCheckForGcxSupportOnCurrentState” kernel warnings on Ubuntu 22.04 RTX 4070
NVIDIA Open GPU Kernel Modules Version
550.54.15
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- [X] I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Ubuntu 22.04.4 LTS
Kernel Release
6.5.0-28-generic
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- [X] I am running on a stable kernel release.
Hardware: GPU
GPU 0: NVIDIA GeForce RTX 4070 Laptop GPU (UUID: GPU-02187fd8-22a1-3f71-cd52-22af54f42481)
Describe the bug
I’ve been running into occasional visible “stutters” on my Ubuntu Linux 22.04 system. By stutter, I mean that for ~500ms there is no visible change to the screen. If there is a video playing, it freezes. If I am moving the mouse, the cursor will freeze.
At the same time, I get a ton of kernel messages such as:
Apr 24 14:59:16 banana kernel: NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0xffff
Apr 24 14:59:21 banana kernel: NVRM: _threadNodeCheckTimeout: _threadNodeCheckTimeout: currentTime: 3d63e51c037800 >= 3d63e51c037800
Apr 24 14:59:21 banana kernel: NVRM: _threadNodeCheckTimeout: _threadNodeCheckTimeout: Timeout was set to: 4000 msecs!
Apr 24 14:59:21 banana kernel: NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0xffff
Apr 24 14:59:27 banana kernel: NVRM: _threadNodeCheckTimeout: _threadNodeCheckTimeout: currentTime: 3d63e663d6cf00 >= 3d63e663d6cf00
Apr 24 14:59:27 banana kernel: NVRM: _threadNodeCheckTimeout: _threadNodeCheckTimeout: Timeout was set to: 4000 msecs!
Apr 24 14:59:27 banana kernel: NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0xffff
Apr 24 14:59:32 banana kernel: NVRM: _threadNodeCheckTimeout: _threadNodeCheckTimeout: currentTime: 3d63e7abaa2600 >= 3d63e7abaa2600
Apr 24 14:59:32 banana kernel: NVRM: _threadNodeCheckTimeout: _threadNodeCheckTimeout: Timeout was set to: 4000 msecs!
Apr 24 14:59:32 banana kernel: NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0xffff
Apr 24 14:59:38 banana kernel: NVRM: _threadNodeCheckTimeout: _threadNodeCheckTimeout: currentTime: 3d63e8f37d7d00 >= 3d63e8f37d7d00
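For anyone triaging this, here is a rough way to check how regularly the GCx warning repeats. It is only a minimal sketch: `warning_intervals` is a hypothetical helper (not part of any NVIDIA tool), and it assumes syslog-style timestamps like the ones above, which do not include the year, so the year is passed in as a parameter.

```python
from datetime import datetime

def warning_intervals(lines, needle="RmCheckForGcxSupportOnCurrentState", year=2024):
    """Return the gaps in seconds between consecutive log lines containing `needle`."""
    stamps = []
    for line in lines:
        if needle in line:
            # The first 15 characters are the syslog timestamp, e.g. "Apr 24 14:59:16".
            # The year is not logged, so it is supplied by the caller (an assumption).
            stamps.append(datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

# Four of the warning lines copied from the log excerpt above.
sample = [
    "Apr 24 14:59:16 banana kernel: NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0xffff",
    "Apr 24 14:59:21 banana kernel: NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0xffff",
    "Apr 24 14:59:27 banana kernel: NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0xffff",
    "Apr 24 14:59:32 banana kernel: NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0xffff",
]

intervals = warning_intervals(sample)
print(intervals)
```

On the excerpt above the gaps come out at roughly 5 to 6 seconds each, so the warning appears to fire on a fairly steady cadence rather than in random bursts.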
Another concerning log entry is:
Apr 24 13:23:29 banana kernel: NVRM: _kgspLogXid119: ********************************* GSP Timeout **********************************
Apr 24 13:23:29 banana kernel: NVRM: _kgspLogXid119: Note: Please also check logs above.
Apr 24 13:23:29 banana kernel: NVRM: nvAssertFailedNoLog: Assertion failed: expectedFunc == pHistoryEntry->function @ kernel_gsp.c:1744
Apr 24 13:23:29 banana kernel: NVRM: GPU at PCI:0000:01:00: GPU-02187fd8-22a1-3f71-cd52-22af54f42481
Apr 24 13:23:29 banana kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=1671935, name=kworker/1:3, Timeout after 1149s of waiting for RPC response from GPU0 GSP! Expected function 4097 (GSP_INIT_DONE) (0x0 0x0).
Apr 24 13:23:29 banana kernel: NVRM: GPU0 GSP RPC buffer contains function 4108 (UCODE_LIBOS_PRINT) and data 0x0000000000000000 0x0000000000000000.
Apr 24 13:23:29 banana kernel: NVRM: GPU0 RPC history (CPU -> GSP):
Apr 24 13:23:29 banana kernel: NVRM: entry function data0 data1 ts_start ts_end duration actively_polling
Apr 24 13:23:29 banana kernel: NVRM: 0 47 UNLOADING_GUEST_DRIVE 0x0000000000000000 0x0000000000000000 0x000616daa95805e8 0x000616daa95d9f17 366 s y
Apr 24 13:23:29 banana kernel: NVRM: -1 10 FREE 0x00000000caf010bb 0x0000000000000000 0x000616daa958044a 0x000616daa95805e5 411us
Apr 24 13:23:29 banana kernel: NVRM: -2 76 GSP_RM_CONTROL 0x0000000020800ac3 0x0000000000000028 0x000616daa9580260 0x000616daa9580447 487us
Apr 24 13:23:29 banana kernel: NVRM: -3 4 ALLOC_MEMORY 0x0000000000000000 0x0000000000000000 0x000616daa957ff81 0x000616daa958025d 732us
Apr 24 13:23:29 banana kernel: NVRM: -4 10 FREE 0x00000000caf010ba 0x0000000000000000 0x000616daa957fd61 0x000616daa957ff79 536us
Apr 24 13:23:29 banana kernel: NVRM: -5 76 GSP_RM_CONTROL 0x0000000020800ac3 0x0000000000000028 0x000616daa957fb78 0x000616daa957fd5f 487us
Apr 24 13:23:29 banana kernel: NVRM: -6 4 ALLOC_MEMORY 0x0000000000000000 0x0000000000000000 0x000616daa957f982 0x000616daa957fb75 499us
Apr 24 13:23:29 banana kernel: NVRM: -7 10 FREE 0x00000000caf010b9 0x0000000000000000 0x000616daa957f7d9 0x000616daa957f97b 418us
Apr 24 13:23:29 banana kernel: NVRM: GPU0 RPC event history (CPU <- GSP):
Apr 24 13:23:29 banana kernel: NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc
Apr 24 13:23:29 banana kernel: NVRM: 0 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x000616daed814621 0x000616daed814622 1us
Apr 24 13:23:29 banana kernel: NVRM: -1 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x000616daed8144ef 0x000616daed8144f0 1us
Apr 24 13:23:29 banana kernel: NVRM: -2 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000027 0x000616daed812e49 0x000616daed812e4b 2us
Apr 24 13:23:29 banana kernel: NVRM: -3 4098 GSP_RUN_CPU_SEQUENCER 0x0000000000000628 0x0000000000003fe2 0x000616daed808c11 0x000616daed809d6e 4445us
Apr 24 13:23:29 banana kernel: NVRM: -4 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x000616daa958d7c0 0x000616daa958d7c1 1us
Apr 24 13:23:29 banana kernel: NVRM: -5 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000028 0x000616daa9585c1e 0x000616daa9585c20 2us
Apr 24 13:23:29 banana kernel: NVRM: -6 4111 PERF_BRIDGELESS_INFO_ 0x0000000000000000 0x0000000000000000 0x000616daa9585a33 0x000616daa9585a33
Apr 24 13:23:29 banana kernel: NVRM: -7 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000001 0x000616daa853253d 0x000616daa8532544 7us
Apr 24 13:23:29 banana kernel: NVRM: _kgspLogXid119: ********************************************************************************
Apr 24 13:23:29 banana kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Call timed out [NV_ERR_TIMEOUT] (0x00000065) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4074
Apr 24 13:23:29 banana kernel: NVRM: gpuPowerManagementResume: State load at resume for riscv/gsp failed: 0x65
Apr 24 13:23:35 banana kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=1671935, name=kworker/1:3, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x2080205b 0x4).
Apr 24 13:23:35 banana kernel: NVRM: _issueRpcAndWait: rpcRecvPoll timedout for fn 76!
Apr 24 13:23:35 banana kernel: NVRM: subdeviceCtrlCmdPerfSetPowerstate_KERNEL: NV2080_CTRL_CMD_PERF_SET_POWERSTATE RPC failed
Apr 24 13:23:46 banana kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=854, name=nv_queue, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x2080a7d7 0x2).
Apr 24 13:23:46 banana kernel: NVRM: _issueRpcAndWait: rpcRecvPoll timedout for fn 76!
Apr 24 13:23:46 banana kernel: NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0x65
Apr 24 13:23:57 banana kernel: NVRM: Rate limiting GSP RPC error prints for GPU at PCI:0000:01:00 (printing 1 of every 30). The GPU likely needs to be reset.
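To summarise the Xid 119 entries in a log like the one above, something along these lines might help. This is a sketch under stated assumptions: the regular expressions are inferred only from the three Xid lines in this report (other Xid messages may be formatted differently), and `parse_xid_events` is a hypothetical helper name, not an NVIDIA API.

```python
import re

# Pattern inferred from the Xid lines in this report only (an assumption).
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>[^)]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>[^,]+), name=(?P<name>[^,]+), (?P<msg>.*)"
)
# Extra detail that the GSP timeout messages in this report carry.
TIMEOUT_RE = re.compile(
    r"Timeout after (?P<wait>\d+)s of waiting for RPC response"
    r".*Expected function (?P<fn>\d+) \((?P<fn_name>\w+)\)"
)

def parse_xid_events(lines):
    """Return one dict per Xid line, with timeout details when present."""
    events = []
    for line in lines:
        m = XID_RE.search(line)
        if not m:
            continue
        event = {
            "pci": m["pci"],
            "xid": int(m["xid"]),
            "pid": m["pid"],
            "name": m["name"],
        }
        t = TIMEOUT_RE.search(m["msg"])
        if t:
            event["wait_s"] = int(t["wait"])
            event["expected_fn"] = t["fn_name"]
        events.append(event)
    return events

# The three Xid lines copied from the log above.
sample = [
    "Apr 24 13:23:29 banana kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=1671935, name=kworker/1:3, Timeout after 1149s of waiting for RPC response from GPU0 GSP! Expected function 4097 (GSP_INIT_DONE) (0x0 0x0).",
    "Apr 24 13:23:35 banana kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=1671935, name=kworker/1:3, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x2080205b 0x4).",
    "Apr 24 13:23:46 banana kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=854, name=nv_queue, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x2080a7d7 0x2).",
]

events = parse_xid_events(sample)
for e in events:
    print(e["xid"], e["name"], e.get("wait_s"), e.get("expected_fn"))
```

Run against the lines above, this shows that the first timeout (1149s waiting for GSP_INIT_DONE) is far longer than the two that follow (6s each, waiting for GSP_RM_CONTROL), which fits the GSP failing to come back up at resume and later control calls then timing out.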
To Reproduce
Unknown. I just use the machine for a while and it happens periodically.
Bug Incidence
Sometimes
nvidia-bug-report.log.gz
I have emailed this to [email protected] on 4/24/2024
More Info
No response