DRM causes core dump & NVRM dmesg error
NVIDIA Open GPU Kernel Modules Version
nvidia-open-dkms 530.41.03-3
Does this happen with the proprietary driver (of the same version) as well?
I cannot test this
Operating System and Version
Arch Linux
Kernel Release
6.2.10-zen
Hardware: GPU
NVIDIA GeForce RTX 3060 Laptop GPU
Describe the bug
Machine: Alienware M15R7
Launching a Wayland session causes a core dump. I've tried both Plasma and Hyprland with nvidia-drm as the backend.
There is also a strange error message in dmesg:
NVRM objClInitPcieChipset: *** Chipset Setup Function Error!
and one in journalctl:
nvidia: module verification failed: signature and/or required key missing - tainting kernel
According to the boot log the module seems to load, but starting a session causes a core dump.
If I disable the DRM backend, the Hyprland Wayland session starts with the nvidia driver. KWin does not start in either case.
dmesg:
[ 0.000000] BIOS-e820: [mem 0x0000000060d11000-0x0000000061571fff] ACPI NVS
[ 0.000000] reserve setup_data: [mem 0x0000000060d11000-0x0000000061571fff] ACPI NVS
[ 0.136418] ACPI: PM: Registering ACPI NVS region [mem 0x60d11000-0x61571fff] (8785920 bytes)
[ 0.255264] ACPI: \_SB_.PC00.CNVW.WRST: New power resource
[ 2.757687] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 530.41.03 Release Build (archlinux-builder@archalien)
[ 2.783117] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 530.41.03 Release Build (archlinux-builder@archalien)
[ 2.870841] NVRM objClInitPcieChipset: *** Chipset Setup Function Error!
[ 6.933352] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input13
[ 6.933604] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input14
[ 6.968959] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input15
[ 6.968996] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input16
[ 7.111300] iwlwifi 0000:00:14.3: CNVI_SCU_SEQ_DATA_DW9: 0x10
[ 67.112494] Asynchronous wait on fence NVIDIA:nvidia.prime:0 timed out (hint:submit_notify [i915])
journalctl --grep "nvidia":
Apr 11 15:28:37 archalien kernel: Command line: initrd=\intel-ucode.img initrd=\initramfs-linux-zen.img root=PARTUUID=f2b1acd4-dfa1-4a46-9f19-31b6c2489e5d zswap.e>
Apr 11 15:28:37 archalien kernel: Kernel command line: initrd=\intel-ucode.img initrd=\initramfs-linux-zen.img root=PARTUUID=f2b1acd4-dfa1-4a46-9f19-31b6c2489e5d >
Apr 11 15:28:37 archalien kernel: nvidia: loading out-of-tree module taints kernel.
Apr 11 15:28:37 archalien kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Apr 11 15:28:37 archalien kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
Apr 11 15:28:37 archalien kernel: nvidia 0000:01:00.0: enabling device (0006 -> 0007)
Apr 11 15:28:37 archalien kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
Apr 11 15:28:37 archalien kernel: NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 530.41.03 Release Build (archlinux-builder@archalien)
Apr 11 15:28:37 archalien kernel: nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 530.41.03 Release Build (archlinux-builder@arc>
Apr 11 15:28:37 archalien kernel: nvidia-uvm: Loaded the UVM driver, major device number 511.
Apr 11 15:28:37 archalien kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Apr 11 15:28:37 archalien kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
Apr 11 15:28:37 archalien systemd[1]: Starting Load/Save Screen Backlight Brightness of backlight:nvidia_wmi_ec_backlight...
Apr 11 15:28:37 archalien systemd[1]: Finished Load/Save Screen Backlight Brightness of backlight:nvidia_wmi_ec_backlight.
Apr 11 15:28:37 archalien kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input13
Apr 11 15:28:37 archalien kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input14
Apr 11 15:28:37 archalien kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input15
Apr 11 15:28:37 archalien kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input16
Apr 11 15:28:38 archalien systemd[1]: Starting NVIDIA Persistence Daemon...
Apr 11 15:28:38 archalien systemd[1]: Starting nvidia-powerd service...
Apr 11 15:28:38 archalien /usr/bin/nvidia-powerd[894]: nvidia-powerd version:1.0(build 1)
Apr 11 15:28:38 archalien systemd[1]: Started NVIDIA Persistence Daemon.
Apr 11 15:28:38 archalien systemd[1]: nvidia-powerd.service: Main process exited, code=exited, status=1/FAILURE
Apr 11 15:28:38 archalien systemd[1]: nvidia-powerd.service: Failed with result 'exit-code'.
Apr 11 15:28:38 archalien systemd[1]: Failed to start nvidia-powerd service.
Apr 11 15:28:47 archalien systemd-coredump[2185]: [🡕] Process 2119 (Hyprland) of user 1000 dumped core.
Stack trace of thread 2119:
#0 0x000055ddb5ac99e8 _ZN13CPluginSystem13getAllPluginsEv (Hyprland + 0x1939e8)
#1 0x000055ddb5a31aee _ZN13CrashReporter18createAndSaveCrashEi (Hyprland + 0xfbaee)
journalctl --grep "kwin":
Apr 11 12:45:51 archalien systemd[2214]: plasma-kwin_wayland.service: Consumed 1.174s CPU time.
Apr 11 12:46:17 archalien kwin_x11[4178]: kwin_xkbcommon: XKB: inet:323:58: unrecognized keysym "XF86EmojiPicker"
Apr 11 12:46:17 archalien kwin_x11[4178]: kwin_xkbcommon: XKB: inet:324:58: unrecognized keysym "XF86Dictate"
Apr 11 12:46:17 archalien kwin_x11[4178]: kwin_core: Parse error in tiles configuration for monitor "ada15eeb-9ed6-5738-a180-bd9fe2361632" : "illegal value" Cre>
Apr 11 12:46:17 archalien kwin_x11[4178]: kwin_core: Parse error in tiles configuration for monitor "0d3998b5-12fb-5e5d-9844-298a9a2f96a3" : "illegal value" Cre>
Apr 11 12:46:18 archalien kwin_x11[4178]: kwin_platform_x11_standalone: QOpenGLContext::globalShareContext() is required
Apr 11 12:46:18 archalien kwin_x11[4178]: kwin_scene_opengl: Creating the OpenGL rendering failed: "Could not initialize rendering context"
Apr 11 12:51:06 archalien systemd[3951]: plasma-kwin_x11.service: Consumed 3.876s CPU time.
Apr 11 12:51:15 archalien kernel: kwin_wayland[9195]: segfault at 0 ip 00007fd0eb81556b sp 00007ffdb36d6ec0 error 4 in libnvidia-allocator.so.530.41.03[7fd0eb80>
Apr 11 12:51:16 archalien systemd-coredump[9350]: [🡕] Process 9195 (kwin_wayland) of user 1000 dumped core.
Stack trace of thread 9195:
#0 0x00007fd0eb81556b n/a (nvidia-drm_gbm.so + 0x1556b)
#1 0x00007fd0eb815838 n/a (nvidia-drm_gbm.so + 0x15838)
#2 0x00007fd0f805ce59 n/a (libgbm.so.1 + 0x4e59)
#3 0x00007fd0f805eab1 gbm_create_device (libgbm.so.1 + 0x6ab1)
#4 0x00007fd0fb161e74 _ZN4KWin10DrmBackend6addGpuERK7QString (libkwin.so.5 + 0x361e74)
#5 0x00007fd0fb15ef1b _ZN4KWin10DrmBackend10initializeEv (libkwin.so.5 + 0x35ef1b)
#6 0x0000561b851f1315 n/a (kwin_wayland + 0x5a315)
#7 0x0000561b851e723c n/a (kwin_wayland + 0x5023c)
#8 0x00007fd0f863c790 n/a (libc.so.6 + 0x23790)
#9 0x00007fd0f863c84a __libc_start_main (libc.so.6 + 0x2384a)
#10 0x0000561b851e8e95 n/a (kwin_wayland + 0x51e95)
journalctl --grep "hyprland":
Apr 08 10:14:22 archalien sddm-helper[2192]: Starting Wayland user session: "/usr/share/sddm/scripts/wayland-session" "Hyprland"
Apr 08 10:14:23 archalien kernel: Hyprland[2208]: segfault at 10 ip 0000557959c00b28 sp 00007fff9c518740 error 4 in Hyprland[557959ad1000+15b000] likely on CPU 8 >
Apr 08 10:14:23 archalien systemd-coredump[2249]: [🡕] Process 2208 (Hyprland) of user 1000 dumped core.
Stack trace of thread 2208:
#0 0x0000557959c00b28 _ZN13CPluginSystem13getAllPluginsEv (Hyprland + 0x18cb28)
#1 0x0000557959b6b13e _ZN13CrashReporter18createAndSaveCrashEi (Hyprland + 0xf713e)
#2 0x0000557959b07f3c _Z25handleUnrecoverableSignali (Hyprland + 0x93f3c)
#3 0x00007f9c6fb69f50 n/a (libc.so.6 + 0x38f50)
#4 0x00007f9c6fbb88ec n/a (libc.so.6 + 0x878ec)
#5 0x00007f9c6fb69ea8 raise (libc.so.6 + 0x38ea8)
#6 0x00007f9c6fb5353d abort (libc.so.6 + 0x2253d)
#7 0x00007f9c6fe9a833 _ZN9__gnu_cxx27__verbose_terminate_handlerEv (libstdc++.so.6 + 0x9a833)
#8 0x00007f9c6fea6d0c _ZN10__cxxabiv111__terminateEPFvvE (libstdc++.so.6 + 0xa6d0c)
#9 0x00007f9c6fea6d79 _ZSt9terminatev (libstdc++.so.6 + 0xa6d79)
#10 0x00007f9c6fea6fdd __cxa_throw (libstdc++.so.6 + 0xa6fdd)
#11 0x0000557959ad5a74 _ZN11CCompositor10initServerEv.cold (Hyprland + 0x61a74)
#12 0x0000557959afaa2b main (Hyprland + 0x86a2b)
#13 0x00007f9c6fb54790 n/a (libc.so.6 + 0x23790)
#14 0x00007f9c6fb5484a __libc_start_main (libc.so.6 + 0x2384a)
#15 0x0000557959b07e05 _start (Hyprland + 0x93e05)
ELF object binary architecture: AMD x86-64
Apr 08 10:14:24 archalien sddm-greeter[2278]: Reading from "/usr/local/share/wayland-sessions/hyprland.desktop"
Apr 08 10:14:24 archalien sddm-greeter[2278]: Reading from "/usr/share/wayland-sessions/hyprland.desktop"
Apr 08 10:14:50 archalien kernel: Hyprland[2970]: segfault at 10 ip 0000562e52fe1b28 sp 00007ffe87d89100 error 4 in Hyprland[562e52eb2000+15b000] likely on CPU 6 >
Apr 08 10:14:50 archalien systemd-coredump[2987]: [🡕] Process 2970 (Hyprland) of user 1000 dumped core.
Stack trace of thread 2970:
#0 0x0000562e52fe1b28 _ZN13CPluginSystem13getAllPluginsEv (Hyprland + 0x18cb28)
#1 0x0000562e52f4c13e _ZN13CrashReporter18createAndSaveCrashEi (Hyprland + 0xf713e)
nvidia-smi -q:
==============NVSMI LOG==============
Timestamp : Tue Apr 11 16:10:42 2023
Driver Version : 530.41.03
CUDA Version : 12.1
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA GeForce RTX 3060 Laptop GPU
Product Brand : GeForce
Product Architecture : Ampere
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-3c621dcd-20d4-109f-7874-ee23c382942e
Minor Number : 0
VBIOS Version : 94.06.29.00.35
MultiGPU Board : No
Board ID : 0x100
Board Part Number : N/A
GPU Part Number : 2560-775-A1
FRU Part Number : N/A
Module ID : 1
Inforom Version
Image Version : G001.0000.03.03
OEM Object : 2.0
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 530.41.03
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x256010DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x0B541028
GPU Link Info
PCIe Generation
Max : 4
Current : 1
Device Current : 1
Device Max : 4
Host Max : 4
Link Width
Max : 16x
Current : 8x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 1000 KB/s
Rx Throughput : 0 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 6144 MiB
Reserved : 366 MiB
Used : 195 MiB
Free : 5582 MiB
BAR1 Memory Usage
Total : 8192 MiB
Used : 8 MiB
Free : 8184 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 4 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 52 C
GPU Shutdown Temp : 105 C
GPU Slowdown Temp : 102 C
GPU Max Operating Temp : 87 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : N/A
Power Draw : 17.97 W
Power Limit : N/A
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 210 MHz
SM : 210 MHz
Memory : 405 MHz
Video : 555 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 2100 MHz
SM : 2100 MHz
Memory : 7001 MHz
Video : 1950 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 643.750 mV
Fabric
State : N/A
Status : N/A
To Reproduce
Enable DRM and try to start a Wayland session. The errors in dmesg appear on every boot.
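For anyone reproducing this, it may help to confirm that nvidia-drm kernel mode setting is actually on before starting the session. The following is a minimal sketch, not part of the original report. It assumes the module exposes its `modeset` parameter under `/sys/module/nvidia_drm/parameters/`. The function name `nvidia_drm_modeset_status` and its optional path argument are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Report whether nvidia-drm kernel mode setting (KMS) is enabled.
# Assumption: the parameter is exposed at the sysfs path below.
# nvidia_drm_modeset_status and its optional path argument are
# hypothetical, added only for this example.
nvidia_drm_modeset_status() {
    param_file="${1:-/sys/module/nvidia_drm/parameters/modeset}"
    if [ ! -r "$param_file" ]; then
        # Module not loaded, or the file is not readable by this user
        # (it is often readable by root only).
        echo "unknown"
        return 1
    fi
    case "$(cat "$param_file")" in
        Y|1) echo "enabled" ;;
        N|0) echo "disabled" ;;
        *)   echo "unknown"; return 1 ;;
    esac
}

# Print the status for the running system; do not fail if it is unknown.
nvidia_drm_modeset_status || true
```

If this prints `unknown` as a regular user, running it as root may be necessary. Mode setting is usually enabled either with `nvidia-drm.modeset=1` on the kernel command line or with `options nvidia-drm modeset=1` in a file under `/etc/modprobe.d/`. Which of the two was used here is not stated in the report.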
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
No response
Use the 525.105.17 version!
The same issue occurs in that version. Apparently it happens when you log in with a monitor refresh rate higher than 60Hz. After you log in, changing from 60Hz to a higher frequency is fine. The problem only occurs when you try to log in with a frequency higher than 60Hz.
I'm using a GTX 1070. I'd downgrade to version 525.89.02 as that version didn't cause me any issues, but I'm having trouble compiling it with the current kernel version 6.3.1.
@Kaoticz try the new driver 525.116.04. I'm using it and so far so good.