open-gpu-kernel-modules

NULL pointer dereference in GrabOwnership+0x4/0x40

Open YusufKhan-gamedev opened this issue 2 years ago • 7 comments

NVIDIA Open GPU Kernel Modules Version

ce3d74ff6b49f7ec0e5e0aa44417f668b0f7189b

Does this happen with the proprietary driver (of the same version) as well?

I cannot test this

Operating System and Version

Description: Fedora release 36 (Thirty Six)

Kernel Release

Linux fedora 5.17.9-300.fc36.x86_64 #1 SMP PREEMPT Wed May 18 15:08:23 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Hardware: GPU

It's an RTX 2060 from GIGABYTE. I am not going to install the proprietary tool that is suggested.

Describe the bug


[ 5.048788] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 5.048789] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 5.048789] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 5.048790] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 5.048790] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 5.048791] ACPI: [Firmware Bug]: No valid BIOS _PSS frequency found for processor 7
[ 5.048791] ACPI: [Firmware Bug]: BIOS needs update for CPU frequency support
[ 5.696161] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
[ 5.696165] ucsi_ccg 0-0008: i2c_transfer failed -110
[ 5.696166] ucsi_ccg 0-0008: ucsi_ccg_init failed - -110
[ 5.696168] ucsi_ccg: probe of 0-0008 failed with error -110
[ 5.711771] kauditd_printk_skb: 136 callbacks suppressed
[ 5.711772] audit: type=1130 audit(1653611711.576:145): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-udev-settle comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.751815] audit: type=1130 audit(1653611711.616:146): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-fsck@dev-disk-by\x2duuid-cd5cf0c9\x2db7ce\x2d41da\x2dbcf1\x2dae0ccb7c629a comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.763793] audit: type=1130 audit(1653611711.628:147): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-fsck@dev-disk-by\x2duuid-5B81\x2d8B7D comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.767791] EXT4-fs (sda2): mounted filesystem with ordered data mode. Quota mode: none.
[ 5.797817] audit: type=1130 audit(1653611711.662:148): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dracut-shutdown comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.819709] audit: type=1130 audit(1653611711.684:149): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=plymouth-read-write comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.826745] audit: type=1130 audit(1653611711.691:150): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=import-state comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.875769] audit: type=1130 audit(1653611711.740:151): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.878325] audit: type=1334 audit(1653611711.742:152): prog-id=60 op=LOAD
[ 5.878401] audit: type=1334 audit(1653611711.742:153): prog-id=61 op=LOAD
[ 5.878447] audit: type=1334 audit(1653611711.743:154): prog-id=62 op=LOAD
[ 5.911293] RPC: Registered named UNIX socket transport module.
[ 5.911296] RPC: Registered udp transport module.
[ 5.911296] RPC: Registered tcp transport module.
[ 5.911296] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 6.038002] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[ 6.038004] Bluetooth: BNEP filters: protocol multicast
[ 6.038007] Bluetooth: BNEP socket layer initialized
[ 6.223234] NET: Registered PF_QIPCRTR protocol family
[ 6.837424] iwlwifi 0000:00:14.3: Conflict between TLV & NVM regarding enabling LAR (TLV = enabled NVM =disabled)
[ 7.024052] iwlwifi 0000:00:14.3: Conflict between TLV & NVM regarding enabling LAR (TLV = enabled NVM =disabled)
[ 9.509038] e1000e 0000:00:1f.6 eno2: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 9.509088] IPv6: ADDRCONF(NETDEV_CHANGE): eno2: link becomes ready
[ 10.035938] thermal cooling_device11: Setting cooling device state is deprecated
[ 11.744620] rfkill: input handler disabled
[ 12.420396] Bluetooth: RFCOMM TTY layer initialized
[ 12.420400] Bluetooth: RFCOMM socket layer initialized
[ 12.420422] Bluetooth: RFCOMM ver 1.11
[ 18.664787] rfkill: input handler enabled
[ 50.945360] logitech-hidpp-device 0003:046D:1025.0007: HID++ 1.0 device connected.
[ 463.244682] nvidia-modeset: Unloading
[ 463.262190] NVOC: __nvoc_objDelete: Child class OBJIOVASPACE not freed from parent class OBJVMM.Allocator 00000000ba323f72 released with memory allocations
[ 463.262212] [NvPort] *************************************************
[ 463.262213] NvPort memory tracking information for allocator 00000000ba323f72:
[ 463.262213] ACTIVE: 1 allocations, 644 bytes allocated (616 useful, 28 meta)
[ 463.262214] TOTAL: 150 allocations, 512133 bytes allocated (507933 useful, 4200 meta)
[ 463.262215] PEAK: 148 allocations, 511980 bytes allocated (507836 useful, 4144 meta)
[ 463.262216] [NvPort] *************************************************
[ 463.262230] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
[ 463.281105] nvidia: unknown parameter 'modeset' ignored
[ 463.281759] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[ 463.282385] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=io+mem
[ 463.329634] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 515.43.04 Release Build (yusufkhan@) Tue May 24 06:08:38 PM EDT 2022
[ 463.334441] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 515.43.04 Release Build (yusufkhan@) Tue May 24 06:08:29 PM EDT 2022
[ 463.337283] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 463.505116] NVRM kgspInitRm_IMPL: missing NVDEC0 engine, cannot initialize GSP-RM
[ 463.505120] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 463.505392] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x63:0x56:1689)
[ 463.506360] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 463.506437] [drm:nv_drm_load [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[ 463.506568] [drm:nv_drm_probe_devices [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to register device
[ 463.506574] BUG: kernel NULL pointer dereference, address: 0000000000000040
[ 463.506576] #PF: supervisor read access in kernel mode
[ 463.506578] #PF: error_code(0x0000) - not-present page
[ 463.506579] PGD 0 P4D 0
[ 463.506581] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 463.506582] CPU: 5 PID: 838 Comm: systemd-logind Tainted: G OE 5.17.9-300.fc36.x86_64 #1
[ 463.506584] Hardware name: Micro-Star International Co., Ltd. MS-7B17/MPG Z390 GAMING EDGE AC (MS-7B17), BIOS A.A0 08/14/2020
[ 463.506585] RIP: 0010:GrabOwnership+0x4/0x40 [nvidia_modeset]
[ 463.506613] Code: 48 89 de 31 d2 bf 06 00 00 00 e8 a7 48 04 00 b8 01 00 00 00 5b c3 31 c0 c3 00 00 00 00 00 00 00 00 00 00 00 00 00 48 83 ec 18 <8b> 57 40 b8 01 00 00 00 48 c7 44 24 08 00 00 00 00 85 d2 74 1c 48
[ 463.506615] RSP: 0018:ffffadab0114bbc8 EFLAGS: 00010292
[ 463.506616] RAX: ffffffffc19d3c30 RBX: ffff9ce2e5041000 RCX: 0000000000000000
[ 463.506617] RDX: 0000000000000001 RSI: ffff9ce24c08b400 RDI: 0000000000000000
[ 463.506618] RBP: ffff9ce2e5041000 R08: 00000000000000c0 R09: ffff9ce2f715db40
[ 463.506619] R10: 0000000000000001 R11: 0000000000000005 R12: ffff9ce2f715db40
[ 463.506619] R13: 0000000000000000 R14: ffff9ce24c08b410 R15: 00000000ed1ec828
[ 463.506620] FS: 00007fb052144bc0(0000) GS:ffff9ce98dd40000(0000) knlGS:0000000000000000
[ 463.506622] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 463.506623] CR2: 0000000000000040 CR3: 000000010bb08001 CR4: 00000000003706e0
[ 463.506624] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 463.506624] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 463.506625] Call Trace:
[ 463.506627] <TASK>
[ 463.506628] ? preempt_count_add+0x64/0x90
[ 463.506632] ? nv_drm_master_set+0x1e/0x40 [nvidia_drm]
[ 463.506635] ? drm_new_set_master+0x90/0x110
[ 463.506638] ? drm_master_open+0x7c/0xa0
[ 463.506639] ? drm_open+0xf8/0x250
[ 463.506642] ? drm_stub_open+0xa2/0xe0
[ 463.506643] ? chrdev_open+0xb1/0x210
[ 463.506645] ? cdev_device_add+0x80/0x80
[ 463.506646] ? do_dentry_open+0x1c4/0x350
[ 463.506648] ? path_openat+0xacd/0x1210
[ 463.506651] ? path_lookupat+0x97/0x190
[ 463.506653] ? do_filp_open+0xa1/0x130
[ 463.506654] ? __check_object_size+0x126/0x140
[ 463.506657] ? _raw_spin_unlock+0x16/0x30
[ 463.506660] ? alloc_fd+0xd1/0x170
[ 463.506661] ? do_sys_openat2+0x76/0x130
[ 463.506663] ? __x64_sys_openat+0x5c/0x70
[ 463.506664] ? do_syscall_64+0x37/0x80
[ 463.506666] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 463.506669] </TASK>
[ 463.506670] Modules linked in: nvidia_drm(OE) nvidia_modeset(OE) nvidia(OE) rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc vfat fat snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci intel_rapl_msr snd_sof_xtensa_dsp intel_rapl_common snd_sof soundwire_bus snd_soc_skl intel_tcc_cooling snd_soc_hdac_hda x86_pkg_temp_thermal intel_powerclamp snd_hda_ext_core coretemp mei_hdcp mei_pxp iTCO_wdt iwlmvm snd_soc_sst_ipc snd_soc_sst_dsp ucsi_ccg intel_pmc_bxt iTCO_vendor_support typec_ucsi ee1004 snd_soc_acpi_intel_match typec mac80211 snd_soc_acpi kvm_intel snd_soc_core libarc4 snd_compress kvm snd_hda_codec_realtek ac97_bus snd_hda_codec_generic
[ 463.506697] iwlwifi snd_pcm_dmaengine snd_hda_codec_hdmi ledtrig_audio irqbypass rapl snd_hda_intel intel_cstate iwlmei btusb snd_intel_dspcfg btrtl intel_uncore snd_intel_sdw_acpi btbcm cfg80211 snd_hda_codec btintel pcspkr snd_hda_core btmtk mei_me i2c_i801 intel_wmi_thunderbolt wmi_bmof snd_hwdep i2c_smbus mei bluetooth snd_seq snd_seq_device snd_pcm snd_timer ecdh_generic joydev rfkill snd intel_pch_thermal i2c_nvidia_gpu soundcore acpi_tad acpi_pad zram hid_logitech_hidpp hid_logitech_dj nouveau crct10dif_pclmul crc32_pclmul crc32c_intel e1000e ghash_clmulni_intel drm_ttm_helper ttm mxm_wmi wmi video ip6_tables ip_tables ipmi_devintf ipmi_msghandler fuse [last unloaded: nvidia]
[ 463.506720] CR2: 0000000000000040
[ 463.506722] ---[ end trace 0000000000000000 ]---
[ 463.506723] RIP: 0010:GrabOwnership+0x4/0x40 [nvidia_modeset]
[ 463.506741] Code: 48 89 de 31 d2 bf 06 00 00 00 e8 a7 48 04 00 b8 01 00 00 00 5b c3 31 c0 c3 00 00 00 00 00 00 00 00 00 00 00 00 00 48 83 ec 18 <8b> 57 40 b8 01 00 00 00 48 c7 44 24 08 00 00 00 00 85 d2 74 1c 48
[ 463.506742] RSP: 0018:ffffadab0114bbc8 EFLAGS: 00010292
[ 463.506743] RAX: ffffffffc19d3c30 RBX: ffff9ce2e5041000 RCX: 0000000000000000
[ 463.506744] RDX: 0000000000000001 RSI: ffff9ce24c08b400 RDI: 0000000000000000
[ 463.506745] RBP: ffff9ce2e5041000 R08: 00000000000000c0 R09: ffff9ce2f715db40
[ 463.506746] R10: 0000000000000001 R11: 0000000000000005 R12: ffff9ce2f715db40
[ 463.506746] R13: 0000000000000000 R14: ffff9ce24c08b410 R15: 00000000ed1ec828
[ 463.506747] FS: 00007fb052144bc0(0000) GS:ffff9ce98dd40000(0000) knlGS:0000000000000000
[ 463.506748] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 463.506749] CR2: 0000000000000040 CR3: 000000010bb08001 CR4: 00000000003706e0
[ 463.506750] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 463.506750] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 463.722840] show_signal: 7 callbacks suppressed
[ 463.722841] traps: xss-lock[1807] trap int3 ip:7f1767595df1 sp:7ffc84704890 error:0
[ 463.722845] fbcon: Taking over console
[ 463.722850] in libglib-2.0.so.0.7200.1[7f1767559000+91000]
[ 463.724632] Console: switching to colour frame buffer device 128x48
[ 464.801771] rfkill: input handler disabled
[ 471.179449] rfkill: input handler enabled
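
For anyone reading the oops, here is a rough sketch of how the faulting address can be cross-checked against the register dump. The byte marked `<8b>` in the `Code:` line starts the faulting instruction, and `8b 57 40` decodes as a 32-bit load from `[RDI + 0x40]`. With RDI equal to 0 that gives address 0x40, which matches CR2, so the first argument passed to `GrabOwnership` appears to have been NULL. The excerpt below is abridged from the log, the helper names are made up for illustration, and the decoder only handles this one instruction form.

```python
import re

# Abridged excerpt from the oops in this report (only the lines needed here).
OOPS_EXCERPT = """
RIP: 0010:GrabOwnership+0x4/0x40 [nvidia_modeset]
Code: 48 83 ec 18 <8b> 57 40 b8 01 00 00 00
RDX: 0000000000000001 RSI: ffff9ce24c08b400 RDI: 0000000000000000
CR2: 0000000000000040 CR3: 000000010bb08001 CR4: 00000000003706e0
"""

# x86-64 register numbering for the ModRM r/m field (no REX prefix).
REG64 = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"]


def faulting_bytes(text):
    """Return the instruction bytes starting at the byte marked <xx>."""
    tokens = re.search(r"Code: (.*)", text).group(1).split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_mov_disp8(code):
    """Decode `mov r32, [base + disp8]` (opcode 0x8B) into (base, disp)."""
    opcode, modrm, disp = code[0], code[1], code[2]
    mod, rm = modrm >> 6, modrm & 0b111
    if opcode != 0x8B or mod != 0b01 or rm == 0b100:
        raise ValueError("this sketch only handles 8b /r with disp8 and no SIB")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REG64[rm], disp


def register(text, name):
    """Read a 64-bit register value from the dump."""
    return int(re.search(rf"\b{name}: ([0-9a-f]{{16}})", text).group(1), 16)


base, disp = decode_mov_disp8(faulting_bytes(OOPS_EXCERPT))
address = (register(OOPS_EXCERPT, base) + disp) & (2**64 - 1)
print(f"load from [{base} + {disp:#x}] = {address:#x}")
print(f"CR2 (faulting address)      = {register(OOPS_EXCERPT, 'CR2'):#x}")
```

The call trace shows the open coming from `systemd-logind` through `nv_drm_master_set` right after `RmInitAdapter` failed, so a NULL device pointer reaching `GrabOwnership` would be consistent with that, though the log alone does not prove which caller passed it.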

To Reproduce

Reload nvidia drivers
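
The report does not say which commands were used for the reload. As a rough sketch, assuming the modules are installed where `modprobe` can find them, a reload might look like the following. The module names come from the "Modules linked in" line of the oops; the unload order, the use of `modprobe`, and the helper names are assumptions. The script only prints the commands unless `dry_run=False` is passed, and actually running them needs root.

```python
import subprocess

# Module names as they appear in the oops. Dependents are listed first so
# they are removed before the core module (assumed order).
NVIDIA_MODULES = ["nvidia_drm", "nvidia_modeset", "nvidia"]


def reload_commands(modules=NVIDIA_MODULES):
    """Build the argv lists for one unload followed by one load."""
    unload = ["modprobe", "-r", *modules]
    # Loading the top-level module is expected to pull in its dependencies.
    load = ["modprobe", modules[0]]
    return [unload, load]


def reload_modules(dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for argv in reload_commands():
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


reload_modules()  # dry run: prints the two modprobe commands
```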

Bug Incidence

Once

nvidia-bug-report.log.gz

I believe the dmesg output should be enough, since it includes a core dump, but here it is anyway: nvidia-bug-report.log.gz

More Info

No response

YusufKhan-gamedev, May 27 '22 00:05