[BUG] Null pointer dereference during setting sysclk/dai for rt5682 in CML_HEL_RT5682

Open ghost opened this issue 4 years ago • 4 comments

Describe the bug
Found a null pointer dereference in CML_HEL_RT5682 while running check-kmod-load-unload.sh in internal test 6090.

Command to run
TPLG=/lib/firmware/intel/sof-tplg/sof-cml-rt1011-rt5682.tplg ~/sof-test/test-case/check-kmod-load-unload.sh -l 25

Reproduction Rate
Seen only once in recent daily tests. Could not reproduce it with 100 iterations of kmod load-unload. Also tried 50 iterations of kmod load-unload after 15 iterations of check-runtime-pm-status, but it still did not reproduce.

Environment
Kernel Branch: topic/sof-dev
Kernel Commit: eb558167
SOF Branch: main
SOF Commit: https://github.com/thesofproject/sof/commit/73803adec979
Topology: sof-cml-rt1011-rt5682.tplg
Platform: CML_HEL_RT5682

Screenshots or console output
dmesg

[ 7171.667940] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: tplg: complete pipeline PIPELINE.1.SSP0.OUT id 5
[ 7171.668323] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x30130000: GLB_TPLG_MSG: PIPE_COMPLETE
[ 7171.668589] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x30130000: GLB_TPLG_MSG: PIPE_COMPLETE
[ 7171.670013] kernel: rt5682 i2c-10EC5682:00: sysclk/dai not set correctly
[ 7171.670587] kernel: BUG: kernel NULL pointer dereference, address: 000000000000000a
[ 7171.670610] kernel: #PF: supervisor write access in kernel mode
[ 7171.670626] kernel: #PF: error_code(0x0002) - not-present page
[ 7171.670643] kernel: PGD 0 P4D 0 
[ 7171.670671] kernel: Oops: 0002 [#1] SMP NOPTI
[ 7171.670695] kernel: CPU: 1 PID: 83006 Comm: systemd-udevd Not tainted 5.14.0-rc6-daily-default-20210825 #1
[ 7171.670722] kernel: Hardware name: Google Helios/Helios, BIOS  01/21/2020
[ 7171.670735] kernel: RIP: 0010:__clk_register+0x490/0x7e0
[ 7171.670772] kernel: Code: 89 c4 49 89 47 30 49 8d 87 b8 00 00 00 4d 85 e4 0f 84 7a 02 00 00 49 8b 94 24 b0 00 00 00 49 89 97 b8 00 00 00 48 85 d2 74 04 <48> 89 42 08 49 89 84 24 b0 00 00 00 49 8d 84 24 b0 00 00 00 49 89
[ 7171.670796] kernel: RSP: 0018:ffffa1d08158f958 EFLAGS: 00010202
[ 7171.670815] kernel: RAX: ffff9de2d68a94b8 RBX: ffff9de2f0f30bc8 RCX: 0000000071426b9b
[ 7171.670831] kernel: RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9de2d68a9400
[ 7171.670836] kernel: RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000000
[ 7171.670836] kernel: R10: 0000000000000000 R11: ffff9de3e705786f R12: ffff9de2e8aec1a8
[ 7171.670836] kernel: R13: ffff9de2e7057a20 R14: ffff9de2ec0f29e8 R15: ffff9de2d68a9400
[ 7171.670836] kernel: FS:  00007f28a2c37880(0000) GS:ffff9de416000000(0000) knlGS:0000000000000000
[ 7171.670836] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7171.670836] kernel: CR2: 000000000000000a CR3: 0000000116b10004 CR4: 00000000003706e0
[ 7171.670836] kernel: Call Trace:
[ 7171.670836] kernel:  ? clk_hw_unregister+0x10/0x10
[ 7171.670836] kernel:  clk_hw_register+0x19/0x40
[ 7171.670836] kernel:  devm_clk_hw_register+0x41/0x80
[ 7171.670836] kernel:  rt5682_probe+0x148/0x1f0 [snd_soc_rt5682]
[ 7171.670836] kernel:  snd_soc_component_probe+0x19/0x40 [snd_soc_core]
[ 7171.670836] kernel:  soc_probe_component+0x1cb/0x300 [snd_soc_core]
[ 7171.670836] kernel:  snd_soc_bind_card+0x506/0xcf0 [snd_soc_core]
[ 7171.670836] kernel:  ? is_module_address+0xc/0x20
[ 7171.670836] kernel:  ? lockdep_init_map_type+0x51/0x210
[ 7171.670836] kernel:  ? __raw_spin_lock_init+0x36/0x60
[ 7171.670836] kernel:  devm_snd_soc_register_card+0x3e/0x80 [snd_soc_core]
[ 7171.670836] kernel:  platform_probe+0x4f/0xb0
[ 7171.670836] kernel:  really_probe+0x1a6/0x3a0
[ 7171.670836] kernel:  __driver_probe_device+0xf9/0x170
[ 7171.670836] kernel:  driver_probe_device+0x19/0x90
[ 7171.670836] kernel:  __driver_attach+0x99/0x170
[ 7171.670836] kernel:  ? __device_attach_driver+0xd0/0xd0
[ 7171.670836] kernel:  ? __device_attach_driver+0xd0/0xd0
[ 7171.670836] kernel:  bus_for_each_dev+0x73/0xc0
[ 7171.670836] kernel:  bus_add_driver+0x14b/0x1f0
[ 7171.670836] kernel:  driver_register+0x67/0xb0
[ 7171.670836] kernel:  ? 0xffffffffc02e6000
[ 7171.670836] kernel:  do_one_initcall+0x64/0x220
[ 7171.670836] kernel:  ? __cond_resched+0x10/0x20
[ 7171.670836] kernel:  ? kmem_cache_alloc_trace+0x4a/0x1f0
[ 7171.670836] kernel:  do_init_module+0x56/0x200
[ 7171.670836] kernel:  load_module+0x258e/0x28f0
[ 7171.670836] kernel:  ? __do_sys_finit_module+0xae/0x110
[ 7171.670836] kernel:  __do_sys_finit_module+0xae/0x110
[ 7171.670836] kernel:  do_syscall_64+0x38/0x90
[ 7171.670836] kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 7171.670836] kernel: RIP: 0033:0x7f28a31b789d
[ 7171.670836] kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
[ 7171.670836] kernel: RSP: 002b:00007ffcffcc40d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 7171.670836] kernel: RAX: ffffffffffffffda RBX: 000055c6cf14e6a0 RCX: 00007f28a31b789d
[ 7171.670836] kernel: RDX: 0000000000000000 RSI: 000055c6cf136b80 RDI: 0000000000000010
[ 7171.670836] kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[ 7171.670836] kernel: R10: 0000000000000010 R11: 0000000000000246 R12: 000055c6cf136b80
[ 7171.670836] kernel: R13: 0000000000000000 R14: 000055c6cf1117d0 R15: 000055c6cf14e6a0
[ 7171.670836] kernel: Modules linked in: snd_sof_pci_intel_tgl snd_soc_cml_rt1011_rt5682(+) snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_pci_intel_icl snd_hda_codec_hdmi snd_soc_dmic snd_sof_pci_intel_cnl snd_sof_pci_intel_apl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_soc_hdac_hda snd_hda_ext_core snd_hda_codec snd_hwdep snd_hda_core snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_bdw snd_sof_acpi_intel_byt snd_sof_acpi snd_sof_intel_atom snd_sof_xtensa_dsp snd_sof snd_soc_acpi_intel_match snd_soc_acpi snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_rt1011 snd_soc_rt715 snd_soc_rt1308_sdw snd_soc_rt1308 snd_soc_rt711 snd_soc_rt700 snd_soc_max98373_sdw snd_soc_max98373_i2c snd_soc_max98373 snd_soc_max98090 snd_soc_max98357a snd_soc_wm8804_i2c snd_soc_wm8804 snd_soc_pcm512x_i2c snd_soc_pcm512x snd_soc_rt5682_sdw regmap_sdw soundwire_bus snd_soc_rt5682_i2c snd_soc_rt5682 snd_soc_rt5677 snd_soc_rt5677_spi snd_soc_rt5670
[ 7171.670836] kernel:  snd_soc_rt5660 snd_soc_rt5651 snd_soc_rt5645 snd_soc_rt5640 snd_soc_rl6231 snd_soc_rt298 snd_soc_rt286 snd_soc_rl6347a snd_soc_da7219 snd_soc_da7213 snd_soc_core snd_compress snd_pcm cdc_ether usbnet r8152 fuse snd_usbmidi_lib intel_pmc_core_pltdrv intel_pmc_core x86_pkg_temp_thermal regmap_i2c snd_seq_midi ledtrig_audio snd_seq_midi_event i915 snd_rawmidi i2c_algo_bit ttm intel_pch_thermal drm_kms_helper processor_thermal_device_pci_legacy snd_seq processor_thermal_device processor_thermal_rfim processor_thermal_mbox intel_soc_dts_iosf snd_seq_device snd_timer snd elan_i2c soundcore int3403_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel drm drm_panel_orientation_quirks efivarfs spi_pxa2xx_platform intel_lpss_pci xhci_pci intel_lpss idma64 mfd_core xhci_hcd [last unloaded: snd_pcm]
[ 7171.670836] kernel: CR2: 000000000000000a
[ 7171.670836] kernel: ---[ end trace 291f9a7cc43feda9 ]---
[ 7171.670836] kernel: RIP: 0010:__clk_register+0x490/0x7e0
[ 7171.670836] kernel: Code: 89 c4 49 89 47 30 49 8d 87 b8 00 00 00 4d 85 e4 0f 84 7a 02 00 00 49 8b 94 24 b0 00 00 00 49 89 97 b8 00 00 00 48 85 d2 74 04 <48> 89 42 08 49 89 84 24 b0 00 00 00 49 8d 84 24 b0 00 00 00 49 89
[ 7171.670836] kernel: RSP: 0018:ffffa1d08158f958 EFLAGS: 00010202
[ 7171.670836] kernel: RAX: ffff9de2d68a94b8 RBX: ffff9de2f0f30bc8 RCX: 0000000071426b9b
[ 7171.670836] kernel: RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9de2d68a9400
[ 7171.670836] kernel: RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000000
[ 7171.670836] kernel: R10: 0000000000000000 R11: ffff9de3e705786f R12: ffff9de2e8aec1a8
[ 7171.670836] kernel: R13: ffff9de2e7057a20 R14: ffff9de2ec0f29e8 R15: ffff9de2d68a9400
[ 7171.670836] kernel: FS:  00007f28a2c37880(0000) GS:ffff9de416000000(0000) knlGS:0000000000000000
[ 7171.670836] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7171.670836] kernel: CR2: 000000000000000a CR3: 0000000116b10004 CR4: 00000000003706e0
[ 7171.670836] kernel: BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[ 7171.670836] kernel: in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 83006, name: systemd-udevd
[ 7171.670836] kernel: INFO: lockdep is turned off.
[ 7171.670836] kernel: CPU: 1 PID: 83006 Comm: systemd-udevd Tainted: G      D           5.14.0-rc6-daily-default-20210825 #1
[ 7171.670836] kernel: Hardware name: Google Helios/Helios, BIOS  01/21/2020
[ 7171.670836] kernel: Call Trace:
[ 7171.670836] kernel:  dump_stack_lvl+0x34/0x44
[ 7171.670836] kernel:  ___might_sleep.cold+0x95/0xa2
[ 7171.670836] kernel:  exit_signals+0x2b/0x220
[ 7171.670836] kernel:  do_exit+0xc2/0xb40
[ 7171.670836] kernel:  rewind_stack_do_exit+0x17/0x20
[ 7171.670836] kernel: RIP: 0033:0x7f28a31b789d
[ 7171.670836] kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
[ 7171.670836] kernel: RSP: 002b:00007ffcffcc40d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 7171.670836] kernel: RAX: ffffffffffffffda RBX: 000055c6cf14e6a0 RCX: 00007f28a31b789d
[ 7171.670836] kernel: RDX: 0000000000000000 RSI: 000055c6cf136b80 RDI: 0000000000000010
[ 7171.670836] kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[ 7171.670836] kernel: R10: 0000000000000010 R11: 0000000000000246 R12: 000055c6cf136b80
[ 7171.670836] kernel: R13: 0000000000000000 R14: 000055c6cf1117d0 R15: 000055c6cf14e6a0
[ 7171.840921] kernel: usbcore: registered new interface driver snd-usb-audio

full dmesg: dmesg.txt

ghost, Aug 26 '21 08:08

@YongAnLu00 looks like a duplicate of https://github.com/thesofproject/linux/issues/3008?

plbossart, Aug 26 '21 13:08

Hi @plbossart, I don't think it is a duplicate, because the call traces show the null pointer dereference occurring in different functions. They may well come from the same underlying problem, but it would be better to treat them as separate issues until there has been more investigation.

ghost, Aug 27 '21 01:08

Well, one of the traces points to a register issue while the other points to an unregister. Both are symptoms of corrupted pointers: something bad will happen eventually, and when and where the problem shows up will depend on what is in the bad memory.
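
For what it's worth, the register dump in the oops above seems consistent with that reading. Below is a rough Python sketch that decodes only the faulting instruction bytes (`<48> 89 42 08` on the `Code:` line) and the page-fault error code, using values copied from the log. The helper names are made up for illustration, and the decoder handles just this one instruction form; it was worked out by hand from the bytes, not from disassembling the actual kernel build. If the decode is right, the faulting store is `mov %rax, 0x8(%rdx)` with RDX = 0x2, which gives the reported address 0xa. The bytes just before it (`48 85 d2 74 04`) look like a NULL test and a conditional skip, so a small non-NULL garbage value such as 0x2 would pass the check and then fault on the write.

```python
# Values copied from the oops in this issue.
RDX = 0x0000000000000002
CR2 = 0x000000000000000A
ERROR_CODE = 0x0002
FAULTING_BYTES = bytes.fromhex("48894208")  # "<48> 89 42 08" on the Code: line

REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_store_disp8(code: bytes):
    """Decode only 'REX.W 89 /r' with mod=01 (8-bit displacement, no SIB).

    Hypothetical helper for this one instruction form, not a general
    x86 decoder. Returns (source register, base register, displacement).
    """
    rex, opcode, modrm, disp = code
    if rex != 0x48 or opcode != 0x89:
        raise ValueError("not a plain 64-bit 'mov r64 -> r/m64'")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the disp8, no-SIB form is handled")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REG_NAMES[reg], REG_NAMES[rm], disp


def decode_pf_error(code: int):
    """Split the low three bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(code & 0x1),  # clear means not-present page
        "write": bool(code & 0x2),
        "user_mode": bool(code & 0x4),
    }


src, base, disp = decode_mov_store_disp8(FAULTING_BYTES)
fault_address = RDX + disp  # the base register is rdx for these bytes
print(f"mov %{src}, {disp:#x}(%{base}) -> address {fault_address:#x}, CR2 {CR2:#x}")
print(decode_pf_error(ERROR_CODE))
```

So the "NULL pointer dereference" here is really a write through a near-NULL bogus pointer, which fits the idea that the list being walked in `__clk_register` already contained bad memory.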

plbossart, Aug 27 '21 12:08

@keqiaozhang can you check if this still happens? It's been about a year now since the last update.

plbossart, Aug 26 '22 09:08

Not seen in ages, closing.

marc-hb, Apr 22 '24 23:04