Kernel oops on Helios Chromebook
Kernel oops detected by Intel test 4819 on CML_HEL_RT5682
Test Case URL: https://github.com/thesofproject/sof-test/tree/master/test-case/check-kmod-load-unload.sh cmd: TPLG=sof-cml-rt1011-rt5682.tplg ~/sof-test/test-case/check-kmod-load-unload.sh -l 25
[ 5993.111452] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x40010000: GLB_PM_MSG: CTX_SAVE
[ 5993.111689] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x80]=0x20140000 successful
[ 5993.189809] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x4]=0x1010f0f successful
[ 5993.190413] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x4]=0xf0f successful
[ 5993.190421] kernel: sof-audio-pci-intel-cnl 0000:00:1f.3: DSP core(s) enabled? 0 : core_mask 1
[ 5994.688732] kernel: BUG: unable to handle page fault for address: ffffffffc0978200
[ 5994.688742] kernel: #PF: supervisor write access in kernel mode
[ 5994.688747] kernel: #PF: error_code(0x0002) - not-present page
[ 5994.688751] kernel: PGD 1efe27067 P4D 1efe27067 PUD 1efe29067 PMD 10ce6f067 PTE 0
[ 5994.688760] kernel: Oops: 0002 [#1] SMP NOPTI
[ 5994.688766] kernel: CPU: 2 PID: 80996 Comm: rmmod Not tainted 5.13.0-rc5-pr2989-5919-default #3c0c1a01
[ 5994.688772] kernel: Hardware name: Google Helios/Helios, BIOS 01/21/2020
[ 5994.688775] kernel: RIP: 0010:clk_unregister+0x189/0x240
[ 5994.688785] kernel: Code: 00 48 85 c0 75 de 49 8b 41 08 49 83 c1 08 48 85 c0 75 c9 48 8b 45 00 48 8b 90 c0 00 00 00 48 85 d2 74 2d 48 8b 88 b8 00 00 00 <48> 89 0a 48 85 c9 74 04 48 89 51 08 48 c7 80 b8 00 00 00 00 00 00
[ 5994.688790] kernel: RSP: 0018:ffffa30680777e08 EFLAGS: 00010286
[ 5994.688795] kernel: RAX: ffffa07637729f00 RBX: ffffa07637729f00 RCX: ffffffff9ef85280
[ 5994.688799] kernel: RDX: ffffffffc0978200 RSI: ffffa07637729f00 RDI: ffffa076024e7400
[ 5994.688802] kernel: RBP: ffffa07615958ba0 R08: ffffa076024e7400 R09: ffffffff9e5fda50
[ 5994.688805] kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffffa30680777e20
[ 5994.688808] kernel: R13: 000000000000000f R14: ffffa07600ae9fc0 R15: 0000000000000000
[ 5994.688811] kernel: FS: 00007f1516db6540(0000) GS:ffffa07756200000(0000) knlGS:0000000000000000
[ 5994.688816] kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5994.688820] kernel: CR2: ffffffffc0978200 CR3: 000000013a8d6002 CR4: 00000000003706e0
[ 5994.688824] kernel: Call Trace:
[ 5994.688829] kernel: release_nodes+0x1a9/0x1f0
[ 5994.688840] kernel: i2c_device_remove+0x37/0xb0
[ 5994.688848] kernel: device_release_driver_internal+0xee/0x1d0
[ 5994.688853] kernel: driver_detach+0x42/0x90
[ 5994.688858] kernel: bus_remove_driver+0x56/0xd0
[ 5994.688866] kernel: i2c_del_driver+0x19/0x40
[ 5994.688871] kernel: __x64_sys_delete_module+0x18c/0x250
[ 5994.688881] kernel: do_syscall_64+0x3d/0x80
[ 5994.688889] kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 5994.688897] kernel: RIP: 0033:0x7f1516f02bcb
[ 5994.688901] kernel: Code: 73 01 c3 48 8b 0d c5 82 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 95 82 0c 00 f7 d8 64 89 01 48
[ 5994.688906] kernel: RSP: 002b:00007ffe077397c8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 5994.688911] kernel: RAX: ffffffffffffffda RBX: 0000555ac7f2c790 RCX: 00007f1516f02bcb
[ 5994.688915] kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000555ac7f2c7f8
[ 5994.688918] kernel: RBP: 00007ffe07739828 R08: 0000000000000000 R09: 0000000000000000
[ 5994.688921] kernel: R10: 00007f1516f7eac0 R11: 0000000000000206 R12: 00007ffe07739a00
[ 5994.688924] kernel: R13: 00007ffe0773adf8 R14: 0000555ac7f2c2a0 R15: 0000555ac7f2c790
[ 5994.688929] kernel: Modules linked in: snd_soc_hdac_hdmi snd_hda_codec_hdmi snd_soc_dmic soundwire_intel soundwire_generic_allocation soundwire_cadence snd_soc_hdac_hda snd_hda_ext_core snd_hda_codec snd_hwdep snd_hda_core snd_soc_acpi snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_max98373_sdw snd_soc_max98373_i2c snd_soc_max98373 snd_soc_max98090 snd_soc_max98357a snd_soc_wm8804_i2c snd_soc_wm8804 regmap_sdw soundwire_bus snd_soc_rt5682_i2c(-) snd_soc_rt5682 snd_soc_rl6231 snd_soc_rl6347a snd_soc_core snd_compress snd_pcm fuse ax88179_178a usbnet snd_usbmidi_lib intel_pmc_core_pltdrv intel_pmc_core x86_pkg_temp_thermal ledtrig_audio i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops regmap_i2c intel_pch_thermal snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore elan_i2c int3403_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel drm efivarfs spi_pxa2xx_platform intel_lpss_pci xhci_pci intel_lpss idma64 xhci_hcd
[ 5994.689015] kernel: [last unloaded: snd_soc_rt5682_sdw]
[ 5994.689022] kernel: CR2: ffffffffc0978200
[ 5994.689025] kernel: ---[ end trace 76680917a2530f8b ]---
[ 5994.689029] kernel: RIP: 0010:clk_unregister+0x189/0x240
[ 5994.689035] kernel: Code: 00 48 85 c0 75 de 49 8b 41 08 49 83 c1 08 48 85 c0 75 c9 48 8b 45 00 48 8b 90 c0 00 00 00 48 85 d2 74 2d 48 8b 88 b8 00 00 00 <48> 89 0a 48 85 c9 74 04 48 89 51 08 48 c7 80 b8 00 00 00 00 00 00
[ 5994.689039] kernel: RSP: 0018:ffffa30680777e08 EFLAGS: 00010286
[ 5994.689043] kernel: RAX: ffffa07637729f00 RBX: ffffa07637729f00 RCX: ffffffff9ef85280
[ 5994.689046] kernel: RDX: ffffffffc0978200 RSI: ffffa07637729f00 RDI: ffffa076024e7400
[ 5994.689049] kernel: RBP: ffffa07615958ba0 R08: ffffa076024e7400 R09: ffffffff9e5fda50
[ 5994.689052] kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffffa30680777e20
[ 5994.689055] kernel: R13: 000000000000000f R14: ffffa07600ae9fc0 R15: 0000000000000000
[ 5994.689058] kernel: FS: 00007f1516db6540(0000) GS:ffffa07756200000(0000) knlGS:0000000000000000
[ 5994.689062] kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5994.689065] kernel: CR2: ffffffffc0978200 CR3: 000000013a8d6002 CR4: 00000000003706e0
[ 5994.689070] kernel: BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[ 5994.689073] kernel: in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 80996, name: rmmod
[ 5994.689078] kernel: INFO: lockdep is turned off.
[ 5994.689080] kernel: CPU: 2 PID: 80996 Comm: rmmod Tainted: G D 5.13.0-rc5-pr2989-5919-default #3c0c1a01
[ 5994.689085] kernel: Hardware name: Google Helios/Helios, BIOS 01/21/2020
[ 5994.689088] kernel: Call Trace:
[ 5994.689091] kernel: dump_stack+0x64/0x7c
[ 5994.689098] kernel: ___might_sleep.cold+0x95/0xa2
[ 5994.689106] kernel: exit_signals+0x2b/0x220
[ 5994.689115] kernel: do_exit+0xc2/0xb50
[ 5994.689123] kernel: rewind_stack_do_exit+0x17/0x20
[ 5994.689129] kernel: RIP: 0033:0x7f1516f02bcb
[ 5994.689133] kernel: Code: 73 01 c3 48 8b 0d c5 82 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 95 82 0c 00 f7 d8 64 89 01 48
[ 5994.689137] kernel: RSP: 002b:00007ffe077397c8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 5994.689142] kernel: RAX: ffffffffffffffda RBX: 0000555ac7f2c790 RCX: 00007f1516f02bcb
[ 5994.689145] kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000555ac7f2c7f8
[ 5994.689148] kernel: RBP: 00007ffe07739828 R08: 0000000000000000 R09: 0000000000000000
[ 5994.689151] kernel: R10: 00007f1516f7eac0 R11: 0000000000000206 R12: 00007ffe07739a00
[ 5994.689153] kernel: R13: 00007ffe0773adf8 R14: 0000555ac7f2c2a0 R15: 0000555ac7f2c790
@sathya-nujella @naveen-manohar @cujomalainey @bzhg @bardliao @oder-chiou @shumingfan FYI probably unrelated to SOF, probably codec or machine driver issue.
This error happens while unloading snd_soc_rt5682_i2c
RMMOD snd_sof_pci_intel_tng
RMMOD snd_sof_pci_intel_apl
RMMOD snd_sof_pci_intel_cnl
RMMOD snd_sof_pci_intel_icl
RMMOD snd_sof_pci_intel_tgl
RMMOD snd_sof_acpi_intel_byt
RMMOD snd_sof_acpi_intel_bdw
RMMOD snd_sof_intel_hda_common
RMMOD snd_sof_acpi
RMMOD snd_sof_pci
RMMOD snd_sof_intel_atom
SKIP snd_soc_catpt not loaded
SKIP snd_intel_sst_acpi not loaded
SKIP snd_intel_sst_core not loaded
SKIP snd_soc_sst_atom_hifi2_platform not loaded
SKIP snd_soc_skl not loaded
SKIP snd_sof_intel_byt not loaded
SKIP snd_sof_intel_bdw not loaded
SKIP snd_sof_intel_hda_common not loaded
RMMOD snd_sof_intel_hda
RMMOD snd_sof_intel_ipc
RMMOD snd_sof_xtensa_dsp
RMMOD snd_soc_acpi_intel_match
SKIP snd_soc_sof_rt5682 not loaded
SKIP snd_soc_sof_da7219_max98373 not loaded
SKIP snd_soc_sst_bdw_rt5677_mach not loaded
SKIP snd_soc_sst_broadwell not loaded
SKIP snd_soc_sst_bxt_da7219_max98357a not loaded
SKIP snd_soc_sst_sof_pcm512x not loaded
SKIP snd_soc_sst_bxt_rt298 not loaded
SKIP snd_soc_sst_sof_wm8804 not loaded
SKIP snd_soc_sst_byt_cht_da7213 not loaded
SKIP snd_soc_sst_byt_cht_es8316 not loaded
SKIP snd_soc_sst_bytcr_rt5640 not loaded
SKIP snd_soc_sst_bytcr_rt5651 not loaded
SKIP snd_soc_sst_cht_bsw_max98090_ti not loaded
SKIP snd_soc_sst_cht_bsw_nau8824 not loaded
SKIP snd_soc_sst_cht_bsw_rt5645 not loaded
SKIP snd_soc_sst_cht_bsw_rt5672 not loaded
SKIP snd_soc_sst_glk_rt5682_max98357a not loaded
RMMOD snd_soc_cml_rt1011_rt5682
SKIP snd_soc_skl_hda_dsp not loaded
SKIP snd_soc_sdw_rt700 not loaded
SKIP snd_soc_sdw_rt711_rt1308_rt715 not loaded
SKIP snd_soc_sof_sdw not loaded
SKIP snd_soc_ehl_rt5660 not loaded
RMMOD snd_soc_intel_hda_dsp_common
SKIP snd_soc_intel_sof_maxim_common not loaded
SKIP snd_sof_ipc_test not loaded
SKIP snd_sof_probes not loaded
SKIP snd_sof_intel_client not loaded
SKIP snd_sof_client not loaded
SKIP snd_sof_nocodec not loaded
RMMOD snd_sof
SKIP snd_sof_nocodec not loaded
RMMOD snd_soc_da7213
RMMOD snd_soc_da7219
RMMOD snd_soc_pcm512x_i2c
RMMOD snd_soc_pcm512x
SKIP snd_soc_rt274 not loaded
RMMOD snd_soc_rt286
RMMOD snd_soc_rt298
RMMOD snd_soc_rt700
RMMOD snd_soc_rt711
RMMOD snd_soc_rt1308
RMMOD snd_soc_rt1308_sdw
RMMOD snd_soc_rt715
SKIP snd_soc_rt711_sdca not loaded
SKIP snd_soc_rt1316_sdw not loaded
SKIP snd_soc_rt715_sdca not loaded
SKIP snd_soc_sdw_mockup not loaded
RMMOD snd_soc_rt1011
RMMOD snd_soc_rt5640
RMMOD snd_soc_rt5645
RMMOD snd_soc_rt5651
RMMOD snd_soc_rt5660
RMMOD snd_soc_rt5670
RMMOD snd_soc_rt5677
RMMOD snd_soc_rt5677_spi
RMMOD snd_soc_rt5682_sdw
RMMOD snd_soc_rt5682_i2c
/home/ubuntu/sof-test/tools/kmod/sof_remove.sh: line 5: 80995 Killed sudo rmmod "$MODULE"
snd_soc_hdac_hdmi 45056 0
snd_hda_codec_hdmi 73728 0
snd_soc_dmic 16384 0
soundwire_intel 40960 0
soundwire_generic_allocation 16384 1 soundwire_intel
soundwire_cadence 36864 1 soundwire_intel
snd_soc_hdac_hda 24576 0
snd_hda_ext_core 36864 2 snd_soc_hdac_hdmi,snd_soc_hdac_hda
snd_hda_codec 167936 2 snd_hda_codec_hdmi,snd_soc_hdac_hda
snd_hwdep 16384 1 snd_hda_codec
snd_hda_core 106496 5 snd_hda_codec_hdmi,snd_hda_ext_core,snd_hda_codec,snd_soc_hdac_hdmi,snd_soc_hdac_hda
snd_soc_acpi 16384 0
snd_intel_dspcfg 28672 0
snd_intel_sdw_acpi 20480 1 snd_intel_dspcfg
snd_soc_max98373_sdw 20480 0
snd_soc_max98373_i2c 20480 0
snd_soc_max98373 40960 2 snd_soc_max98373_i2c,snd_soc_max98373_sdw
snd_soc_max98090 114688 0
snd_soc_max98357a 20480 0
snd_soc_wm8804_i2c 16384 0
snd_soc_wm8804 20480 1 snd_soc_wm8804_i2c
regmap_sdw 16384 1 snd_soc_max98373_sdw
soundwire_bus 94208 5 regmap_sdw,soundwire_intel,soundwire_generic_allocation,soundwire_cadence,snd_soc_max98373_sdw
snd_soc_rt5682_i2c 16384 -1
snd_soc_rt5682 94208 1 snd_soc_rt5682_i2c
snd_soc_rl6231 20480 1 snd_soc_rt5682
snd_soc_rl6347a 16384 0
snd_soc_core 323584 12 snd_soc_max98373,snd_soc_wm8804,snd_soc_rt5682,soundwire_intel,snd_soc_hdac_hdmi,snd_soc_hdac_hda,snd_soc_max98373_i2c,snd_soc_max98090,snd_soc_max98357a,snd_soc_max98373_sdw,snd_soc_dmic,snd_soc_rt5682_i2c
snd_compress 32768 1 snd_soc_core
snd_pcm 155648 12 snd_hda_codec_hdmi,snd_soc_wm8804,snd_soc_rt5682,snd_hda_codec,soundwire_intel,snd_soc_hdac_hdmi,snd_compress,snd_soc_max98373_i2c,snd_soc_max98090,snd_soc_core,snd_soc_max98373_sdw,snd_hda_core
snd_usbmidi_lib 40960 0
drm_kms_helper 278528 1 i915
syscopyarea 16384 1 drm_kms_helper
sysfillrect 16384 1 drm_kms_helper
sysimgblt 16384 1 drm_kms_helper
fb_sys_fops 16384 1 drm_kms_helper
regmap_i2c 16384 4 snd_soc_wm8804_i2c,snd_soc_max98373_i2c,snd_soc_max98090,snd_soc_rt5682_i2c
snd_seq_midi 20480 0
snd_seq_midi_event 16384 1 snd_seq_midi
snd_rawmidi 45056 2 snd_seq_midi,snd_usbmidi_lib
snd_seq 86016 2 snd_seq_midi,snd_seq_midi_event
snd_seq_device 16384 3 snd_seq,snd_seq_midi,snd_rawmidi
snd_timer 40960 2 snd_seq,snd_pcm
snd 102400 12 snd_seq,snd_seq_device,snd_hda_codec_hdmi,snd_hwdep,snd_usbmidi_lib,snd_hda_codec,snd_timer,snd_soc_hdac_hdmi,snd_compress,snd_soc_core,snd_pcm,snd_rawmidi
soundcore 16384 1 snd
drm 552960 5 drm_kms_helper,i915
/home/ubuntu/sof-test/tools/kmod/sof_remove.sh FAILED
@plbossart
[ 5994.688772] kernel: Hardware name: Google Helios/Helios, BIOS 01/21/2020
[ 5994.688775] kernel: RIP: 0010:clk_unregister+0x189/0x240
It seems the resource was released before when the snd_soc_rt5682_i2c unloads.
Could you enable devres_log to get more information before the script test?
# cd /sys/module/devres/parameters/
# echo 1 > log
good suggestion @shumingfan. We've only seen this issue once so it's unlikely to reproduce immediately.
@fredoh9 can we add this to sof-dyndbg.conf
options devres log=1
It seems we already have CONFIG_DEBUG_DEVRES=y
Sure, i will add that and collect more logs
Easy to reproduce with latest builds.
"version" : {
"kernelBranch" : "topic/sof-dev",
"kernelCommit" : "695d444a",
"sofBranch" : "main",
"sofCommit" : "86c9a037e212",
"firmwareType" : "xcc"
},
cmdline.txt 0624_hel_dmesg.log
@plbossart do we want this devres log=1 for all SOF CI devices?
wait... i thought I reproduced this, but it's NOT. Hum, devres log print in error level? Then I need to ignore all the errors from devres. Let me play with it more and update again.
devres errors are fatal IMHO so should be show-stoppers, not filtered.
@fredoh9 @plbossart I checked 0624_hel_dmesg.log, but it doesn't have oops messages. Could we change the devres message to dev_info level temporarily?
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -58,7 +58,7 @@ static void devres_log(struct device *dev, struct devres_node *node,
const char *op)
{
if (unlikely(log_devres))
- dev_err(dev, "DEVRES %3s %p %s (%zu bytes)\n",
+ dev_info(dev, "DEVRES %3s %p %s (%zu bytes)\n",
I suspect the release processes while unloading the snd_soc_rt5682_sdw module. Although the oops messages happened while unloading the snd_soc_rt5682_i2c module, the snd_soc_rt5682_sdw module was done before.
This could be rather verbose. @fredoh9 maybe we can enable devres dynamic debug by default, but only change the kernel parameter before doing the kmod-load-unload tests (using sysfs instead of the modprobe.d options).
Maybe @andy-shev knows why a dev_err() is used, it's been that way since the beginning but it doesn't seem like a error at all.
Without changing anything, I couldn't reproduce the problem. Even sof ci also find once with above, no more report after that. I tried same with above test, linux pr 2989 + sof pr 4386. I even tried with same build binaries with the test. No luck yet. I tried to test with -l 500 also.
I can submit a PR to enable devres log before starting kmod test, but we need to ignore "DEVRES*". When kernel oops happens, then we will get all the DEVRES logs from dmesg.
BTW, we have 4 different REVRES op
- ADD
- REP : replace
- REM : remove
- REL : release
Very verbose, when the issue happens, we need to still go through each line.
I agree with @fredoh9 this can be a painful problem to solve.
@shumingfan @oder-chiou @bardliao I would bet the problem came from the fact that devm_ is used for the clock management inside the component .probe callback. This is in theory not very good. devm_ associates the resources to a device, so should be used in the I2C probe, not the component probe. I was looking at rt5682_probe and could not figure out why in the SoundWire case we need to call the wait_for_completion_timeout() - it's not done in any other codec driver - and in the I2C case is there any merit in getting/registering the clocks when the card is created, instead of when the I2C driver is probed?
Wondering if this code could be moved in the regular I2C probe and SoundWire init_io. Comments welcome.
To enable DEVRES for kmod test, https://github.com/thesofproject/sof-test/pull/719
I was looking at rt5682_probe and could not figure out why in the SoundWire case we need to call the wait_for_completion_timeout() - it's not done in any other codec driver - and in the I2C case is there any merit in getting/registering the clocks when the card is created, instead of when the I2C driver is probed?
@plbossart Sometimes we encounter that the component probe came earlier than io_init, so the driver waits for SdW slave initialization.
Wondering if this code could be moved in the regular I2C probe and SoundWire init_io. Comments welcome.
I understand your point. That's why I think the DAI clk is released by snd_soc_rt5682_sdw. But we don't have the DEVRES logs to confirm that. I feel ok move DAI clock registration to the I2C probe and SdW io_init doesn't need DAI clock.
@plbossart [ 5994.688772] kernel: Hardware name: Google Helios/Helios, BIOS 01/21/2020 [ 5994.688775] kernel: RIP: 0010:clk_unregister+0x189/0x240 It seems the resource was released before when the snd_soc_rt5682_i2c unloads.
Could you enable devres_log to get more information before the script test?
# cd /sys/module/devres/parameters/ # echo 1 > log
Please, start using devres trace events instead. Relevant change: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/commit/?h=driver-core-next&id=09705dcb63d269000595284b5dd7f5c938d647b9
@andy-shev not easy, all the CI is based on the dmesg log, that's where we saw the kernel oops. Adding a trace mechanism and correlating it with an oops isn't something I know how to do. Pointers welcome.
@andy-shev not easy, all the CI is based on the dmesg log, that's where we saw the kernel oops. Adding a trace mechanism and correlating it with an oops isn't something I know how to do. Pointers welcome.
There is a knob to print traces in dmesg in case of Oops (https://elixir.bootlin.com/linux/latest/source/kernel/trace/trace.c#L119). Also you may enable trace events in the kernel command line (https://www.kernel.org/doc/html/latest/trace/events.html#boot-option).
(A bit outdated, e.g. tracefs vs. debugfs) overview https://events.static.linuxfound.org/sites/events/files/slides/praesentation_0.pdf. But better to see the official in-kernel documentation.
Kernel oops on Helios Chromebook in inner daily 6090 too, when check-kmod-load-unload [dmesg]
[ 7171.670587] kernel: BUG: kernel NULL pointer dereference, address: 000000000000000a
[ 7171.670610] kernel: #PF: supervisor write access in kernel mode
[ 7171.670626] kernel: #PF: error_code(0x0002) - not-present page
[ 7171.670643] kernel: PGD 0 P4D 0
[ 7171.670671] kernel: Oops: 0002 [#1] SMP NOPTI
[ 7171.670695] kernel: CPU: 1 PID: 83006 Comm: systemd-udevd Not tainted 5.14.0-rc6-daily-default-20210825 #1
[ 7171.670722] kernel: Hardware name: Google Helios/Helios, BIOS 01/21/2020
[ 7171.670735] kernel: RIP: 0010:__clk_register+0x490/0x7e0
[ 7171.670772] kernel: Code: 89 c4 49 89 47 30 49 8d 87 b8 00 00 00 4d 85 e4 0f 84 7a 02 00 00 49 8b 94 24 b0 00 00 00 49 89 97 b8 00 00 00 48 85 d2 74 04 <48> 89 42 08 49 89 84 24 b0 00 00 00 49 8d 84 24 b0 00 00 00 49 89
[ 7171.670796] kernel: RSP: 0018:ffffa1d08158f958 EFLAGS: 00010202
[ 7171.670815] kernel: RAX: ffff9de2d68a94b8 RBX: ffff9de2f0f30bc8 RCX: 0000000071426b9b
[ 7171.670831] kernel: RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9de2d68a9400
[ 7171.670836] kernel: RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000000
[ 7171.670836] kernel: R10: 0000000000000000 R11: ffff9de3e705786f R12: ffff9de2e8aec1a8
[ 7171.670836] kernel: R13: ffff9de2e7057a20 R14: ffff9de2ec0f29e8 R15: ffff9de2d68a9400
[ 7171.670836] kernel: FS: 00007f28a2c37880(0000) GS:ffff9de416000000(0000) knlGS:0000000000000000
[ 7171.670836] kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7171.670836] kernel: CR2: 000000000000000a CR3: 0000000116b10004 CR4: 00000000003706e0
[ 7171.670836] kernel: Call Trace:
[ 7171.670836] kernel: ? clk_hw_unregister+0x10/0x10
[ 7171.670836] kernel: clk_hw_register+0x19/0x40
[ 7171.670836] kernel: devm_clk_hw_register+0x41/0x80
[ 7171.670836] kernel: rt5682_probe+0x148/0x1f0 [snd_soc_rt5682]
[ 7171.670836] kernel: snd_soc_component_probe+0x19/0x40 [snd_soc_core]
[ 7171.670836] kernel: soc_probe_component+0x1cb/0x300 [snd_soc_core]
[ 7171.670836] kernel: snd_soc_bind_card+0x506/0xcf0 [snd_soc_core]
[ 7171.670836] kernel: ? is_module_address+0xc/0x20
[ 7171.670836] kernel: ? lockdep_init_map_type+0x51/0x210
[ 7171.670836] kernel: ? __raw_spin_lock_init+0x36/0x60
[ 7171.670836] kernel: devm_snd_soc_register_card+0x3e/0x80 [snd_soc_core]
[ 7171.670836] kernel: platform_probe+0x4f/0xb0
[ 7171.670836] kernel: really_probe+0x1a6/0x3a0
[ 7171.670836] kernel: __driver_probe_device+0xf9/0x170
[ 7171.670836] kernel: driver_probe_device+0x19/0x90
[ 7171.670836] kernel: __driver_attach+0x99/0x170
[ 7171.670836] kernel: ? __device_attach_driver+0xd0/0xd0
[ 7171.670836] kernel: ? __device_attach_driver+0xd0/0xd0
[ 7171.670836] kernel: bus_for_each_dev+0x73/0xc0
[ 7171.670836] kernel: bus_add_driver+0x14b/0x1f0
[ 7171.670836] kernel: driver_register+0x67/0xb0
[ 7171.670836] kernel: ? 0xffffffffc02e6000
[ 7171.670836] kernel: do_one_initcall+0x64/0x220
[ 7171.670836] kernel: ? __cond_resched+0x10/0x20
[ 7171.670836] kernel: ? kmem_cache_alloc_trace+0x4a/0x1f0
[ 7171.670836] kernel: do_init_module+0x56/0x200
[ 7171.670836] kernel: load_module+0x258e/0x28f0
[ 7171.670836] kernel: ? __do_sys_finit_module+0xae/0x110
[ 7171.670836] kernel: __do_sys_finit_module+0xae/0x110
[ 7171.670836] kernel: do_syscall_64+0x38/0x90
[ 7171.670836] kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 7171.670836] kernel: RIP: 0033:0x7f28a31b789d
@XiaoyunWu6666 Do you have the method to duplicate this issue again? I submitted PR #3130 that tries to fix this issue. Could you test with this PR?
There is no magic test command to reproduce this for 100%. We can try with high iteration but even test with 100 loop may not reproduce the problem.
TPLG=sof-cml-rt1011-rt5682.tplg ~/sof-test/test-case/check-kmod-load-unload.sh -l 100
@XiaoyunWu6666 @fredoh9 I updated PR #3130 which passed the CI test with the CML_HEL_RT5682 machine. https://sof-ci.01.org/linuxpr/PR3130/build6398/devicetest/
Could you try to test with this PR?
@XiaoyunWu6666 @fredoh9 I updated PR #3130 which passed the CI test with the CML_HEL_RT5682 machine. https://sof-ci.01.org/linuxpr/PR3130/build6398/devicetest/
Could you try to test with this PR?
Ran it in inner result 6926 , with kmod-load-unload for 50 rounds on multiple CML platforms. The test showed all PASS ... but not sure whether PR 3130 does fix it or not , because this is an issue with rather low reproduction rate
@keqiaozhang can you check if we've seen those issues on Helios Chromebook recently? it's almost been a year.
@plbossart I don't think this is still reproducible, I don't even see this issue during the year of 2022 till now. I am OK to close it.
I haven't gotten any bugs my way recently on this either
let's close. There's still an issue with DA7219 but separate problem