[BUG] kernel call trace error when testing kmod reload after playback on GLK chromebook
Describe the bug This is the first time we have observed this issue. no not sure about the preproduction rate. no etrace available. only kernel log is saved. CI daily report ID 1547.
To Reproduce sof-test/test-case/check-kmod-load-unload-after-playback.sh -l 15
Reproduction Rate Only reproduce one in CI daily test. will trigger the test manually next to figure it out.
Platform GLK Chromebook
Kernel log
kernel: [10102.726814] Hardware name: Google Bobba/Bobba, BIOS Google_Bobba.11825.0.2019_03_06_2015 03/06/2019
kernel: [10102.726817] RIP: 0010:dapm_find_widget+0x71/0xd0 [snd_soc_core]
kernel: [10102.726848] Code: 00 48 89 e7 e8 40 73 83 f6 49 89 e4 49 8b 6d 18 45 31 f6 48 8b 85 00 04 00 00 48 81 c5 00 04 00 00 48 39 c5 48 8d 58 e8 74 26 <48> 8b 7b 08 4c 89 e6 e8 d3 21 83 f6 85 c0 75 09 4c 39 6b 28 74 1c
kernel: [10102.726852] RSP: 0018:ffffb09300467d80 EFLAGS: 00010202
kernel: [10102.726856] RAX: dead000000000100 RBX: dead0000000000e8 RCX: 0000000000000000
kernel: [10102.726859] RDX: 0000000000000001 RSI: ffffffffc05f3a5f RDI: ffffa04ad0a14141
kernel: [10102.726862] RBP: ffffffffc0cb64e0 R08: 0000000000000001 R09: 0000000000000001
kernel: [10102.726864] R10: ffffb09300467e38 R11: 0000000000000000 R12: ffffffffc05f3a5e
kernel: [10102.726867] R13: ffffa04ab4d9ad68 R14: 0000000000000000 R15: 0000000000000001
kernel: [10102.726870] FS: 0000000000000000(0000) GS:ffffa04afbc00000(0000) knlGS:0000000000000000
kernel: [10102.726873] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [10102.726876] CR2: 00005587fa513e70 CR3: 000000011b6fe000 CR4: 0000000000350ee0
kernel: [10102.726878] Call Trace:
kernel: [10102.726885] ? __mutex_unlock_slowpath+0x3e/0x2a0
kernel: [10102.726894] snd_soc_dapm_set_pin+0x18/0xa0 [snd_soc_core]
kernel: [10102.726916] snd_soc_dapm_disable_pin+0x2a/0x50 [snd_soc_core]
kernel: [10102.726938] da7219_aad_irq_thread+0x22f/0x270 [snd_soc_da7219]
kernel: [10102.726946] ? irq_forced_thread_fn+0x70/0x70
kernel: [10102.726951] irq_thread_fn+0x1c/0x60
kernel: [10102.726955] ? irq_thread+0x9c/0x1a0
kernel: [10102.726958] irq_thread+0x102/0x1a0
kernel: [10102.726961] ? wake_threads_waitq+0x30/0x30
kernel: [10102.726965] ? irq_thread_dtor+0x90/0x90
kernel: [10102.726968] kthread+0x134/0x150
kernel: [10102.726973] ? kthread_park+0x80/0x80
kernel: [10102.726976] ret_from_fork+0x1f/0x30
kernel: [10102.726983] Modules linked in: snd_soc_sst_bxt_da7219_max98357a snd_soc_hdac_hdmi snd_hda_codec_hdmi snd_soc_dmic snd_sof_pci(-) snd_sof_intel_hda_common snd_sof_intel_hda snd_soc_hdac_hda snd_hda_ext_core snd_hda_codec snd_hwdep snd_hda_core snd_sof_acpi snd_sof_intel_byt snd_sof_intel_bdw snd_sof_intel_ipc snd_sof snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_soc_rt1011 snd_soc_rt715 snd_soc_rt1308_sdw snd_soc_rt1308 snd_soc_rt711 snd_soc_rt700 snd_soc_max98373_sdw snd_soc_max98373_i2c snd_soc_max98373 snd_soc_max98090 snd_soc_max98357a snd_soc_wm8804_i2c snd_soc_wm8804 snd_soc_pcm512x_i2c snd_soc_pcm512x snd_soc_rt5682_sdw regmap_sdw soundwire_bus snd_soc_rt5682_i2c snd_soc_rt5682 snd_soc_rt5677 snd_soc_rt5677_spi snd_soc_rt5670 snd_soc_rt5660 snd_soc_rt5651 snd_soc_rt5645 snd_soc_rt5640 snd_soc_rl6231 snd_soc_rt298 snd_soc_rt286 snd_soc_rl6347a snd_soc_da7219 snd_soc_da7213 snd_soc_core
kernel: [10102.727054] snd_compress snd_pcm asix usbnet x86_pkg_temp_thermal intel_powerclamp ledtrig_audio snd_soc_sst_ipc snd_soc_sst_dsp i915 i2c_algo_bit drm_kms_helper syscopyarea processor_thermal_device sysfillrect sysimgblt processor_thermal_rfim fb_sys_fops processor_thermal_mbox intel_soc_dts_iosf regmap_i2c snd_usbmidi_lib elan_i2c int3403_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore drm efivarfs mmc_block spi_pxa2xx_platform xhci_pci sdhci_pci cqhci intel_lpss_pci sdhci xhci_hcd intel_lpss mfd_core i2c_hid [last unloaded: snd_usb_audio]
kernel: [10102.727140] ---[ end trace 13419a076e17e424 ]---
kernel: [10102.727143] RIP: 0010:dapm_find_widget+0x71/0xd0 [snd_soc_core]
kernel: [10102.727165] Code: 00 48 89 e7 e8 40 73 83 f6 49 89 e4 49 8b 6d 18 45 31 f6 48 8b 85 00 04 00 00 48 81 c5 00 04 00 00 48 39 c5 48 8d 58 e8 74 26 <48> 8b 7b 08 4c 89 e6 e8 d3 21 83 f6 85 c0 75 09 4c 39 6b 28 74 1c
kernel: [10102.727169] RSP: 0018:ffffb09300467d80 EFLAGS: 00010202
kernel: [10102.727172] RAX: dead000000000100 RBX: dead0000000000e8 RCX: 0000000000000000
kernel: [10102.727174] RDX: 0000000000000001 RSI: ffffffffc05f3a5f RDI: ffffa04ad0a14141
kernel: [10102.727177] RBP: ffffffffc0cb64e0 R08: 0000000000000001 R09: 0000000000000001
kernel: [10102.727180] R10: ffffb09300467e38 R11: 0000000000000000 R12: ffffffffc05f3a5e
kernel: [10102.727182] R13: ffffa04ab4d9ad68 R14: 0000000000000000 R15: 0000000000000001
kernel: [10102.727185] FS: 0000000000000000(0000) GS:ffffa04afbc00000(0000) knlGS:0000000000000000
kernel: [10102.727188] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [10102.727190] CR2: 00005587fa513e70 CR3: 000000011b6fe000 CR4: 0000000000350ee0
kernel: [10102.727205] BUG: kernel NULL pointer dereference, address: 0000000000000000
kernel: [10102.727208] #PF: supervisor instruction fetch in kernel mode
kernel: [10102.727211] #PF: error_code(0x0010) - not-present page
kernel: [10102.727214] PGD 0 P4D 0
kernel: [10102.727218] Oops: 0010 [#2] SMP PTI
kernel: [10102.727221] CPU: 1 PID: 156379 Comm: irq/96-da7219-a Tainted: G D 5.11.0-rc2-daily-default-20210111 #3
kernel: [10102.727225] Hardware name: Google Bobba/Bobba, BIOS Google_Bobba.11825.0.2019_03_06_2015 03/06/2019
kernel: [10102.727227] RIP: 0010:0x0
kernel: [10102.727232] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
kernel: [10102.727234] RSP: 0018:ffffb09300467ea8 EFLAGS: 00010286
kernel: [10102.727237] RAX: 0000000000000000 RBX: ffffb09300467ec8 RCX: 0000000000000000
kernel: [10102.727240] RDX: ffffffffb6a8dcd7 RSI: ffffffffb6a8dcdf RDI: ffffb09300467ec8
kernel: [10102.727243] RBP: ffffa04aa5a95040 R08: 0000000000000000 R09: 0000000000000001
kernel: [10102.727245] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa04aa5a95040
kernel: [10102.727247] R13: ffffffffb7d24238 R14: ffffa04aa5a95890 R15: ffffffffb8a05bb0
kernel: [10102.727250] FS: 0000000000000000(0000) GS:ffffa04afbc00000(0000) knlGS:0000000000000000
kernel: [10102.727253] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [10102.727255] CR2: ffffffffffffffd6 CR3: 000000011b6fe000 CR4: 0000000000350ee0
kernel: [10102.727258] Call Trace:
kernel: [10102.727260] task_work_run+0x6e/0xc0
kernel: [10102.727265] do_exit+0x3f3/0xc80
kernel: [10102.727271] ? irq_thread_dtor+0x90/0x90
kernel: [10102.727274] ? kthread+0x134/0x150
kernel: [10102.727278] rewind_stack_do_exit+0x17/0x20
kernel: [10102.727281] RIP: 0000:0x0
kernel: [10102.727284] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
kernel: [10102.727287] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
kernel: [10102.727290] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: [10102.727292] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
kernel: [10102.727295] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
kernel: [10102.727297] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
kernel: [10102.727299] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
kernel: [10102.727303] Modules linked in: snd_soc_sst_bxt_da7219_max98357a snd_soc_hdac_hdmi snd_hda_codec_hdmi snd_soc_dmic snd_sof_pci(-) snd_sof_intel_hda_common snd_sof_intel_hda snd_soc_hdac_hda snd_hda_ext_core snd_hda_codec snd_hwdep snd_hda_core snd_sof_acpi snd_sof_intel_byt snd_sof_intel_bdw snd_sof_intel_ipc snd_sof snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_soc_rt1011 snd_soc_rt715 snd_soc_rt1308_sdw snd_soc_rt1308 snd_soc_rt711 snd_soc_rt700 snd_soc_max98373_sdw snd_soc_max98373_i2c snd_soc_max98373 snd_soc_max98090 snd_soc_max98357a snd_soc_wm8804_i2c snd_soc_wm8804 snd_soc_pcm512x_i2c snd_soc_pcm512x snd_soc_rt5682_sdw regmap_sdw soundwire_bus snd_soc_rt5682_i2c snd_soc_rt5682 snd_soc_rt5677 snd_soc_rt5677_spi snd_soc_rt5670 snd_soc_rt5660 snd_soc_rt5651 snd_soc_rt5645 snd_soc_rt5640 snd_soc_rl6231 snd_soc_rt298 snd_soc_rt286 snd_soc_rl6347a snd_soc_da7219 snd_soc_da7213 snd_soc_core
kernel: [10102.727357] snd_compress snd_pcm asix usbnet x86_pkg_temp_thermal intel_powerclamp ledtrig_audio snd_soc_sst_ipc snd_soc_sst_dsp i915 i2c_algo_bit drm_kms_helper syscopyarea processor_thermal_device sysfillrect sysimgblt processor_thermal_rfim fb_sys_fops processor_thermal_mbox intel_soc_dts_iosf regmap_i2c snd_usbmidi_lib elan_i2c int3403_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore drm efivarfs mmc_block spi_pxa2xx_platform xhci_pci sdhci_pci cqhci intel_lpss_pci sdhci xhci_hcd intel_lpss mfd_core i2c_hid [last unloaded: snd_usb_audio]
kernel: [10102.727407] CR2: 0000000000000000
kernel: [10102.727409] ---[ end trace 13419a076e17e425 ]---
kernel: [10102.727411] RIP: 0010:dapm_find_widget+0x71/0xd0 [snd_soc_core]
kernel: [10102.727432] Code: 00 48 89 e7 e8 40 73 83 f6 49 89 e4 49 8b 6d 18 45 31 f6 48 8b 85 00 04 00 00 48 81 c5 00 04 00 00 48 39 c5 48 8d 58 e8 74 26 <48> 8b 7b 08 4c 89 e6 e8 d3 21 83 f6 85 c0 75 09 4c 39 6b 28 74 1c
kernel: [10102.727436] RSP: 0018:ffffb09300467d80 EFLAGS: 00010202
kernel: [10102.727439] RAX: dead000000000100 RBX: dead0000000000e8 RCX: 0000000000000000
kernel: [10102.727442] RDX: 0000000000000001 RSI: ffffffffc05f3a5f RDI: ffffa04ad0a14141
kernel: [10102.727444] RBP: ffffffffc0cb64e0 R08: 0000000000000001 R09: 0000000000000001
kernel: [10102.727446] R10: ffffb09300467e38 R11: 0000000000000000 R12: ffffffffc05f3a5e
kernel: [10102.727449] R13: ffffa04ab4d9ad68 R14: 0000000000000000 R15: 0000000000000001
kernel: [10102.727451] FS: 0000000000000000(0000) GS:ffffa04afbc00000(0000) knlGS:0000000000000000
kernel: [10102.727454] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [10102.727456] CR2: ffffffffffffffd6 CR3: 000000011b6fe000 CR4: 0000000000350ee0
kernel: [10102.727459] Fixing recursive fault but reboot is needed!
Set P3 and only reproduced once.
Caught in CI today: https://sof-ci.01.org/linuxpr/PR2747/build5278/devicetest/?model=GLK_BOB_DA7219&testcase=check-kmod-load-unload-after-playback Bumping to P2.
This looks like a codec issue @kv2019i by looking at the trace?
adding dmesg log here: dmesg.txt
possible fix at https://github.com/thesofproject/linux/pull/2749
Observed again today: https://sof-ci.01.org/softestpr/PR628/build606/devicetest/?model=GLK_BOB_DA7219&testcase=check-kmod-load-unload-after-playback
From test database, it is happening occasionally. I'm trying with my GLK Bobba, but interestingly I couldn't reproduce this issue.
- check-kmod-load-unload-with-playback.sh -l 50
- check-kmod-load-unload.sh -l 50
- check-kmod-load-unload-with-playback.sh -l 100
I'm using today's latest tip.
"kernelBranch" : "topic/sof-dev",
"kernelCommit" : "82c1e1f8",
"sofBranch" : "master",
"sofCommit" : "fa35ee00f29c"
It's indeed very very hard to reproduce, I ran 1000+ iterations of that test on a Bobba device and nothing bad happened.
looks like DA codec snd_soc_da7219 issue? mark it as codec and change it to P3.
Internal daily test 3669 found this NULL pointer dereference again. Kernel Commit: da1f1790
@fredoh9 is this test 3669 with KASAN enabled? I see an address "dead000000000100", that's weird otherwise...
@plbossart No this is default build. I'm running with KASAN enabled kernel build also, but so far working fine.
Error seen again in https://sof-ci.01.org/linuxpr/PR2978/build5863/devicetest/?model=GLK_BOB_DA7219&testcase=check-kmod-load-unload-after-playback
This means PR #2749 does not fully solve the problem. Gah, it seems we see this error once per month.
Same error is found in weekly stress test. Test result 5068. 5068?model=GLK_BOB_DA7219&testcase=check-kmod-load-unload-after-playback-25
- Kernel Branch: topic/sof-dev
- Kernel Commit: 6bd3089a
- SOF Branch: main
- SOF Commit: a439ea93f64f
Error seen again:https://sof-ci.sh.intel.com/#/result/planresultdetail/10920?model=GLK_BOB_DA7219&testcase=check-kmod-load-unload-25
This issue still occurs sporadically in CI, so I suggest to keep this issue open but with low priority.
The last time I saw this issue in CI was in December of 2022. Closing this bug.