linux
linux copied to clipboard
[BUG][APL] Kernel RIP dev_watchdog NETDEV WATCHDOG
APL_UP2_PCM512X has kernel RIP while testing check-playback-100sec. Internal test: 16114?model=APL_UP2_PCM512X&testcase=check-playback-100sec
- Kernel Branch: topic/sof-dev
- Kernel Commit: f048bf5646cc
- SOF Branch: stable-v2.2
- SOF Commit: afe466da0238
Note, this device has two internal ethernet ports, one of them is being used.
- enp2s0 : this is being used
- enp3s0 : not being used
[ 705.134218] kernel: ------------[ cut here ]------------
[ 705.134282] kernel: NETDEV WATCHDOG: enp2s0 (r8169): transmit queue 0 timed out
[ 705.134354] kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x1f5/0x210
[ 705.134385] kernel: Modules linked in: snd_usb_audio snd_sof_pci_intel_mtl snd_sof_pci_intel_tgl snd_sof_pci_intel_icl snd_sof_pci_intel_cnl snd_soc_sst_sof_pcm512x snd_sof_ipc_msg_injector snd_sof_ipc_flood_test snd_sof_probes snd_soc_intel_hda_dsp_common snd_hda_codec_hdmi snd_soc_dmic snd_sof_pci_intel_apl snd_sof_pci_intel_skl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_soc_hdac_hda snd_hda_ext_core snd_hda_codec snd_hwdep snd_hda_core snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_bdw snd_sof_acpi_intel_byt snd_sof_intel_atom snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_sof_acpi snd_sof snd_sof_utils snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_es8326 snd_soc_es8316 snd_soc_max98390 snd_soc_max98373_i2c snd_soc_max98373_sdw snd_soc_max98373 snd_soc_max98357a snd_soc_ts3a227e snd_soc_max98090 snd_soc_rt5682_sdw snd_soc_rt5682_i2c snd_soc_rt5682 snd_soc_rt5677 snd_soc_rt5677_spi snd_soc_rt5670
[ 705.134841] kernel: snd_soc_rt5660 snd_soc_rt5651 snd_soc_rt5645 snd_soc_rt5640 snd_soc_rt1011 snd_soc_sdw_mockup snd_soc_rt715_sdca snd_soc_rt1316_sdw snd_soc_rt711_sdca regmap_sdw_mbq snd_soc_rt715 snd_soc_rt1308_sdw snd_soc_rt1308 snd_soc_rl6231 snd_soc_rt711 snd_soc_rt700 regmap_sdw soundwire_bus snd_soc_rt298 snd_soc_rt286 snd_soc_rt274 snd_soc_rl6347a snd_soc_wm8804_i2c snd_soc_wm8804 snd_soc_pcm512x_i2c snd_soc_pcm512x snd_soc_da7219 snd_soc_da7213 snd_soc_core snd_compress snd_pcm snd_ctl_led ledtrig_audio snd_soc_sst_ipc x86_pkg_temp_thermal snd_soc_sst_dsp snd_usbmidi_lib i915 regmap_i2c snd_seq_midi i2c_algo_bit drm_buddy snd_seq_midi_event drm_display_helper snd_rawmidi drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops mei_me cfbcopyarea snd_seq ttm mei snd_seq_device snd_timer snd soundcore intel_pmc_core squashfs fuse drm drm_panel_orientation_quirks efivarfs spi_pxa2xx_platform mmc_block sdhci_pci cqhci sdhci xhci_pci intel_lpss_pci intel_lpss
[ 705.135483] kernel: idma64 mfd_core xhci_hcd [last unloaded: snd_pcm]
[ 705.135529] kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.0-rc5-daily-default-20221011-0-gf048bf5646cc #dev
[ 705.135546] kernel: Hardware name: AAEON UP-APL01/UP-APL01, BIOS UPA1AM40 08/06/2018
[ 705.135558] kernel: RIP: 0010:dev_watchdog+0x1f5/0x210
[ 705.135578] kernel: Code: ff e9 4b ff ff ff 48 89 ef c6 05 18 8a 20 01 01 e8 60 2d fb ff 44 89 e9 48 89 ee 48 c7 c7 28 bb ae a7 48 89 c2 e8 d4 29 2c 00 <0f> 0b e9 57 ff ff ff e8 df 23 4e ff e9 44 ff ff ff 66 2e 0f 1f 84
[ 705.135594] kernel: RSP: 0018:ffff9ee740003e80 EFLAGS: 00010282
[ 705.135615] kernel: RAX: 0000000000000000 RBX: ffff919984f0c4d0 RCX: 0000000000000000
[ 705.135627] kernel: RDX: 0000000000000103 RSI: ffffffffa7a16269 RDI: 00000000ffffffff
[ 705.135639] kernel: RBP: ffff919984f0c000 R08: 0000000000000000 R09: ffff9ee740003d20
[ 705.135650] kernel: R10: 000000000008f0f0 R11: 000000000008f168 R12: ffff919984f0c3e0
[ 705.135662] kernel: R13: 0000000000000000 R14: ffff919984f0c4d0 R15: ffffffffa6d20590
[ 705.135674] kernel: FS: 0000000000000000(0000) GS:ffff9199f0800000(0000) knlGS:0000000000000000
[ 705.135687] kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 705.135699] kernel: CR2: 0000000001953060 CR3: 0000000025c24000 CR4: 00000000003506f0
[ 705.135711] kernel: Call Trace:
[ 705.135724] kernel:
[ 705.135744] kernel: ? mq_change_real_num_tx+0xd0/0xd0
[ 705.135766] kernel: call_timer_fn+0xad/0x250
[ 705.135806] kernel: run_timer_softirq+0x4a3/0x550
[ 705.135833] kernel: ? recalibrate_cpu_khz+0x10/0x10
[ 705.135850] kernel: ? ktime_get+0x77/0xe0
[ 705.135869] kernel: ? setup_local_APIC+0x340/0x340
[ 705.135900] kernel: __do_softirq+0x11e/0x33c
[ 705.135936] kernel: irq_exit_rcu+0x91/0xc0
[ 705.135955] kernel: sysvec_apic_timer_interrupt+0x8e/0xc0
[ 705.135975] kernel:
[ 705.135985] kernel:
[ 705.135998] kernel: asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 705.136015] kernel: RIP: 0010:cpuidle_enter_state+0x117/0x4b0
[ 705.136032] kernel: Code: 3c 59 0f 84 4c 03 00 00 31 ff e8 54 24 67 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 7f 03 00 00 31 ff e8 0d b7 6d ff fb 45 85 f6 <0f> 88 d4 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49
[ 705.136046] kernel: RSP: 0018:ffffffffa7c03e30 EFLAGS: 00000206
[ 705.136066] kernel: RAX: ffff9199f0800000 RBX: 0000000000000003 RCX: 000000000000001f
[ 705.136077] kernel: RDX: 0000000000000000 RSI: ffffffffa7a16269 RDI: ffffffffa7a2298c
[ 705.136088] kernel: RBP: ffff919981e2c800 R08: 000000a42d453c2b R09: 0000000000000002
[ 705.136099] kernel: R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffa7e60ee0
[ 705.136110] kernel: R13: 000000a42d453c2b R14: 0000000000000003 R15: 0000000000000000
[ 705.136172] kernel: cpuidle_enter+0x29/0x40
[ 705.136193] kernel: do_idle+0x1e9/0x280
[ 705.136228] kernel: cpu_startup_entry+0x19/0x20
[ 705.136246] kernel: rest_init+0x128/0x130
[ 705.136265] kernel: arch_call_rest_init+0xa/0x14
[ 705.136289] kernel: start_kernel+0x683/0x6ac
[ 705.136321] kernel: secondary_startup_64_no_verify+0xe0/0xeb
[ 705.136376] kernel:
[ 705.136387] kernel: ---[ end trace 0000000000000000 ]---
To reproduce TPLG=/lib/firmware/intel/sof-tplg/sof-apl-pcm512x.tplg MODEL=APL_UP2_PCM512X ~/sof-test/test-case/check-playback.sh -d 100 -l 1 -r 1
Reproduction Rate 1/50, rare case
2022-10-12 00:44:56 UTC [ERROR] Caught kernel log error
===========================>>
[ 705.152491] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[ 705.162860] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[ 716.927143] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[ 716.940532] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[ 733.312271] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[ 733.326199] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[ 744.062380] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[ 744.073789] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[ 760.961879] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[ 760.974064] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[ 777.347083] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[ 777.359609] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[ 793.219988] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[ 793.232147] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[ 804.996403] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[ 805.008387] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[ 827.013746] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[ 827.026510] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[ 843.910273] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[ 843.921952] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[ 854.150517] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[ 854.160797] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[ 865.928269] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[ 865.940803] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
<<===========================
Attachments: dmesg-apl.txt etrace-apl.txt slogger-apl.txt
After scanning journalctl, the same watchdog RIP happened twice in a week.
...
-- Boot 1905054c837e4d4e998bb2879d9d6471 --
Oct 08 01:22:55 jf-apl-up2-pcm512x-1 kernel: NETDEV WATCHDOG: enp2s0 (r8169): transmit queue 0 timed out
-- Boot c2f9f3f244f74d779e7b2a908ee2e0f9 --
-- Boot a5c1df39b5184abdb263050946bd1e40 --
-- Boot cc16a276fdff43298d2bd6557941a72e --
-- Boot 492d759c683f43d4ac05f919a804928f --
-- Boot 25a4b38d70f44a56bb6076e3f37276d8 --
-- Boot e36bd3e2f8d946b7a6e55e95a9e69b1d --
-- Boot ab6435dcd30b4a53a70c0d86d9a89996 --
-- Boot 656e21b8310a46a1b9bfe22ce44cadd8 --
-- Boot bd4e457800cf444abd873204c6529284 --
-- Boot 82aff84c372747c3b048234204795c4a --
-- Boot 24e26d4984ac4778b6e0525c76991045 --
-- Boot ff2100291c504e1c978e17829cbaed60 --
-- Boot 0cfb312d63394ffb961d402ab9f8dc41 --
-- Boot 62d41f39bc6d4eb7bad32f70f7f8c111 --
Oct 12 00:42:15 jf-apl-up2-pcm512x-1 kernel: NETDEV WATCHDOG: enp2s0 (r8169): transmit queue 0 timed out
-- Boot f3a02322360b4996b0326b13ad7d08b0 --
no one has bandwidth to look into this, closing as "won't fix"