linux icon indicating copy to clipboard operation
linux copied to clipboard

[BUG][APL] Kernel RIP dev_watchdog NETDEV WATCHDOG

Open fredoh9 opened this issue 3 years ago • 1 comments

APL_UP2_PCM512X has kernel RIP while testing check-playback-100sec. Internal test: 16114?model=APL_UP2_PCM512X&testcase=check-playback-100sec

  • Kernel Branch: topic/sof-dev
  • Kernel Commit: f048bf5646cc
  • SOF Branch: stable-v2.2
  • SOF Commit: afe466da0238

Note, this device has two internal ethernet ports, one of them is being used.

  • enp2s0 : this is being used
  • enp3s0 : not being used
[  705.134218] kernel: ------------[ cut here ]------------
[  705.134282] kernel: NETDEV WATCHDOG: enp2s0 (r8169): transmit queue 0 timed out
[  705.134354] kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x1f5/0x210
[  705.134385] kernel: Modules linked in: snd_usb_audio snd_sof_pci_intel_mtl snd_sof_pci_intel_tgl snd_sof_pci_intel_icl snd_sof_pci_intel_cnl snd_soc_sst_sof_pcm512x snd_sof_ipc_msg_injector snd_sof_ipc_flood_test snd_sof_probes snd_soc_intel_hda_dsp_common snd_hda_codec_hdmi snd_soc_dmic snd_sof_pci_intel_apl snd_sof_pci_intel_skl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_soc_hdac_hda snd_hda_ext_core snd_hda_codec snd_hwdep snd_hda_core snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_bdw snd_sof_acpi_intel_byt snd_sof_intel_atom snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_sof_acpi snd_sof snd_sof_utils snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_es8326 snd_soc_es8316 snd_soc_max98390 snd_soc_max98373_i2c snd_soc_max98373_sdw snd_soc_max98373 snd_soc_max98357a snd_soc_ts3a227e snd_soc_max98090 snd_soc_rt5682_sdw snd_soc_rt5682_i2c snd_soc_rt5682 snd_soc_rt5677 snd_soc_rt5677_spi snd_soc_rt5670
[  705.134841] kernel:  snd_soc_rt5660 snd_soc_rt5651 snd_soc_rt5645 snd_soc_rt5640 snd_soc_rt1011 snd_soc_sdw_mockup snd_soc_rt715_sdca snd_soc_rt1316_sdw snd_soc_rt711_sdca regmap_sdw_mbq snd_soc_rt715 snd_soc_rt1308_sdw snd_soc_rt1308 snd_soc_rl6231 snd_soc_rt711 snd_soc_rt700 regmap_sdw soundwire_bus snd_soc_rt298 snd_soc_rt286 snd_soc_rt274 snd_soc_rl6347a snd_soc_wm8804_i2c snd_soc_wm8804 snd_soc_pcm512x_i2c snd_soc_pcm512x snd_soc_da7219 snd_soc_da7213 snd_soc_core snd_compress snd_pcm snd_ctl_led ledtrig_audio snd_soc_sst_ipc x86_pkg_temp_thermal snd_soc_sst_dsp snd_usbmidi_lib i915 regmap_i2c snd_seq_midi i2c_algo_bit drm_buddy snd_seq_midi_event drm_display_helper snd_rawmidi drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops mei_me cfbcopyarea snd_seq ttm mei snd_seq_device snd_timer snd soundcore intel_pmc_core squashfs fuse drm drm_panel_orientation_quirks efivarfs spi_pxa2xx_platform mmc_block sdhci_pci cqhci sdhci xhci_pci intel_lpss_pci intel_lpss
[  705.135483] kernel:  idma64 mfd_core xhci_hcd [last unloaded: snd_pcm]
[  705.135529] kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.0-rc5-daily-default-20221011-0-gf048bf5646cc #dev
[  705.135546] kernel: Hardware name: AAEON UP-APL01/UP-APL01, BIOS UPA1AM40 08/06/2018
[  705.135558] kernel: RIP: 0010:dev_watchdog+0x1f5/0x210
[  705.135578] kernel: Code: ff e9 4b ff ff ff 48 89 ef c6 05 18 8a 20 01 01 e8 60 2d fb ff 44 89 e9 48 89 ee 48 c7 c7 28 bb ae a7 48 89 c2 e8 d4 29 2c 00 <0f> 0b e9 57 ff ff ff e8 df 23 4e ff e9 44 ff ff ff 66 2e 0f 1f 84
[  705.135594] kernel: RSP: 0018:ffff9ee740003e80 EFLAGS: 00010282
[  705.135615] kernel: RAX: 0000000000000000 RBX: ffff919984f0c4d0 RCX: 0000000000000000
[  705.135627] kernel: RDX: 0000000000000103 RSI: ffffffffa7a16269 RDI: 00000000ffffffff
[  705.135639] kernel: RBP: ffff919984f0c000 R08: 0000000000000000 R09: ffff9ee740003d20
[  705.135650] kernel: R10: 000000000008f0f0 R11: 000000000008f168 R12: ffff919984f0c3e0
[  705.135662] kernel: R13: 0000000000000000 R14: ffff919984f0c4d0 R15: ffffffffa6d20590
[  705.135674] kernel: FS:  0000000000000000(0000) GS:ffff9199f0800000(0000) knlGS:0000000000000000
[  705.135687] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  705.135699] kernel: CR2: 0000000001953060 CR3: 0000000025c24000 CR4: 00000000003506f0
[  705.135711] kernel: Call Trace:
[  705.135724] kernel:  
[  705.135744] kernel:  ? mq_change_real_num_tx+0xd0/0xd0
[  705.135766] kernel:  call_timer_fn+0xad/0x250
[  705.135806] kernel:  run_timer_softirq+0x4a3/0x550
[  705.135833] kernel:  ? recalibrate_cpu_khz+0x10/0x10
[  705.135850] kernel:  ? ktime_get+0x77/0xe0
[  705.135869] kernel:  ? setup_local_APIC+0x340/0x340
[  705.135900] kernel:  __do_softirq+0x11e/0x33c
[  705.135936] kernel:  irq_exit_rcu+0x91/0xc0
[  705.135955] kernel:  sysvec_apic_timer_interrupt+0x8e/0xc0
[  705.135975] kernel:  
[  705.135985] kernel:  
[  705.135998] kernel:  asm_sysvec_apic_timer_interrupt+0x16/0x20
[  705.136015] kernel: RIP: 0010:cpuidle_enter_state+0x117/0x4b0
[  705.136032] kernel: Code: 3c 59 0f 84 4c 03 00 00 31 ff e8 54 24 67 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 7f 03 00 00 31 ff e8 0d b7 6d ff fb 45 85 f6 <0f> 88 d4 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49
[  705.136046] kernel: RSP: 0018:ffffffffa7c03e30 EFLAGS: 00000206
[  705.136066] kernel: RAX: ffff9199f0800000 RBX: 0000000000000003 RCX: 000000000000001f
[  705.136077] kernel: RDX: 0000000000000000 RSI: ffffffffa7a16269 RDI: ffffffffa7a2298c
[  705.136088] kernel: RBP: ffff919981e2c800 R08: 000000a42d453c2b R09: 0000000000000002
[  705.136099] kernel: R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffa7e60ee0
[  705.136110] kernel: R13: 000000a42d453c2b R14: 0000000000000003 R15: 0000000000000000
[  705.136172] kernel:  cpuidle_enter+0x29/0x40
[  705.136193] kernel:  do_idle+0x1e9/0x280
[  705.136228] kernel:  cpu_startup_entry+0x19/0x20
[  705.136246] kernel:  rest_init+0x128/0x130
[  705.136265] kernel:  arch_call_rest_init+0xa/0x14
[  705.136289] kernel:  start_kernel+0x683/0x6ac
[  705.136321] kernel:  secondary_startup_64_no_verify+0xe0/0xeb
[  705.136376] kernel:  
[  705.136387] kernel: ---[ end trace 0000000000000000 ]---

To reproduce TPLG=/lib/firmware/intel/sof-tplg/sof-apl-pcm512x.tplg MODEL=APL_UP2_PCM512X ~/sof-test/test-case/check-playback.sh -d 100 -l 1 -r 1

Reproduction Rate 1/50, rare case

2022-10-12 00:44:56 UTC [ERROR] Caught kernel log error
===========================>>
[  705.152491] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[  705.162860] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[  716.927143] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[  716.940532] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[  733.312271] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[  733.326199] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[  744.062380] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[  744.073789] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[  760.961879] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[  760.974064] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[  777.347083] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[  777.359609] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[  793.219988] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[  793.232147] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[  804.996403] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[  805.008387] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[  827.013746] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[  827.026510] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[  843.910273] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[  843.921952] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[  854.150517] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[  854.160797] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[  865.928269] kernel: r8169 0000:02:00.0 enp2s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
[  865.940803] kernel: r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
<<===========================

Attachments: dmesg-apl.txt etrace-apl.txt slogger-apl.txt

fredoh9 avatar Oct 12 '22 18:10 fredoh9

After scanning journalctl, the same watchdog RIP happened twice in a week.

...
-- Boot 1905054c837e4d4e998bb2879d9d6471 --
Oct 08 01:22:55 jf-apl-up2-pcm512x-1 kernel: NETDEV WATCHDOG: enp2s0 (r8169): transmit queue 0 timed out
-- Boot c2f9f3f244f74d779e7b2a908ee2e0f9 --
-- Boot a5c1df39b5184abdb263050946bd1e40 --
-- Boot cc16a276fdff43298d2bd6557941a72e --
-- Boot 492d759c683f43d4ac05f919a804928f --
-- Boot 25a4b38d70f44a56bb6076e3f37276d8 --
-- Boot e36bd3e2f8d946b7a6e55e95a9e69b1d --
-- Boot ab6435dcd30b4a53a70c0d86d9a89996 --
-- Boot 656e21b8310a46a1b9bfe22ce44cadd8 --
-- Boot bd4e457800cf444abd873204c6529284 --
-- Boot 82aff84c372747c3b048234204795c4a --
-- Boot 24e26d4984ac4778b6e0525c76991045 --
-- Boot ff2100291c504e1c978e17829cbaed60 --
-- Boot 0cfb312d63394ffb961d402ab9f8dc41 --
-- Boot 62d41f39bc6d4eb7bad32f70f7f8c111 --
Oct 12 00:42:15 jf-apl-up2-pcm512x-1 kernel: NETDEV WATCHDOG: enp2s0 (r8169): transmit queue 0 timed out
-- Boot f3a02322360b4996b0326b13ad7d08b0 --

fredoh9 avatar Oct 12 '22 18:10 fredoh9

no one has bandwidth to look into this, closing as "won't fix"

plbossart avatar Dec 13 '22 00:12 plbossart