sof icon indicating copy to clipboard operation
sof copied to clipboard

[BUG] Zephyr "mtrace" logs empty until the next reboot

Open marc-hb opened this issue 1 year ago • 6 comments

Describe the bug

Zephyr "mtrace" logs have become suddenly and silently empty until the next reboot.

This happened in daily test 40387

To Reproduce

Could not reproduce.

Reproduction Rate

~Seen only once for now.~ About 1/100 times

Expected behavior

Firmware logs are not totally empty.

Impact

No firmware logs

Environment

jf-mtlp-rvp-nocodec-5. Daily test 40387

Start Time: 2024-04-26 14:07:35 UTC End Time: 2024-04-26 17:23:17 UTC

Linux Branch: topic/sof-dev Linux Commit: f084b17039d5 KConfig Branch: master KConfig Commit: a63502374811

SOF Commit: 278ecc5b74ad Zephyr Commit: 24554363379d

Screenshots or console output

-rw-r--r-- 1 ubuntu ubuntu   100460 Apr 26 14:12 check-playback/2024-04-26-14:12:22-29836/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu   133472 Apr 26 14:13 check-capture/2024-04-26-14:13:08-752/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu    77080 Apr 26 14:20 check-playback/2024-04-26-14:14:03-32484/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu   146320 Apr 26 14:32 check-capture/2024-04-26-14:20:53-27761/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu  3222940 Apr 26 14:39 check-playback/2024-04-26-14:32:43-28751/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu  6676072 Apr 26 14:52 check-capture/2024-04-26-14:39:54-25713/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu  1616112 Apr 26 14:56 check-playback/2024-04-26-14:52:35-5326/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu  2332176 Apr 26 15:00 check-capture/2024-04-26-14:56:24-21523/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu        0 Apr 26 15:03 test-speaker/2024-04-26-15:03:00-2850/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu        0 Apr 26 15:04 check-signal-stop-start/2024-04-26-15:04:24-12962/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu        0 Apr 26 15:08 check-signal-stop-start/2024-04-26-15:08:12-31014/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu        0 Apr 26 15:14 check-xrun-injection/2024-04-26-15:14:40-12735/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu        0 Apr 26 15:15 check-xrun-injection/2024-04-26-15:15:24-17951/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu        0 Apr 26 15:16 multiple-pipeline/2024-04-26-15:16:28-8050/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu        0 Apr 26 15:22 multiple-pipeline/2024-04-26-15:22:42-26807/mtrace.txt

   ---- rebooted at 15:29 by CI ---

-rw-r--r-- 1 ubuntu ubuntu  1067236 Apr 26 15:36 simultaneous-playback-capture/2024-04-26-15:29:35-13597/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu  3504596 Apr 26 15:42 multiple-pipeline/2024-04-26-15:36:12-14454/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu  1944724 Apr 26 15:58 check-runtime-pm-status/2024-04-26-15:42:33-1210/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu     9792 Apr 26 16:04 volume-basic-test/2024-04-26-16:04:27-19663/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu  1146256 Apr 26 16:07 check-pause-resume/2024-04-26-16:04:53-26690/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu  3046340 Apr 26 16:11 check-pause-resume/2024-04-26-16:07:22-1621/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 87337044 Apr 26 16:45 multiple-pause-resume/2024-04-26-16:11:25-15022/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu    32128 Apr 26 16:45 check-alsabat/2024-04-26-16:45:27-13220/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu    34624 Apr 26 16:46 check-alsabat/2024-04-26-16:46:11-31543/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu    33032 Apr 26 16:47 check-alsabat/2024-04-26-16:46:55-8567/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu    31700 Apr 26 16:47 check-alsabat/2024-04-26-16:47:39-29457/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu    34460 Apr 26 16:48 check-alsabat/2024-04-26-16:48:23-15795/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu    33128 Apr 26 16:49 check-alsabat/2024-04-26-16:49:07-4186/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu    31796 Apr 26 16:50 check-alsabat/2024-04-26-16:49:51-15326/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu    34552 Apr 26 16:50 check-alsabat/2024-04-26-16:50:35-3574/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu    33224 Apr 26 16:51 check-alsabat/2024-04-26-16:51:19-24718/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu   198832 Apr 26 17:08 check-suspend-resume-with-audio/2024-04-26-17:03:23-4793/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu   401016 Apr 26 17:16 check-suspend-resume-with-audio/2024-04-26-17:08:53-16938/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu   217600 Apr 26 18:35 test-speaker/2024-04-26-18:34:09-17183/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu      223 Apr 26 18:50 test-speaker/2024-04-26-18:50:17-29199/mtrace.txt

cc:

  • https://github.com/thesofproject/sof-test/issues/1173
  • https://github.com/thesofproject/linux/issues/4618

marc-hb avatar Apr 26 '24 21:04 marc-hb

Last passing test:

declare -- cmd="journalctl_cmd --since=@1714143379"
2024-04-26 15:02:47 UTC [REMOTE_INFO] Starting func_exit_handler(0)
2024-04-26 15:02:49 UTC [REMOTE_INFO] pkill -TERM -f mtrace-reader.py
2024-04-26 15:02:49 UTC [REMOTE_INFO] nlines=26841 /home/ubuntu/sof-test/logs/check-capture/2024-04-26-14:56:24-21523/mtrace.txt
+ grep -B 2 -A 1 -i --word-regexp -e ERR -e ERROR -e '' /home/ubuntu/sof-test/logs/check-capture/2024-04-26-14:56:24-21523/mtrace.txt
2024-04-26 15:02:52 UTC [REMOTE_INFO] ktime=3219 sof-test PID=13540: ending
2024-04-26 15:02:52 UTC [REMOTE_INFO] Test Result: PASS!

The 7 tests that ran after this all had an empty mtrace.txt.

They did not seem to fail for any other reason, everything else seems fine.

I haven't found any error or anything suspicious in any log; journalctl included. Anything specific one should be looking for? Some particular IPC?

No other device in the same run experienced anything similar and I haven't seen anything like this ever before.

There was apparently no "loose" mtrace-reader.py because mtrace.txt would not be empty in that case, it would have the error message described in:

  • https://github.com/thesofproject/sof-test/pull/1183

This does not look like https://github.com/thesofproject/sof-test/pull/1109 for the same reason: no error message?

After CI rebooted that device, mtrace seemed to work perfectly fine for the rest of the test run, see above.

Note the DSP "restores" from IMR on this system, it's not booting from scratch from D3.

marc-hb avatar Apr 26 '24 21:04 marc-hb

Just happened again on jf-mtlp-rvp-sdw-6 in daily test 40524?model=MTLP_RVP_SDW&testcase=multiple-pipeline-capture-50

2024-05-01 14:51:26 UTC [REMOTE_INFO] Starting func_exit_handler(0)
2024-05-01 14:51:27 UTC [REMOTE_INFO] pkill -TERM -f mtrace-reader.py
2024-05-01 14:51:27 UTC [REMOTE_INFO] nlines=0 /home/ubuntu/sof-test/logs/multiple-pipeline/2024-05-01-14:45:23-4468/mtrace.txt
2024-05-01 14:51:27 UTC [REMOTE_ERROR] Empty logfile
+ grep -B 2 -A 1 -i --word-regexp -e ERR -e ERROR -e '' -e OSError /home/ubuntu/sof-test/logs/multiple-pipeline/2024-05-01-14:45:23-4468/mtrace.txt
2024-05-01 14:51:29 UTC [REMOTE_INFO] ktime=2776 sof-test PID=26273: ending
2024-05-01 14:51:29 UTC [REMOTE_INFO] Test Result: FAIL!

No other obvious error in the logs.

Final mtrace logs (from the previous test) copied below. Note the multiple "messages dropped" warnings.

Again this is a system with "IMR restore"

mtrace worked again after a complete Linux reboot.

[ 2043.205998] <inf> dma: dma_put: dma_put(), dma = 0x401028d0, sref = 0
[ 2044.570106] <inf> ipc: ipc_cmd: rx	: 0x11000006|0x0
[ 2044.570228] <inf> pipe: pipeline_new: pipeline new pipe_id 0 priority 0
[ 2044.571693] <inf> ipc: ipc_cmd: rx	: 0x40000004|0x306
[ 2044.572201] <inf> dai_intel_dmic: dai_dmic_probe: dmic_probe()
[ 2044.572361] <inf> dma: dma_get: dma_get() ID 0 sref = 1 busy channels 0
--- 1 messages dropped ---
[ 2044.572623] <inf> dai_intel_dmic: dai_dmic_set_config: dmic_set_config()
[ 2044.572685] <inf> dai_intel_dmic_nhlt: print_outcontrol: OUTCONTROL = 00390843
[ 2044.572731] <inf> dai_intel_dmic_nhlt: print_outcontrol:   tie=0, sip=0, finit=0, fci=0
[ 2044.572778] <inf> dai_intel_dmic_nhlt: print_outcontrol:   bfth=3, of=2, ipm=2, th=3
[ 2044.572826] <inf> dai_intel_dmic_nhlt: print_outcontrol:   ipms1=0, ipms2=1, ipms3=0, ipms4=0
[ 2044.572873] <inf> dai_intel_dmic_nhlt: print_outcontrol:   ipms_mode=1
[ 2044.572923] <inf> dai_intel_dmic_nhlt: dai_dmic_set_config_nhlt: OUTCONTROL0 = 02390843
[ 2044.572971] <inf> dai_intel_dmic_nhlt: print_outcontrol: OUTCONTROL = 00390843
[ 2044.573018] <inf> dai_intel_dmic_nhlt: print_outcontrol:   tie=0, sip=0, finit=0, fci=0
--- 1 messages dropped ---
[ 2044.573110] <inf> dai_intel_dmic_nhlt: print_outcontrol:   ipms1=0, ipms2=1, ipms3=0, ipms4=0
[ 2044.573156] <inf> dai_intel_dmic_nhlt: print_outcontrol:   ipms_mode=1
[ 2044.573216] <inf> dai_intel_dmic_nhlt: dai_dmic_configure_coeff: fir_length_a = 101, fir_length_b = 247, packed = 0
--- 1 messages dropped ---
[ 2044.573443] <inf> dai_intel_dmic_nhlt: dai_nhlt_get_clock_div: pdm = 0, FIR_CONFIG = 0x00010064
[ 2044.573490] <inf> dai_intel_dmic_nhlt: dai_nhlt_get_clock_div: dai_index = 0, rate_div = 800, p_clkdiv = 16, p_mcic = 25, p_mfir = 2
--- 1 messages dropped ---
[ 2044.574648] <inf> ipc: ipc_cmd: rx	: 0x40000010|0xa
--- 2 messages dropped ---
[ 2044.574865] <inf> eq_iir: eq_iir_init: comp:0 0x10 eq_iir_init()
[ 2044.577023] <inf> ipc: ipc_cmd: rx	: 0x44000010|0x30000058
[ 2044.577143] <inf> eq_iir: eq_iir_set_config: comp:0 0x10 eq_iir_set_config()
[ 2044.578230] <inf> ipc: ipc_cmd: rx	: 0x40010004|0x15
[ 2044.579475] <inf> ipc: ipc_cmd: rx	: 0x11010004|0x0
[ 2044.579538] <inf> pipe: pipeline_new: pipeline new pipe_id 1 priority 0
[ 2044.581015] <inf> ipc: ipc_cmd: rx	: 0x40000006|0x10010
[ 2044.582298] <inf> ipc: ipc_cmd: rx	: 0x44000006|0x3c800034
[ 2044.583481] <inf> ipc: ipc_cmd: rx	: 0x44000006|0x30000018
[ 2044.585060] <inf> ipc: ipc_cmd: rx	: 0x40020004|0x10015
[ 2044.585426] <inf> dma: dma_get: dma_get() ID 0 sref = 1 busy channels 0

marc-hb avatar May 01 '24 21:05 marc-hb

Observed again in daily test run 41268?model=MTLP_RVP_NOCODEC&testcase=test-speaker

marc-hb avatar May 20 '24 18:05 marc-hb

and again in daily 42634?model=MTLP_RVP_SDW&testcase=check-alsabat-headset-playback-599

+ https://sof-ci.01.org/softestpr/PR931/build519/devicetest/index.html?model=MTLP_RVP_HDA&testcase=multiple-pause-resume-50

marc-hb avatar Jun 17 '24 20:06 marc-hb

In this LNL example we lost BOTH mtrace and kernel logs: https://sof-ci.01.org/linuxpr/PR5110/build4324/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-sof-logger cc:

  • https://github.com/thesofproject/linux/pull/5110#issuecomment-2276800444

marc-hb avatar Aug 08 '24 23:08 marc-hb

No owner -> v2.12

kv2019i avatar Sep 05 '24 12:09 kv2019i

No owner -> v2.13

kv2019i avatar Jan 03 '25 10:01 kv2019i

No owner, moving to v2.14

kv2019i avatar Apr 23 '25 13:04 kv2019i

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

github-actions[bot] avatar Jul 24 '25 09:07 github-actions[bot]