[BUG] Zephyr "mtrace" logs empty until the next reboot
Describe the bug
Zephyr "mtrace" logs have become suddenly and silently empty until the next reboot.
This happened in daily test 40387
To Reproduce
Could not reproduce.
Reproduction Rate
~Seen only once for now.~ About 1/100 times
Expected behavior
Firmware logs are not totally empty.
Impact
No firmware logs
Environment
jf-mtlp-rvp-nocodec-5. Daily test 40387
Start Time: 2024-04-26 14:07:35 UTC End Time: 2024-04-26 17:23:17 UTC
Linux Branch: topic/sof-dev Linux Commit: f084b17039d5 KConfig Branch: master KConfig Commit: a63502374811
SOF Commit: 278ecc5b74ad Zephyr Commit: 24554363379d
Screenshots or console output
-rw-r--r-- 1 ubuntu ubuntu 100460 Apr 26 14:12 check-playback/2024-04-26-14:12:22-29836/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 133472 Apr 26 14:13 check-capture/2024-04-26-14:13:08-752/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 77080 Apr 26 14:20 check-playback/2024-04-26-14:14:03-32484/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 146320 Apr 26 14:32 check-capture/2024-04-26-14:20:53-27761/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 3222940 Apr 26 14:39 check-playback/2024-04-26-14:32:43-28751/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 6676072 Apr 26 14:52 check-capture/2024-04-26-14:39:54-25713/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 1616112 Apr 26 14:56 check-playback/2024-04-26-14:52:35-5326/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 2332176 Apr 26 15:00 check-capture/2024-04-26-14:56:24-21523/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 0 Apr 26 15:03 test-speaker/2024-04-26-15:03:00-2850/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 0 Apr 26 15:04 check-signal-stop-start/2024-04-26-15:04:24-12962/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 0 Apr 26 15:08 check-signal-stop-start/2024-04-26-15:08:12-31014/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 0 Apr 26 15:14 check-xrun-injection/2024-04-26-15:14:40-12735/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 0 Apr 26 15:15 check-xrun-injection/2024-04-26-15:15:24-17951/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 0 Apr 26 15:16 multiple-pipeline/2024-04-26-15:16:28-8050/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 0 Apr 26 15:22 multiple-pipeline/2024-04-26-15:22:42-26807/mtrace.txt
---- rebooted at 15:29 by CI ---
-rw-r--r-- 1 ubuntu ubuntu 1067236 Apr 26 15:36 simultaneous-playback-capture/2024-04-26-15:29:35-13597/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 3504596 Apr 26 15:42 multiple-pipeline/2024-04-26-15:36:12-14454/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 1944724 Apr 26 15:58 check-runtime-pm-status/2024-04-26-15:42:33-1210/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 9792 Apr 26 16:04 volume-basic-test/2024-04-26-16:04:27-19663/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 1146256 Apr 26 16:07 check-pause-resume/2024-04-26-16:04:53-26690/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 3046340 Apr 26 16:11 check-pause-resume/2024-04-26-16:07:22-1621/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 87337044 Apr 26 16:45 multiple-pause-resume/2024-04-26-16:11:25-15022/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 32128 Apr 26 16:45 check-alsabat/2024-04-26-16:45:27-13220/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 34624 Apr 26 16:46 check-alsabat/2024-04-26-16:46:11-31543/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 33032 Apr 26 16:47 check-alsabat/2024-04-26-16:46:55-8567/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 31700 Apr 26 16:47 check-alsabat/2024-04-26-16:47:39-29457/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 34460 Apr 26 16:48 check-alsabat/2024-04-26-16:48:23-15795/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 33128 Apr 26 16:49 check-alsabat/2024-04-26-16:49:07-4186/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 31796 Apr 26 16:50 check-alsabat/2024-04-26-16:49:51-15326/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 34552 Apr 26 16:50 check-alsabat/2024-04-26-16:50:35-3574/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 33224 Apr 26 16:51 check-alsabat/2024-04-26-16:51:19-24718/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 198832 Apr 26 17:08 check-suspend-resume-with-audio/2024-04-26-17:03:23-4793/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 401016 Apr 26 17:16 check-suspend-resume-with-audio/2024-04-26-17:08:53-16938/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 217600 Apr 26 18:35 test-speaker/2024-04-26-18:34:09-17183/mtrace.txt
-rw-r--r-- 1 ubuntu ubuntu 223 Apr 26 18:50 test-speaker/2024-04-26-18:50:17-29199/mtrace.txt
cc:
- https://github.com/thesofproject/sof-test/issues/1173
- https://github.com/thesofproject/linux/issues/4618
Last passing test:
declare -- cmd="journalctl_cmd --since=@1714143379"
2024-04-26 15:02:47 UTC [REMOTE_INFO] Starting func_exit_handler(0)
2024-04-26 15:02:49 UTC [REMOTE_INFO] pkill -TERM -f mtrace-reader.py
2024-04-26 15:02:49 UTC [REMOTE_INFO] nlines=26841 /home/ubuntu/sof-test/logs/check-capture/2024-04-26-14:56:24-21523/mtrace.txt
+ grep -B 2 -A 1 -i --word-regexp -e ERR -e ERROR -e '' /home/ubuntu/sof-test/logs/check-capture/2024-04-26-14:56:24-21523/mtrace.txt
2024-04-26 15:02:52 UTC [REMOTE_INFO] ktime=3219 sof-test PID=13540: ending
2024-04-26 15:02:52 UTC [REMOTE_INFO] Test Result: PASS!
The 7 tests that ran after this all had an empty mtrace.txt.
They did not seem to fail for any other reason, everything else seems fine.
I haven't found any error or anything suspicious in any log; journalctl included. Anything specific one should be looking for? Some particular IPC?
No other device in the same run experienced anything similar and I haven't seen anything like this ever before.
There was apparently no "loose" mtrace-reader.py because mtrace.txt would not be empty in that case, it would have the error message described in:
- https://github.com/thesofproject/sof-test/pull/1183
This does not look like https://github.com/thesofproject/sof-test/pull/1109 for the same reason: no error message?
After CI rebooted that device, mtrace seemed to work perfectly fine for the rest of the test run, see above.
Note the DSP "restores" from IMR on this system, it's not booting from scratch from D3.
Just happened again on jf-mtlp-rvp-sdw-6 in daily test 40524?model=MTLP_RVP_SDW&testcase=multiple-pipeline-capture-50
2024-05-01 14:51:26 UTC [REMOTE_INFO] Starting func_exit_handler(0)
2024-05-01 14:51:27 UTC [REMOTE_INFO] pkill -TERM -f mtrace-reader.py
2024-05-01 14:51:27 UTC [REMOTE_INFO] nlines=0 /home/ubuntu/sof-test/logs/multiple-pipeline/2024-05-01-14:45:23-4468/mtrace.txt
2024-05-01 14:51:27 UTC [REMOTE_ERROR] Empty logfile
+ grep -B 2 -A 1 -i --word-regexp -e ERR -e ERROR -e '' -e OSError /home/ubuntu/sof-test/logs/multiple-pipeline/2024-05-01-14:45:23-4468/mtrace.txt
2024-05-01 14:51:29 UTC [REMOTE_INFO] ktime=2776 sof-test PID=26273: ending
2024-05-01 14:51:29 UTC [REMOTE_INFO] Test Result: FAIL!
No other obvious error in the logs.
Final mtrace logs (from the previous test) copied below. Note the multiple "messages dropped" warnings.
Again this is a system with "IMR restore"
mtrace worked again after a complete Linux reboot.
[ 2043.205998] <inf> dma: dma_put: dma_put(), dma = 0x401028d0, sref = 0
[ 2044.570106] <inf> ipc: ipc_cmd: rx : 0x11000006|0x0
[ 2044.570228] <inf> pipe: pipeline_new: pipeline new pipe_id 0 priority 0
[ 2044.571693] <inf> ipc: ipc_cmd: rx : 0x40000004|0x306
[ 2044.572201] <inf> dai_intel_dmic: dai_dmic_probe: dmic_probe()
[ 2044.572361] <inf> dma: dma_get: dma_get() ID 0 sref = 1 busy channels 0
--- 1 messages dropped ---
[ 2044.572623] <inf> dai_intel_dmic: dai_dmic_set_config: dmic_set_config()
[ 2044.572685] <inf> dai_intel_dmic_nhlt: print_outcontrol: OUTCONTROL = 00390843
[ 2044.572731] <inf> dai_intel_dmic_nhlt: print_outcontrol: tie=0, sip=0, finit=0, fci=0
[ 2044.572778] <inf> dai_intel_dmic_nhlt: print_outcontrol: bfth=3, of=2, ipm=2, th=3
[ 2044.572826] <inf> dai_intel_dmic_nhlt: print_outcontrol: ipms1=0, ipms2=1, ipms3=0, ipms4=0
[ 2044.572873] <inf> dai_intel_dmic_nhlt: print_outcontrol: ipms_mode=1
[ 2044.572923] <inf> dai_intel_dmic_nhlt: dai_dmic_set_config_nhlt: OUTCONTROL0 = 02390843
[ 2044.572971] <inf> dai_intel_dmic_nhlt: print_outcontrol: OUTCONTROL = 00390843
[ 2044.573018] <inf> dai_intel_dmic_nhlt: print_outcontrol: tie=0, sip=0, finit=0, fci=0
--- 1 messages dropped ---
[ 2044.573110] <inf> dai_intel_dmic_nhlt: print_outcontrol: ipms1=0, ipms2=1, ipms3=0, ipms4=0
[ 2044.573156] <inf> dai_intel_dmic_nhlt: print_outcontrol: ipms_mode=1
[ 2044.573216] <inf> dai_intel_dmic_nhlt: dai_dmic_configure_coeff: fir_length_a = 101, fir_length_b = 247, packed = 0
--- 1 messages dropped ---
[ 2044.573443] <inf> dai_intel_dmic_nhlt: dai_nhlt_get_clock_div: pdm = 0, FIR_CONFIG = 0x00010064
[ 2044.573490] <inf> dai_intel_dmic_nhlt: dai_nhlt_get_clock_div: dai_index = 0, rate_div = 800, p_clkdiv = 16, p_mcic = 25, p_mfir = 2
--- 1 messages dropped ---
[ 2044.574648] <inf> ipc: ipc_cmd: rx : 0x40000010|0xa
--- 2 messages dropped ---
[ 2044.574865] <inf> eq_iir: eq_iir_init: comp:0 0x10 eq_iir_init()
[ 2044.577023] <inf> ipc: ipc_cmd: rx : 0x44000010|0x30000058
[ 2044.577143] <inf> eq_iir: eq_iir_set_config: comp:0 0x10 eq_iir_set_config()
[ 2044.578230] <inf> ipc: ipc_cmd: rx : 0x40010004|0x15
[ 2044.579475] <inf> ipc: ipc_cmd: rx : 0x11010004|0x0
[ 2044.579538] <inf> pipe: pipeline_new: pipeline new pipe_id 1 priority 0
[ 2044.581015] <inf> ipc: ipc_cmd: rx : 0x40000006|0x10010
[ 2044.582298] <inf> ipc: ipc_cmd: rx : 0x44000006|0x3c800034
[ 2044.583481] <inf> ipc: ipc_cmd: rx : 0x44000006|0x30000018
[ 2044.585060] <inf> ipc: ipc_cmd: rx : 0x40020004|0x10015
[ 2044.585426] <inf> dma: dma_get: dma_get() ID 0 sref = 1 busy channels 0
Observed again in daily test run 41268?model=MTLP_RVP_NOCODEC&testcase=test-speaker
and again in daily 42634?model=MTLP_RVP_SDW&testcase=check-alsabat-headset-playback-599
+ https://sof-ci.01.org/softestpr/PR931/build519/devicetest/index.html?model=MTLP_RVP_HDA&testcase=multiple-pause-resume-50
In this LNL example we lost BOTH mtrace and kernel logs: https://sof-ci.01.org/linuxpr/PR5110/build4324/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-sof-logger cc:
- https://github.com/thesofproject/linux/pull/5110#issuecomment-2276800444
No owner -> v2.12
No owner -> v2.13
No owner, moving to v2.14
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.