[JSL] multiple-pause-resume fails with IPC timed out, etrace has many xrun
Found the problem while doing internal test with SOF FW v2.2. 15146?model=JSL_ES8336_HDA&testcase=multiple-pause-resume-50
100% reproducible with below test command in JSL_ES8336_HDA,
test command: TPLG=/lib/firmware/intel/sof-tplg/sof-jsl-es8336-ssp1.tplg MODEL=JSL_ES8336_HDA ~/sof-test/test-case/multiple-pause-resume.sh -r 50
- Kernel Branch: topic/sof-dev
- Kernel Commit: 91a177700173
- SOF Branch: stable-v2.2
- SOF Commit: afe466da0238
[ 2195.251633] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ipc tx error for 0x60040000 (msg/reply size: 12/12): -12
[ 2195.251636] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -12
[ 2195.251638] kernel: HDMI 3: ASoC: trigger FE cmd: 1 failed: -12
[ 2195.759489] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ipc tx timed out for 0x60050000 (msg/reply size: 12/12)
[ 2195.759506] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ------------[ IPC dump start ]------------
[ 2195.759537] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: hda irq intsts 0x00000000 intlctl 0xc0000181 rirb 00
[ 2195.759541] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 2195.759566] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: error: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
[ 2195.759570] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ------------[ IPC dump end ]------------
[ 2195.759573] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 2195.759575] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: IPC timeout
[ 2195.759577] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (6)
[ 2195.759600] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: 0x00000005: module: ROM, state: FW_ENTERED, running
[ 2195.759956] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: unexpected fault 0x00000000 trace 0x00004000
[ 2195.759960] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ------------[ DSP dump end ]------------
[ 2195.760025] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: trace IO error
[ 2195.760040] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
[ 2195.760047] kernel: HDMI 3: ASoC: trigger FE cmd: 0 failed: -110
[ 2196.263525] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ipc tx timed out for 0x60070000 (msg/reply size: 12/12)
[ 2196.263542] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: IPC timeout
[ 2196.263551] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
[ 2196.263562] kernel: ES8336: ASoC: trigger FE cmd: 4 failed: -110
[ 2196.767512] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ipc tx timed out for 0x60060000 (msg/reply size: 12/12)
[ 2196.767530] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: IPC timeout
[ 2196.767607] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
[ 2196.767616] kernel: ES8336: ASoC: trigger FE cmd: 3 failed: -110
[ 2197.271317] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ipc tx timed out for 0x60030000 (msg/reply size: 12/12)
[ 2197.271329] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: IPC timeout
[ 2197.271402] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_hw_free on 0000:00:1f.3: -110
[ 2197.775212] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ipc tx timed out for 0x80010000 (msg/reply size: 216/12)
[ 2197.775231] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: IPC timeout
[ 2197.775241] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: Failed to set dai config for iDisp3
[ 2197.775247] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: hda_ctrl_dai_widget_free: DAI config failed for widget 'HDA5.OUT'
[ 2207.311476] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ipc tx timed out for 0x60070000 (msg/reply size: 12/12)
[ 2207.311494] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: IPC timeout
[ 2207.311505] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
[ 2207.311517] kernel: ES8336: ASoC: trigger FE cmd: 4 failed: -110
jsl_multiple-pause-resume-50.sh.txt jsl_dmesg.txt jsl_etrace.txt jsl_logger.txt
@mengdonglin @keqiaozhang Maybe we can give a low priority to this issue as it has no impact on end user experience. Pause & Release by FW is not used by Linux or Chrome. So we're not likely to fix this.
Should this be moved to firmware? There is not much the kernel do if it does not get reply...
Actually, we have an error coming back from the firmware before the timeout:
[ 2195.250353] kernel: snd_sof:sof_pcm_trigger: sof-audio-pci-intel-icl 0000:00:1f.3: pcm: trigger stream 7 dir 0 cmd 1
[ 2195.250875] kernel: snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-icl 0000:00:1f.3: FW Poll Status: reg[0x180]=0x2024001e successful
[ 2195.250882] kernel: snd_sof:ipc3_log_header: sof-audio-pci-intel-icl 0000:00:1f.3: ipc tx: 0x60040000: GLB_STREAM_MSG: TRIG_START
[ 2195.251633] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ipc tx error for 0x60040000 (msg/reply size: 12/12): -12
[ 2195.251636] kernel: sof-audio-pci-intel-icl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -12
[ 2195.251638] kernel: HDMI 3: ASoC: trigger FE cmd: 1 failed: -12
After this, all IPC message will time out. This indicates that the firmware might have locked up or crashed.
This seems to be a complicated issue where the firmware ends-up with a memory allocation failure:
[ 35897863.990214] ( 527855.250000) c0 dma-trace src/lib/alloc.c:765 ERROR failed to alloc 0x18 bytes zone 0x1 caps 0x1 flags 0x0
[ 35898732.583929] ( 868.593689) c0 dma-trace src/lib/notifier.c:67 ERROR notifier_register(): callback handle allocation failed.
That doesn't seem to good, even if we have xruns in this case the firmware should not crap out. There's something fishy here.
Maybe we can give a low priority to this issue as it has no impact on end user experience. Pause & Release by FW is not used by Linux or Chrome. So we're not likely to fix this
@mengdonglin how do we know reproduction requires Pause & Release? Maybe Pause & Release only make this more likely and it could also happen without it, just less frequently. Until it's been investigated we can't tell can we?
cc:
- https://github.com/thesofproject/linux/issues/1339 (NOT removed)