[BUG][IPC4] ipc error 6 for msg GLB_SET_PIPELINE_STATE on TGLU_RVP_SDW_IPC4ZPH
Describe the bug
This issue occurs in several test cases on TGLU_RVP_SDW_IPC4ZPH and is easy to reproduce. Most of the failures are in pause/resume or multiple-pipeline tests. The failing IPC message IDs differ by test:
- ipc error for msg 0x13010003|0x0 when testing with multiple-pause-resume-50
- ipc error for msg 0x13040003|0x0 when testing with multiple-pipeline-all-50
- ipc error for msg 0x13040004|0x0 when testing with check-runtime-pm-status-15
For details, refer to the internal test ID: http://sof-ci.sh.intel.com/#/result/planresultdetail/15002
error message:
[ 966.198000] kernel: snd_sof:sof_ipc4_set_pipeline_state: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc4 set pipeline 1 state 3
[ 966.198001] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx : 0x13010003|0x0: GLB_SET_PIPELINE_STATE
[ 966.208616] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx reply: 0x33000006|0x0: GLB_SET_PIPELINE_STATE
[ 966.208620] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: FW reported error: 6 - Unknown error while processing the request
[ 966.208668] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc error for msg 0x13010003|0x0
[ 966.208673] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: failed to pause pipeline 1
[ 966.208689] kernel: snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x20140000 successful
[ 966.208705] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -22
[ 966.208708] kernel: Jack out: ASoC: trigger FE cmd: 3 failed: -22
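The message IDs in the log above can be decoded by hand to see which pipeline and target state each failure refers to. The following is a minimal sketch, not the driver's code: the bit layout (reply flag in bit 29, message type in bits 28:24, pipeline ID in bits 23:16, state in bits 15:0, reply status in bits 23:0) is an assumption taken from the Linux SOF IPC4 header definitions and is consistent with the "set pipeline 1 state 3" and "FW reported error: 6" lines above. The function names are hypothetical.

```python
# Sketch of decoding the IPC4 primary header word seen in the log.
# ASSUMPTION: field positions follow the Linux SOF IPC4 header
# definitions; verify them against your kernel tree before relying on this.

GLB_SET_PIPELINE_STATE = 0x13  # message type shown in the log as 0x13......


def decode_ipc4_primary(primary: int) -> dict:
    """Split the primary word into direction, message type and low 24 bits."""
    return {
        "is_reply": bool((primary >> 29) & 0x1),  # bit 29: 0 = request, 1 = reply
        "msg_type": (primary >> 24) & 0x1F,       # bits 28:24
        "payload": primary & 0xFFFFFF,            # bits 23:0, meaning depends on type
    }


def decode_set_pipeline_state(primary: int) -> dict:
    """Decode a GLB_SET_PIPELINE_STATE request or its reply."""
    hdr = decode_ipc4_primary(primary)
    if hdr["msg_type"] != GLB_SET_PIPELINE_STATE:
        raise ValueError(f"not a GLB_SET_PIPELINE_STATE message: {primary:#010x}")
    if hdr["is_reply"]:
        # In a reply, the low 24 bits carry the firmware status code.
        return {"is_reply": True, "status": hdr["payload"]}
    return {
        "is_reply": False,
        "pipeline_id": (primary >> 16) & 0xFF,  # bits 23:16
        "state": primary & 0xFFFF,              # bits 15:0
    }


if __name__ == "__main__":
    # Message IDs taken from this report.
    for word in (0x13010003, 0x13040003, 0x13040004, 0x33000006):
        print(f"{word:#010x}: {decode_set_pipeline_state(word)}")
```

Under these assumptions, the three failing requests target pipeline 1 (state 3) and pipeline 4 (states 3 and 4), and the reply 0x33000006 carries status 6, which matches the error the kernel prints.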
To Reproduce
TPLG=/lib/firmware/intel/avs-tplg/cavs-sdw.tplg MODEL=TGLU_RVP_SDW_IPC4ZPH ~/sof-test/test-case/multiple-pause-resume.sh -r 50
TPLG=/lib/firmware/intel/avs-tplg/cavs-sdw.tplg MODEL=TGLU_RVP_SDW_IPC4ZPH ~/sof-test/test-case/multiple-pipeline.sh -f p -c 20 -l 50
TPLG=/lib/firmware/intel/avs-tplg/cavs-sdw.tplg MODEL=TGLU_RVP_SDW_IPC4ZPH ~/sof-test/test-case/check-runtime-pm-status.sh -l 15
Reproduction Rate almost 100%

Environment
- Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
- Kernel Branch: topic/sof-dev e37ac35bc502
- SOF Branch: main 42aec0a1867b
- Zephyr Commit: zephyr-v3.1.0-3042-g8e55e59c5917
- Name of the topology file
- Topology: cavs-sdw.tplg
- Name of the platform(s) on which the bug is observed.
- Platform: TGLU_RVP_SDW_IPC4ZPH
Should #6203, #6204 merge into this?
@kv2019i it seems like a lot of these timeouts are related to IPC4. Should we merge?
With SOF main as of 29af2df7666b2fd3948523b6870b64cfd59e9459, the reproduction rate I see locally has decreased a lot. I have not had a single failure in my local setup today. I can see the issue still popping up in CI, but something has changed. I used to trigger the error within <1h of test runs, every time.
Adding to v2.4 so we can track it and re-validate.
We observed a similar issue on TGLU_RVP_NOCODEC_IPC4ZPH when testing multiple-pipeline-playback-50.sh.
Internal test ID: http://sof-ci.sh.intel.com/#/result/planresultdetail/16066?model=TGLU_RVP_NOCODEC_IPC4ZPH&testcase=multiple-pipeline-playback-50
After more tests, this error appears to be related to the recent kernel change: https://github.com/thesofproject/linux/commit/f048bf5646cca1157c2ac6fc1e24961eeb818621
However, no such error occurs with the cavs IPC4 firmware, so the kernel change seems to expose a bug in the SOF Zephyr IPC4 firmware.
This issue did not occur today. We will keep observing for a few days.