[BUG][SOF Zephyr][IPC4]Suspend/resume with capture fails with IO error on ADL/TGL
Describe the bug
Suspend/resume test with capture fails with an I/O error right after resuming.
To Reproduce
SOF_LOGGING=none TPLG=/lib/firmware/intel/avs-tplg/sof-tgl-nocodec.tplg ./check-suspend-resume-with-audio.sh -m capture -l 1
Reproduction Rate
100%
EDIT BY @XiaoyunWu6666: in CI the failure rate is roughly 2/7 to 3/7, according to the test results from the past week.
Expected behavior
The stream should be able to resume capturing after the system resumes from suspend.
Impact
showstopper
Environment
- Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
- Kernel: https://github.com/thesofproject/linux/commit/db34128c2a8d8456ddf6fd5ac18fda7e85976d62
- SOF: https://github.com/thesofproject/sof/commit/4aa202dd78e629d6d490207c1bab12f212b7ae7b
- Name of the topology file
- Topology: sof-tgl-nocodec.tplg (cavs-tgl-nocodec.tplg)
- Name of the platform(s) on which the bug is observed.
- Platform: TGL
Screenshots or console output
arecord: pcm_read:2178: read error: Input/output error
@ranj063, we did not observe this issue in CI; please share more logs for this I/O error. Only one IPC error occurs, and it is on the ADLP_RVP_NOCODEC_IPC4 platform (Iris will file a new bug to track it). Most of the suspend/resume with audio test cases pass on the other IPC4 platforms. Please refer to the daily_IPC4 test reports for details.
Bug link for the suspend/resume IPC error on ADLP RVP NOCODEC IPC4: https://github.com/thesofproject/sof/issues/5827
Suspend/resume with capture/playback failed with an I/O error and an IPC timeout on ADLP RVP NOCODEC in internal daily IPC4 run 12637, while running check-suspend-resume-with-playback.
The IPC timed out while creating a module, and DSP init failed after that:
[ 638.907336] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: widget copier.host.1.1 setup complete
[ 638.907343] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: Create widget pipeline.2 instance 0 - pipe 2 - core 0
[ 638.907347] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: pipeline: 2 memory pages: 2
[ 638.907353] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx : 0x11020002|0x0: GLB_CREATE_PIPELINE
[ 639.402742] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc timed out for 0x11020002|0x0
[ 639.402776] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: failed to create module pipeline.2
[ 639.402797] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx : 0x12010000|0x0: GLB_DELETE_PIPELINE
[ 639.899549] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc timed out for 0x12010000|0x0
[ 639.899554] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: failed to free pipeline widget pipeline.1
[ 639.899559] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: Failed to set up connected widgets
[ 639.899726] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: trigger stream 0 dir 0 cmd 1
[ 639.900286] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x14001e successful
[ 639.900288] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc4 set pipeline 1 state 3
[ 639.900290] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx : 0x13010003|0x0: GLB_SET_PIPELINE_STATE
[ 640.394911] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc timed out for 0x13010003|0x0
[ 640.394944] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: failed to pause pipeline 1
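When triaging a log like the one above, it can help to pair each `ipc tx` line with its matching `ipc timed out` line and look at the gap between them. The following is a minimal sketch in Python, assuming the dmesg line format shown in this issue; the function name `find_ipc_timeouts` and the regular expressions are illustrative only and are not part of any SOF tooling.

```python
import re

# Patterns assume the dmesg format quoted in this issue (hypothetical helper, not SOF tooling).
TX_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\].*ipc tx\s*: (?P<hdr>0x[0-9a-f]+\|0x[0-9a-f]+): (?P<name>\w+)"
)
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\].*ipc timed out for (?P<hdr>0x[0-9a-f]+\|0x[0-9a-f]+)"
)


def find_ipc_timeouts(lines):
    """Return (message name, header, seconds from tx to timeout) for each timed-out IPC."""
    pending = {}  # header -> (tx timestamp, message name)
    results = []
    for line in lines:
        m = TX_RE.search(line)
        if m:
            pending[m["hdr"]] = (float(m["ts"]), m["name"])
            continue
        m = TIMEOUT_RE.search(line)
        if m and m["hdr"] in pending:
            sent, name = pending.pop(m["hdr"])
            results.append((name, m["hdr"], float(m["ts"]) - sent))
    return results


# Lines copied from the log in this issue.
SAMPLE = [
    "[ 638.907353] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx : 0x11020002|0x0: GLB_CREATE_PIPELINE",
    "[ 639.402742] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc timed out for 0x11020002|0x0",
    "[ 639.402797] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx : 0x12010000|0x0: GLB_DELETE_PIPELINE",
    "[ 639.899549] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc timed out for 0x12010000|0x0",
    "[ 639.900290] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx : 0x13010003|0x0: GLB_SET_PIPELINE_STATE",
    "[ 640.394911] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc timed out for 0x13010003|0x0",
]

if __name__ == "__main__":
    for name, hdr, delay in find_ipc_timeouts(SAMPLE):
        print(f"{name} ({hdr}) timed out after {delay:.3f} s")
```

On the excerpt above, all three messages time out roughly half a second after being sent, which suggests the firmware stopped answering IPCs entirely rather than rejecting one specific request.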
Hi @ranj063, did you see only the I/O error in the console, with no error in dmesg, when you found this issue?
I checked past tests for TGLU_UP_HDA (sof Zephyr IPC4) and ADLP RVP NOCODEC (sof Zephyr IPC4) in internal CI. There are several types of errors:
I. Only an I/O error in the console log: internal IPC4 results 12563, 12593.
II. FW reports the error "specified resource not found" when setting pipeline state 3 on ADL, with no I/O error: internal daily IPC4 12604.
III. I/O error with an IPC timeout when creating a module on ADL and TGL; TGL also has the FW error "specified resource not found", and ADL subsequently failed to boot the DSP firmware after resume: internal daily IPC4 12637.
So #5827 is probably related to this bug (possibly a duplicate, or just a different manifestation of the same error).
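The three failure types listed above can be told apart mechanically when sorting through many CI runs. Below is a rough sketch in Python, assuming the console output and dmesg of a run are available as strings. The helper name `classify_failure` is hypothetical, and the matched substrings are assumptions: "Input/output error" and "ipc timed out" are taken from the logs quoted in this issue, while "resource not found" is taken from the description above and may not match the exact firmware log text.

```python
def classify_failure(console: str, dmesg: str) -> str:
    """Map one test run to failure type I, II or III as described in this thread.

    The substrings below are assumptions based on the text of this issue,
    not a definitive list of SOF error messages.
    """
    io_error = "Input/output error" in console
    ipc_timeout = "ipc timed out" in dmesg
    not_found = "resource not found" in dmesg.lower()

    if io_error and ipc_timeout:
        return "III"  # I/O error together with an IPC timeout
    if not_found and not io_error:
        return "II"   # FW error only, no I/O error in the console
    if io_error and not ipc_timeout and not not_found:
        return "I"    # I/O error in the console, clean dmesg
    return "unclassified"


if __name__ == "__main__":
    console = "arecord: pcm_read:2178: read error: Input/output error"
    dmesg = "sof-audio-pci-intel-tgl 0000:00:1f.3: ipc timed out for 0x11020002|0x0"
    print(classify_failure(console, dmesg))
```

Runs that fall into "unclassified" would still need manual inspection.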
@kv2019i @ujfalusi does this need the Zephyr core PM patches you have been looking at?
? I don't have time to debug Linux kernel.
@lgirdwood @RanderWang I'll try to reproduce this today. This shouldn't depend on the Zephyr PM patches.
I don't think the original case occurs anymore. I could not reproduce this on a local nocodec/TGL machine, despite multiple hours of test iterations. Looking at our daily tests, I don't see this occurring anymore; e.g. daily plan 12977 is clean. I'll lower this to P2 to reflect that (it is certainly not 100% reproducible anymore, and it is not seen in CI all the time). The most recent reproduction is with the HDA tplg in daily plan 12950. Interestingly, 1) there are no errors in dmesg, and 2) the test passes with the nocodec tplg in the same test run.
Please raise the priority if we start seeing this more frequently.
Please retest this bug with the upcoming IPC4 rework.
Closing this issue as it's not reproducible now.