
[BUG][SOF Zephyr][IPC4]Suspend/resume with capture fails with IO error on ADL/TGL

ranj063 opened this issue 2 years ago • 9 comments

Describe the bug
The suspend/resume test with capture fails with an I/O error right after resuming.

To Reproduce
SOF_LOGGING=none TPLG=/lib/firmware/intel/avs-tplg/sof-tgl-nocodec.tplg ./check-suspend-resume-with-audio.sh -m capture -l 1
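The reporter saw a 100% failure rate, while CI sees only a fraction of runs fail, so it can help to run the command above many times and count failures. Below is a minimal Python sketch that does this. It assumes the script is invoked from its own directory and exits with a non-zero status when an iteration fails. `measure_repro_rate` and `format_rate` are hypothetical helpers and are not part of the SOF test suite.

```python
import os
import subprocess

# Command and environment copied from the "To Reproduce" step above.
# Assumption: the script is run from its own directory and exits
# non-zero when a suspend/resume iteration fails.
REPRO_CMD = ["./check-suspend-resume-with-audio.sh", "-m", "capture", "-l", "1"]
REPRO_ENV = {
    "SOF_LOGGING": "none",
    "TPLG": "/lib/firmware/intel/avs-tplg/sof-tgl-nocodec.tplg",
}


def format_rate(failures: int, runs: int) -> str:
    """Format a failure count the way CI rates are quoted, e.g. '2/7 (28.6%)'."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return f"{failures}/{runs} ({100.0 * failures / runs:.1f}%)"


def measure_repro_rate(runs: int) -> str:
    """Run the reproduction command `runs` times and report the failure rate."""
    env = {**os.environ, **REPRO_ENV}
    failures = 0
    for _ in range(runs):
        # check=False: a failing run is a data point, not an exception.
        result = subprocess.run(REPRO_CMD, env=env, check=False)
        if result.returncode != 0:
            failures += 1
    return format_rate(failures, runs)
```

For example, `measure_repro_rate(7)` runs seven single-iteration tests. Keeping `-l 1` per invocation means each process exit code maps to exactly one suspend/resume cycle.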

Reproduction Rate
100% (EDIT by @XiaoyunWu6666: in CI it is roughly 2/7 to 3/7, according to test results from the past week.)

Expected behavior
The capture stream should resume successfully after the system resumes from suspend.

Impact
Showstopper

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
    • Kernel: https://github.com/thesofproject/linux/commit/db34128c2a8d8456ddf6fd5ac18fda7e85976d62
    • SOF: https://github.com/thesofproject/sof/commit/4aa202dd78e629d6d490207c1bab12f212b7ae7b
  2. Name of the topology file
    • Topology: sof-tgl-nocodec.tplg (cavs-tgl-nocodec.tplg)
  3. Name of the platform(s) on which the bug is observed.
    • Platform: TGL

Screenshots or console output
arecord: pcm_read:2178: read error: Input/output error

ranj063 • May 13 '22

@ranj063, we have not observed this issue in CI; please share more logs for this I/O error. The only related failure we see is an IPC error on the ADLP_RVP_NOCODEC_IPC4 platform (Iris will file a new bug to track it). Most suspend/resume-with-audio test cases pass on the other IPC4 platforms. Please refer to the daily IPC4 test reports for details.

keqiaozhang • May 16 '22

Bug link for the suspend/resume IPC error on ADLP RVP NOCODEC IPC4: https://github.com/thesofproject/sof/issues/5827

XiaoyunWu6666 • May 16 '22

Suspend/resume with capture/playback fails with an I/O error and IPC timeouts on ADLP RVP NOCODEC; this happened in internal daily IPC4 run 12637 during check-suspend-resume-with-playback.

The IPC timed out while creating a module, and DSP initialization failed after that:

[  638.907336] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: widget copier.host.1.1 setup complete
[  638.907343] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: Create widget pipeline.2 instance 0 - pipe 2 - core 0
[  638.907347] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: pipeline: 2 memory pages: 2
[  638.907353] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx      : 0x11020002|0x0: GLB_CREATE_PIPELINE
[  639.402742] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc timed out for 0x11020002|0x0
[  639.402776] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: failed to create module pipeline.2
[  639.402797] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx      : 0x12010000|0x0: GLB_DELETE_PIPELINE
[  639.899549] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc timed out for 0x12010000|0x0
[  639.899554] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: failed to free pipeline widget pipeline.1
[  639.899559] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: Failed to set up connected widgets
[  639.899726] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: trigger stream 0 dir 0 cmd 1
[  639.900286] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x14001e successful
[  639.900288] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc4 set pipeline 1 state 3
[  639.900290] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx      : 0x13010003|0x0: GLB_SET_PIPELINE_STATE
[  640.394911] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc timed out for 0x13010003|0x0
[  640.394944] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: failed to pause pipeline 1
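The hex words after "ipc tx" are IPC4 primary headers, and decoding them shows which pipeline each timed-out message targeted. The sketch below splits a primary header into its bit fields. The field layout (message target in bit 30, direction in bit 29, global type in bits 28..24, pipeline ID in bits 23..16) and the type and state values reflect my reading of the SOF IPC4 headers, so please cross-check them against the kernel or firmware source. The helper name `decode_primary` is hypothetical. The decoded values agree with the driver's own messages above: 0x11020002 creates pipeline 2 with 2 memory pages, 0x12010000 deletes pipeline 1, and 0x13010003 sets pipeline 1 to state 3 (PAUSED).

```python
# IPC4 global message types seen in this log (values as I read the SOF
# IPC4 headers; verify against the source before relying on them).
GLB_TYPES = {
    17: "GLB_CREATE_PIPELINE",
    18: "GLB_DELETE_PIPELINE",
    19: "GLB_SET_PIPELINE_STATE",
}

# Subset of IPC4 pipeline states; 3 = PAUSED matches "failed to pause pipeline 1".
PIPE_STATES = {2: "RESET", 3: "PAUSED", 4: "RUNNING"}


def decode_primary(primary: int) -> dict:
    """Split an IPC4 primary header word into its bit fields."""
    glb_type = (primary >> 24) & 0x1F  # bits 28..24: message type
    fields = {
        "msg_tgt": (primary >> 30) & 0x1,  # bit 30: 0 = global (FW_GEN) message
        "rsp": (primary >> 29) & 0x1,      # bit 29: 0 = request, 1 = reply
        "type": GLB_TYPES.get(glb_type, f"unknown({glb_type})"),
    }
    if glb_type == 17:  # CREATE_PIPELINE
        fields["instance_id"] = (primary >> 16) & 0xFF   # bits 23..16
        fields["ppl_priority"] = (primary >> 11) & 0x1F  # bits 15..11
        fields["ppl_mem_pages"] = primary & 0x7FF        # bits 10..0
    elif glb_type == 18:  # DELETE_PIPELINE
        fields["instance_id"] = (primary >> 16) & 0xFF
    elif glb_type == 19:  # SET_PIPELINE_STATE
        fields["ppl_id"] = (primary >> 16) & 0xFF
        state = primary & 0xFFFF                         # bits 15..0
        fields["ppl_state"] = PIPE_STATES.get(state, f"unknown({state})")
    return fields


for word in (0x11020002, 0x12010000, 0x13010003):
    print(hex(word), decode_primary(word))
```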

IPC4-dmesg.txt
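In the excerpt above, each of the three IPCs times out roughly 0.5 s after it is sent (for example, 638.907353 to 639.402742). This means the firmware never answered; it did not reply with an error. A hypothetical helper like the one below pulls these tx-to-timeout latencies out of a full dmesg such as IPC4-dmesg.txt. It assumes the "ipc tx" and "ipc timed out for" line formats shown in this log. The near-constant delay is my reading of the timestamps and is consistent with a fixed driver-side IPC timeout.

```python
import re

# Matches "[  638.907353] ... ipc tx      : 0x11020002|0x0" and
# "[  639.402742] ... ipc timed out for 0x11020002|0x0".
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\].*?"
    r"(?P<event>ipc tx\s*:|ipc timed out for)\s*"
    r"(?P<hdr>0x[0-9a-fA-F]+\|0x[0-9a-fA-F]+)"
)


def timeout_latencies(lines):
    """Return (header, seconds) for each IPC whose 'tx' line is followed by a timeout."""
    sent = {}
    latencies = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue
        ts, hdr = float(m["ts"]), m["hdr"]
        if m["event"].startswith("ipc tx"):
            sent[hdr] = ts
        elif hdr in sent:
            latencies.append((hdr, ts - sent.pop(hdr)))
    return latencies


# Lines copied from the excerpt above, with the device prefix shortened to "...".
EXCERPT = """\
[  638.907353] kernel: ...: ipc tx      : 0x11020002|0x0: GLB_CREATE_PIPELINE
[  639.402742] kernel: ...: ipc timed out for 0x11020002|0x0
[  639.402797] kernel: ...: ipc tx      : 0x12010000|0x0: GLB_DELETE_PIPELINE
[  639.899549] kernel: ...: ipc timed out for 0x12010000|0x0
[  639.900290] kernel: ...: ipc tx      : 0x13010003|0x0: GLB_SET_PIPELINE_STATE
[  640.394911] kernel: ...: ipc timed out for 0x13010003|0x0
"""

for hdr, secs in timeout_latencies(EXCERPT.splitlines()):
    print(f"{hdr}: timed out after {secs * 1000:.0f} ms")
```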

XiaoyunWu6666 • May 17 '22

Hi @ranj063, when you found this issue, did you see only the I/O error in the console, with no errors in dmesg?

I checked past internal CI results for TGLU_UP_HDA (SOF Zephyr IPC4) and ADLP RVP NOCODEC (SOF Zephyr IPC4). There are a few distinct error patterns:
I. Only an I/O error in the console log: internal IPC4 results 12563 and 12593.
II. The firmware reports the error "specified resource not found" when setting pipeline state 3 on ADL, with no I/O error: internal daily IPC4 12604.
III. An I/O error with IPC timeouts when creating a module on ADL and TGL. TGL also shows the firmware error "specified resource not found", and ADL then fails to boot the DSP firmware after resume: internal daily IPC4 12637.
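Sorting CI runs into the three patterns listed above could be automated with a small triage helper. The Python sketch below is hypothetical. Its signature substrings are assumptions based on the messages quoted in this thread ("Input/output error", "ipc timed out", "specified resource not found", "failed to boot DSP firmware"), not a verified list of kernel or firmware strings. Anything that fits none of the patterns is reported as "unclassified" and not forced into a bucket.

```python
# Signature substrings (assumptions based on messages quoted in this thread).
IO_ERROR = "Input/output error"
IPC_TIMEOUT = "ipc timed out"
RESOURCE_NOT_FOUND = "resource not found"   # compared case-insensitively
DSP_BOOT_FAIL = "failed to boot DSP firmware"


def classify(console: str, dmesg: str) -> str:
    """Map one test run's console and dmesg text to pattern I, II or III."""
    io = IO_ERROR in console
    timeout = IPC_TIMEOUT in dmesg
    not_found = RESOURCE_NOT_FOUND in dmesg.lower()
    boot_fail = DSP_BOOT_FAIL in dmesg

    if io and timeout:
        # III: I/O error plus IPC timeouts (resource-not-found or a failed
        # DSP boot may also appear as follow-on symptoms).
        return "III"
    if io and not (timeout or not_found or boot_fail):
        # I: I/O error in the console with a clean dmesg.
        return "I"
    if not_found and not io:
        # II: firmware error while setting pipeline state, no I/O error.
        return "II"
    return "unclassified"
```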

So #5827 is probably related to this bug (possibly a duplicate, or a different manifestation of the same error).

XiaoyunWu6666 • May 17 '22

@kv2019i @ujfalusi does this need the Zephyr core PM patches you guys have been looking at ?

lgirdwood • May 25 '22

? I don't have time to debug the Linux kernel.

RanderWang • May 30 '22

@lgirdwood @RanderWang I'll try to reproduce this today. This shouldn't depend on the Zephyr PM patches.

kv2019i • May 31 '22

I don't think the original case occurs anymore. I could not reproduce it on a local nocodec/TGL machine despite multiple hours of test iterations, and I no longer see it in our daily tests; e.g. daily plan 12977 is clean. I'll lower this to P2 to reflect that (it is certainly no longer 100% reproducible, and it is not seen in every CI run). The most recent reproduction is with the HDA topology in daily plan 12950. Interestingly, 1) there are no errors in dmesg, and 2) the test passes with the nocodec topology in the same test run.

Please raise priority if we start seeing this with higher occurence.

kv2019i • May 31 '22

Please retest this bug with the upcoming IPC4 rework.

RanderWang • Jun 02 '22

Closing this issue as it's not reproducible now.

keqiaozhang • Aug 29 '22