[BUG] arecord fails with non-zero exit status with check-pause-resume-playback-100
Describe the bug Seen in CI with SOF main on Intel LNL: https://sof-ci.01.org/softestpr/PR1202/build472/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-pause-resume-capture-100
Bug description updated on 4th Sep, see comment https://github.com/thesofproject/sof/issues/9191#issuecomment-2328849343
Looks similar to (see an older Intel platforms):
- https://github.com/thesofproject/sof/issues/8770
- https://github.com/thesofproject/linux/issues/5109
@abonislawski @serhiy-katsyuba-intel It seems just enabling the L1_EXIT is not sufficient, record failure still seen (see comment https://github.com/thesofproject/sof/pull/9194#issuecomment-2149037030 ).
The logs don't have much, but this very much seems like a case of host DMA not moving, and arecord exiting with error as no data is received.
Is https://github.com/thesofproject/linux/issues/5048 a duplicate? Then maybe close this one because 5048 has a better looking description :-)
Lets keep it open for the moment, just as a place holder on the FW side for v2.10
Looking at this again. Narrowing the scope of this bug to specifically cover the failure with pause-resume case with a recent occurence in: https://sof-ci.01.org/linuxpr/PR4733/build4501/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=check-pause-resume-playback-100
Failure fingerprint in sof-test console:
t=14110 ms: aplay: (29/100) Found === PAUSE === , pausing for 166 ms
t=14110 ms: aplay: ERROR: aplay: do_pause:1567: pause push error: File descriptor in bad state
The errors happen when there are no clear errors in either kernel dmesg.txt or FW mtrace.txt. There does seem to be a timing anomaly seen in the logs, like here:
[ 758.664736] kernel: snd_sof_intel_hda_common:hda_dai_trigger: soundwire_intel soundwire_intel.link.3: cmd=0 dai SDW3 Pin2 direction 0
[ 759.996415] kernel: snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x1e0]=0x140000 successful
[ 759.996434] kernel: snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-lnl 0000:00:1f.3: trigger cmd: 0 state: 2
This is clearly larger pause in activity than the ~0.1-0.2ms delays that are used in the test case.
Not seen in PR/daily tests for a week, closing.