sof icon indicating copy to clipboard operation
sof copied to clipboard

[BUG] arecord fails with non-zero exit status with check-pause-resume-playback-100

Open kv2019i opened this issue 1 year ago • 4 comments

Describe the bug Seen in CI with SOF main on Intel LNL: https://sof-ci.01.org/softestpr/PR1202/build472/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-pause-resume-capture-100

Bug description updated on 4th Sep, see comment https://github.com/thesofproject/sof/issues/9191#issuecomment-2328849343

Looks similar to (see an older Intel platforms):

  • https://github.com/thesofproject/sof/issues/8770
  • https://github.com/thesofproject/linux/issues/5109

kv2019i avatar Jun 04 '24 15:06 kv2019i

@abonislawski @serhiy-katsyuba-intel It seems just enabling the L1_EXIT is not sufficient, record failure still seen (see comment https://github.com/thesofproject/sof/pull/9194#issuecomment-2149037030 ).

The logs don't have much, but this very much seems like a case of host DMA not moving, and arecord exiting with error as no data is received.

kv2019i avatar Jun 05 '24 07:06 kv2019i

Is https://github.com/thesofproject/linux/issues/5048 a duplicate? Then maybe close this one because 5048 has a better looking description :-)

marc-hb avatar Jun 07 '24 16:06 marc-hb

Lets keep it open for the moment, just as a place holder on the FW side for v2.10

lgirdwood avatar Jun 10 '24 10:06 lgirdwood

Looking at this again. Narrowing the scope of this bug to specifically cover the failure with pause-resume case with a recent occurence in: https://sof-ci.01.org/linuxpr/PR4733/build4501/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=check-pause-resume-playback-100

Failure fingerprint in sof-test console:

t=14110 ms: aplay: (29/100) Found   === PAUSE ===  ,  pausing for 166 ms
t=14110 ms: aplay: ERROR: aplay: do_pause:1567: pause push error: File descriptor in bad state

The errors happen when there are no clear errors in either kernel dmesg.txt or FW mtrace.txt. There does seem to be a timing anomaly seen in the logs, like here:

[  758.664736] kernel: snd_sof_intel_hda_common:hda_dai_trigger: soundwire_intel soundwire_intel.link.3: cmd=0 dai SDW3 Pin2 direction 0
[  759.996415] kernel: snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x1e0]=0x140000 successful
[  759.996434] kernel: snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-lnl 0000:00:1f.3: trigger cmd: 0 state: 2

This is clearly larger pause in activity than the ~0.1-0.2ms delays that are used in the test case.

kv2019i avatar Sep 04 '24 12:09 kv2019i

Not seen in PR/daily tests for a week, closing.

kv2019i avatar Dec 13 '24 11:12 kv2019i