sof icon indicating copy to clipboard operation
sof copied to clipboard

[BUG] [stable-v2.2] overrun in pause/resume tests on HDA configs

Open marc-hb opened this issue 1 year ago • 6 comments

Describe the bug

ERROR hda_dma_link_check_xrun(): overrun detected when testing capture in pause-resume tests.

~For some unknown reason, this is observed only on WHL, no other stable-v2.2 configuration seems affected.~

EDIT: same on ADL and ADL-N refresh.

It seems PCM-specific?

To Reproduce

TPLG=/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg MODEL=WHL_UPEXT_HDA-ipc3 SOF_TEST_INTERVAL=5 ~/sof-test/test-case/check-pause-resume.sh -c 100 -m capture

or

TPLG=/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg MODEL=WHL_UPEXT_HDA-ipc3 SOF_TEST_INTERVAL=5 ~/sof-test/test-case/multiple-pause-resume.sh -r 25T

Reproduction Rate

100%. Observed in daily tests runs:

41199?model=WHL_UPEXT_HDA-ipc3&testcase=check-pause-resume-capture-100 41199?model=WHL_UPEXT_HDA-ipc3&testcase=multiple-pause-resume-25

41235?model=WHL_UPEXT_HDA-ipc3&testcase=check-pause-resume-capture-100 41235?model=WHL_UPEXT_HDA-ipc3&testcase=multiple-pause-resume-25

This is very likely a very old issue, exposed recently only thanks to:

  • https://github.com/thesofproject/sof-test/pull/1188

Expected behavior

No ERROR in the firmware logs.

Environment

Linux Branch: topic/sof-dev Linux Commit: 055cba4766af KConfig Branch: master KConfig Commit: 8fee06f8fd8a

SOF Build Result: 4027 SOF Branch: stable-v2.2 SOF Commit: 64416ba51354

Screenshots or console output

[      124404.578390] (          11.250000) c0 ll-schedule        ./schedule/ll_schedule.c:517  INFO task params pri 0 flags 0 start 0 period 1000
[      124421.713806] (          17.135416) c0 ll-schedule        ./schedule/ll_schedule.c:418  INFO new added task->start 1806829227 at 1806826150
[      124433.224222] (          11.510416) c0 ll-schedule        ./schedule/ll_schedule.c:421  INFO num_tasks 3 total_num_tasks 3
[      124626.609631] (         193.385406) c0 eq-iir       2.11  ../audio/eq_iir/eq_iir.c:788  INFO eq_iir_trigger()
[      124646.974214] (          20.364582) c0 eq-iir       2.11  ../audio/eq_iir/eq_iir.c:788  INFO eq_iir_trigger()
[      125877.755415] (        1230.781250) c0 hda-dma            ..../intel/hda/hda-dma.c:951  ERROR hda_dma_link_check_xrun(): overrun detected
[      228458.636755] (      102580.882812) c0 ll-schedule        ./schedule/ll_schedule.c:186  INFO task 0xbe1c0100 pipe-task  avg 1719, max 2082
[      338651.496960] (      110192.859375) c0 ipc                  src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60060000
[      338667.069876] (          15.572916) c0 pipe         2.16  ....../pipeline-stream.c:270  INFO pipe trigger cmd 2
[      339624.153171] (         957.083313) c0 eq-iir       2.11  ../audio/eq_iir/eq_iir.c:788  INFO eq_iir_trigger()
[      339645.351087] (          21.197916) c0 ll-schedule        ./schedule/ll_schedule.c:159  INFO task complete 0xbe1c0100 pipe-task 
[      339656.496920] (          11.145833) c0 ll-schedule        ./schedule/ll_schedule.c:162  INFO num_tasks 2 total_num_tasks 2
[      339823.111497] (         166.614578) c0 ipc                  src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x80010000
[      339839.673996] (          16.562500) c0 ipc                  src/ipc/ipc3/handler.c:595  INFO ipc: dai 3.1 -> config 
[      339851.913579] (          12.239583) c0 dai                           src/lib/dai.c:166  INFO dai_get type 3 index 1 new sref 2
[      339863.580245] (          11.666666) c0 hda-dai      3.1   /drivers/intel/hda/hda.c:42   INFO hda_set_config(): channels 2 rate 48000
[      339875.403161] (          11.822916) c0 dai                           src/lib/dai.c:190  INFO dai_put type 3 index 1 new sref 1
[      339887.226077] (          11.822916) c0 ipc                      src/ipc/ipc3/dai.c:185  INFO ipc_comp_dai_config() dai type = 3 index = 1
[      339900.090660] (          12.864583) c0 dai          2.15        src/ipc/ipc3/dai.c:279  INFO dai_config() dai type = 3 index = 1 dd 0x9e0fe100
[      459186.075504] (      119285.984375) c0 ipc                  src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60070000
[      459201.648420] (          15.572916) c0 pipe         2.16  ....../pipeline-stream.c:270  INFO pipe trigger cmd 8
[      459221.544252] (          19.895832) c0 ll-schedule        ./schedule/ll_schedule.c:513  INFO task add 0xbe1c0100 pipe-task 
[      459233.106752] (          11.562500) c0 ll-schedule        ./schedule/ll_schedule.c:517  INFO task params pri 0 flags 0 start 0 period 1000
[      459250.398418] (          17.291666) c0 ll-schedule        ./schedule/ll_schedule.c:418  INFO new added task->start 1813261227 at 1813254861
[      459262.117167] (          11.718750) c0 ll-schedule        ./schedule/ll_schedule.c:421  INFO num_tasks 3 total_num_tasks 3
[      459626.804653] (         364.687500) c0 eq-iir       2.11  ../audio/eq_iir/eq_iir.c:788  INFO eq_iir_trigger()
[      459647.012985] (          20.208332) c0 eq-iir       2.11  ../audio/eq_iir/eq_iir.c:788  INFO eq_iir_trigger()
[      460877.742103] (        1230.729126) c0 hda-dma            ..../intel/hda/hda-dma.c:951  ERROR hda_dma_link_check_xrun(): overrun detected
[      672581.067024] (      211703.328125) c0 ipc                  src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60060000
[      672597.056607] (          15.989583) c0 pipe         2.16  ....../pipeline-stream.c:270  INFO pipe trigger cmd 2
[      673374.556576] (         777.499939) c0 eq-iir       2.11  ../audio/eq_iir/eq_iir.c:788  INFO eq_iir_trigger()
[      673395.962825] (          21.406250) c0 ll-schedule        ./schedule/ll_schedule.c:159  INFO task complete 0xbe1c0100 pipe-task 
[      673407.056575] (          11.093750) c0 ll-schedule        ./schedule/ll_schedule.c:162  INFO num_tasks 2 total_num_tasks 2
[      673711.587812] (         304.531250) c0 ipc                  src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x80010000
[      673728.410728] (          16.822916) c0 ipc                  src/ipc/ipc3/handler.c:595  INFO ipc: dai 3.1 -> config 
[      673740.702395] (          12.291666) c0 dai                           src/lib/dai.c:166  INFO dai_get type 3 index 1 new sref 2
[      673752.629477] (          11.927083) c0 hda-dai      3.1   /drivers/intel/hda/hda.c:42   INFO hda_set_config(): channels 2 rate 48000

cc:

  • https://github.com/thesofproject/sof/issues/8761

marc-hb avatar May 17 '24 22:05 marc-hb

See again in daily 41235 and 41277. 100% deterministic.

marc-hb avatar May 20 '24 18:05 marc-hb

@kv2019i this issue also obaserved on other platform like ADLP and ADL-N refresh

ssavati avatar Jul 08 '24 08:07 ssavati

With #9135, this is still one of the two failures hitting several of our models every day in our stable 2.2 IPC3 test plan. Like clockwork.

#9135 seems to affect only NOCODEC configurations. #9139 not just.

marc-hb avatar Jul 26 '24 20:07 marc-hb

Two important observations thanks to @ranj063 's great help:

  • Overruns in the single pipeline test seem to always and only affect HDA analog capture
  • Overruns in the multiple pipeline test seem to always and only affect playback on HDMI2 and HDMI3 simultaneously.

Thanks to the new ADL-N refresh, we can now observe this consistently across two generations very far apart: WHL-HDA and ADL-N refresh HDA.

marc-hb avatar Jul 26 '24 20:07 marc-hb

ERROR hda_dma_link_check_xrun(): overrun detected when testing capture in pause-resume tests.

I think it makes sense that all we see these prints in the case of HDMI/HDA Analog PCM's. I dont think the overruns reported by the firmware trace are detrimental though. The test goes on to complete all requested 100 iteration of pause/release. The script does the right think by calling out these errors to report failures too. IMHO, i think we should reduce the log level for the prints to be warnings instead or error.

ranj063 avatar Jul 26 '24 21:07 ranj063

@ranj063 any idea why there is no overrun message when running the same test with Zephyr?

https://sof-ci.01.org/sofpr/PR9335/build6750/devicetest/index.html?model=TGLU_UP_HDA-ipc4&testcase=check-pause-resume-capture-100 I can only see the infamous "no bytes to copy" message there (#8761)

There are overruns messages (not ERRORs) but only when testing with multiple PCMs: https://sof-ci.01.org/sofpr/PR9335/build6750/devicetest/index.html?model=TGLU_UP_HDA-ipc4&testcase=multiple-pause-resume-50

I don't know yet whether these overruns happen with the same HDMI ports: is there an easy way to tell from the firmware logs?

marc-hb avatar Jul 27 '24 00:07 marc-hb

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

github-actions[bot] avatar Jun 19 '25 03:06 github-actions[bot]