[BUG] [stable-v2.2] overrun in pause/resume tests on HDA configs
Describe the bug
ERROR hda_dma_link_check_xrun(): overrun detected when testing capture in pause-resume tests.
~For some unknown reason, this is observed only on WHL, no other stable-v2.2 configuration seems affected.~
EDIT: same on ADL and ADL-N refresh.
It seems PCM-specific?
To Reproduce
TPLG=/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg MODEL=WHL_UPEXT_HDA-ipc3 SOF_TEST_INTERVAL=5 ~/sof-test/test-case/check-pause-resume.sh -c 100 -m capture
or
TPLG=/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg MODEL=WHL_UPEXT_HDA-ipc3 SOF_TEST_INTERVAL=5 ~/sof-test/test-case/multiple-pause-resume.sh -r 25T
Reproduction Rate
100%. Observed in daily tests runs:
41199?model=WHL_UPEXT_HDA-ipc3&testcase=check-pause-resume-capture-100 41199?model=WHL_UPEXT_HDA-ipc3&testcase=multiple-pause-resume-25
41235?model=WHL_UPEXT_HDA-ipc3&testcase=check-pause-resume-capture-100 41235?model=WHL_UPEXT_HDA-ipc3&testcase=multiple-pause-resume-25
This is very likely a very old issue, exposed recently only thanks to:
- https://github.com/thesofproject/sof-test/pull/1188
Expected behavior
No ERROR in the firmware logs.
Environment
Linux Branch: topic/sof-dev Linux Commit: 055cba4766af KConfig Branch: master KConfig Commit: 8fee06f8fd8a
SOF Build Result: 4027 SOF Branch: stable-v2.2 SOF Commit: 64416ba51354
Screenshots or console output
[ 124404.578390] ( 11.250000) c0 ll-schedule ./schedule/ll_schedule.c:517 INFO task params pri 0 flags 0 start 0 period 1000
[ 124421.713806] ( 17.135416) c0 ll-schedule ./schedule/ll_schedule.c:418 INFO new added task->start 1806829227 at 1806826150
[ 124433.224222] ( 11.510416) c0 ll-schedule ./schedule/ll_schedule.c:421 INFO num_tasks 3 total_num_tasks 3
[ 124626.609631] ( 193.385406) c0 eq-iir 2.11 ../audio/eq_iir/eq_iir.c:788 INFO eq_iir_trigger()
[ 124646.974214] ( 20.364582) c0 eq-iir 2.11 ../audio/eq_iir/eq_iir.c:788 INFO eq_iir_trigger()
[ 125877.755415] ( 1230.781250) c0 hda-dma ..../intel/hda/hda-dma.c:951 ERROR hda_dma_link_check_xrun(): overrun detected
[ 228458.636755] ( 102580.882812) c0 ll-schedule ./schedule/ll_schedule.c:186 INFO task 0xbe1c0100 pipe-task avg 1719, max 2082
[ 338651.496960] ( 110192.859375) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60060000
[ 338667.069876] ( 15.572916) c0 pipe 2.16 ....../pipeline-stream.c:270 INFO pipe trigger cmd 2
[ 339624.153171] ( 957.083313) c0 eq-iir 2.11 ../audio/eq_iir/eq_iir.c:788 INFO eq_iir_trigger()
[ 339645.351087] ( 21.197916) c0 ll-schedule ./schedule/ll_schedule.c:159 INFO task complete 0xbe1c0100 pipe-task
[ 339656.496920] ( 11.145833) c0 ll-schedule ./schedule/ll_schedule.c:162 INFO num_tasks 2 total_num_tasks 2
[ 339823.111497] ( 166.614578) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x80010000
[ 339839.673996] ( 16.562500) c0 ipc src/ipc/ipc3/handler.c:595 INFO ipc: dai 3.1 -> config
[ 339851.913579] ( 12.239583) c0 dai src/lib/dai.c:166 INFO dai_get type 3 index 1 new sref 2
[ 339863.580245] ( 11.666666) c0 hda-dai 3.1 /drivers/intel/hda/hda.c:42 INFO hda_set_config(): channels 2 rate 48000
[ 339875.403161] ( 11.822916) c0 dai src/lib/dai.c:190 INFO dai_put type 3 index 1 new sref 1
[ 339887.226077] ( 11.822916) c0 ipc src/ipc/ipc3/dai.c:185 INFO ipc_comp_dai_config() dai type = 3 index = 1
[ 339900.090660] ( 12.864583) c0 dai 2.15 src/ipc/ipc3/dai.c:279 INFO dai_config() dai type = 3 index = 1 dd 0x9e0fe100
[ 459186.075504] ( 119285.984375) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60070000
[ 459201.648420] ( 15.572916) c0 pipe 2.16 ....../pipeline-stream.c:270 INFO pipe trigger cmd 8
[ 459221.544252] ( 19.895832) c0 ll-schedule ./schedule/ll_schedule.c:513 INFO task add 0xbe1c0100 pipe-task
[ 459233.106752] ( 11.562500) c0 ll-schedule ./schedule/ll_schedule.c:517 INFO task params pri 0 flags 0 start 0 period 1000
[ 459250.398418] ( 17.291666) c0 ll-schedule ./schedule/ll_schedule.c:418 INFO new added task->start 1813261227 at 1813254861
[ 459262.117167] ( 11.718750) c0 ll-schedule ./schedule/ll_schedule.c:421 INFO num_tasks 3 total_num_tasks 3
[ 459626.804653] ( 364.687500) c0 eq-iir 2.11 ../audio/eq_iir/eq_iir.c:788 INFO eq_iir_trigger()
[ 459647.012985] ( 20.208332) c0 eq-iir 2.11 ../audio/eq_iir/eq_iir.c:788 INFO eq_iir_trigger()
[ 460877.742103] ( 1230.729126) c0 hda-dma ..../intel/hda/hda-dma.c:951 ERROR hda_dma_link_check_xrun(): overrun detected
[ 672581.067024] ( 211703.328125) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60060000
[ 672597.056607] ( 15.989583) c0 pipe 2.16 ....../pipeline-stream.c:270 INFO pipe trigger cmd 2
[ 673374.556576] ( 777.499939) c0 eq-iir 2.11 ../audio/eq_iir/eq_iir.c:788 INFO eq_iir_trigger()
[ 673395.962825] ( 21.406250) c0 ll-schedule ./schedule/ll_schedule.c:159 INFO task complete 0xbe1c0100 pipe-task
[ 673407.056575] ( 11.093750) c0 ll-schedule ./schedule/ll_schedule.c:162 INFO num_tasks 2 total_num_tasks 2
[ 673711.587812] ( 304.531250) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x80010000
[ 673728.410728] ( 16.822916) c0 ipc src/ipc/ipc3/handler.c:595 INFO ipc: dai 3.1 -> config
[ 673740.702395] ( 12.291666) c0 dai src/lib/dai.c:166 INFO dai_get type 3 index 1 new sref 2
[ 673752.629477] ( 11.927083) c0 hda-dai 3.1 /drivers/intel/hda/hda.c:42 INFO hda_set_config(): channels 2 rate 48000
cc:
- https://github.com/thesofproject/sof/issues/8761
See again in daily 41235 and 41277. 100% deterministic.
@kv2019i this issue also obaserved on other platform like ADLP and ADL-N refresh
With #9135, this is still one of the two failures hitting several of our models every day in our stable 2.2 IPC3 test plan. Like clockwork.
#9135 seems to affect only NOCODEC configurations. #9139 not just.
Two important observations thanks to @ranj063 's great help:
- Overruns in the single pipeline test seem to always and only affect HDA analog capture
- Overruns in the multiple pipeline test seem to always and only affect playback on HDMI2 and HDMI3 simultaneously.
Thanks to the new ADL-N refresh, we can now observe this consistently across two generations very far apart: WHL-HDA and ADL-N refresh HDA.
ERROR hda_dma_link_check_xrun(): overrun detectedwhen testing capture in pause-resume tests.
I think it makes sense that all we see these prints in the case of HDMI/HDA Analog PCM's. I dont think the overruns reported by the firmware trace are detrimental though. The test goes on to complete all requested 100 iteration of pause/release. The script does the right think by calling out these errors to report failures too. IMHO, i think we should reduce the log level for the prints to be warnings instead or error.
@ranj063 any idea why there is no overrun message when running the same test with Zephyr?
https://sof-ci.01.org/sofpr/PR9335/build6750/devicetest/index.html?model=TGLU_UP_HDA-ipc4&testcase=check-pause-resume-capture-100 I can only see the infamous "no bytes to copy" message there (#8761)
There are overruns messages (not ERRORs) but only when testing with multiple PCMs: https://sof-ci.01.org/sofpr/PR9335/build6750/devicetest/index.html?model=TGLU_UP_HDA-ipc4&testcase=multiple-pause-resume-50
I don't know yet whether these overruns happen with the same HDMI ports: is there an easy way to tell from the firmware logs?
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.