[BUG] [stable-v2.2] [NOCODEC] ERROR pipeline_comp_reset(): failed to recover (multiple-pipeline-all)
Describe the bug
On ADLP_RVP_NOCODEC-ipc3 and APL_UP2_NOCODEC-ipc3, using all pipelines concurrently prints this error repeatedly.
EDIT: the same error also appears in some TWL configurations.
This error is likely old, discovered only now thanks to:
- https://github.com/thesofproject/sof-test/pull/1188
[ 6038167.207981] ( 8.229166) c0 pipe 11.21 ......./pipeline-graph.c:357 INFO pipe reset
[ 6038211.791313] ( 44.583332) c0 pipe 5.3 ......./pipeline-graph.c:333 WARN pipeline_comp_reset(): component is in active state, try to stop it
[ 6038221.739229] ( 9.947916) c0 pipe 5.3 ......./pipeline-graph.c:336 ERROR pipeline_comp_reset(): failed to recover
[ 6038634.499630] ( 412.760406) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60050000
--
[ 14446145.207212] ( 8.020833) c0 pipe 9.16 ......./pipeline-graph.c:357 INFO pipe reset
[ 14446189.582210] ( 44.375000) c0 pipe 5.3 ......./pipeline-graph.c:333 WARN pipeline_comp_reset(): component is in active state, try to stop it
[ 14446199.686376] ( 10.104166) c0 pipe 5.3 ......./pipeline-graph.c:336 ERROR pipeline_comp_reset(): failed to recover
[ 14446643.175942] ( 443.489563) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60050000
--
[ 22854418.831430] ( 8.125000) c0 pipe 11.21 ......./pipeline-graph.c:357 INFO pipe reset
[ 22854462.946012] ( 44.114582) c0 pipe 5.3 ......./pipeline-graph.c:333 WARN pipeline_comp_reset(): component is in active state, try to stop it
[ 22854472.998095] ( 10.052083) c0 pipe 5.3 ......./pipeline-graph.c:336 ERROR pipeline_comp_reset(): failed to recover
[ 22854881.070995] ( 408.072906) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60050000
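For reference, excerpts like the ones above can be pulled from a saved firmware trace with a grep similar to the one our CI runs later in this report; the slogger.txt path is an assumption based on the sof-test log layout, and `<run-dir>` is a placeholder for the actual run directory:

```bash
# Print each reset failure together with the preceding "pipe reset" INFO and
# "component is in active state" WARN lines, plus the IPC command that follows.
grep -B 2 -A 1 'pipeline_comp_reset(): failed to recover' \
    ~/sof-test/logs/multiple-pipeline/<run-dir>/slogger.txt
```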
Example from daily test run: 41199?model=APL_UP2_NOCODEC-ipc3&testcase=multiple-pipeline-all-50
Same error message in:
- 41199?model=APL_UP2_NOCODEC-ipc3&testcase=multiple-pause-resume-25
- 41199?model=ADLP_RVP_NOCODEC-ipc3&testcase=multiple-pipeline-playback-50
- 41199?model=ADLP_RVP_NOCODEC-ipc3&testcase=multiple-pause-resume-25
- 41199?model=ADLP_RVP_NOCODEC-ipc3&testcase=multiple-pipeline-all-50
To Reproduce
TPLG=/lib/firmware/intel/sof-tplg/sof-adl-nocodec.tplg MODEL=ADLP_RVP_NOCODEC-ipc3 SOF_TEST_INTERVAL=5 ~/sof-test/test-case/multiple-pipeline.sh -f p -l 50
Reproduction Rate
100%?
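To double-check the rate, a small wrapper like the sketch below can re-run the same command several times and count how many runs hit the reset failure. This is a minimal sketch: the slogger.txt location is an assumption based on the sof-test log paths quoted in the "SOF error logs" section further down, and the run count is arbitrary.

```bash
#!/bin/bash
# Sketch: repeat the failing test case and count how often the firmware trace
# contains the pipeline_comp_reset() failure.
export TPLG=/lib/firmware/intel/sof-tplg/sof-adl-nocodec.tplg
export MODEL=ADLP_RVP_NOCODEC-ipc3
export SOF_TEST_INTERVAL=5

runs=5
fails=0
for i in $(seq "$runs"); do
    ~/sof-test/test-case/multiple-pipeline.sh -f p -l 50
    # Newest run directory; adjust if your sof-test keeps logs elsewhere.
    last_log=$(ls -td ~/sof-test/logs/multiple-pipeline/*/ | head -n 1)
    grep -q 'pipeline_comp_reset(): failed to recover' "${last_log}slogger.txt" &&
        fails=$((fails + 1))
done
echo "reset failure reproduced in ${fails}/${runs} runs"
```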
Environment
Start Time: 2024-05-16 16:26:28 UTC
End Time: 2024-05-16 19:23:14 UTC
Linux Build Result: 1847
Linux Branch: topic/sof-dev
Linux Commit: 055cba4766af
KConfig Branch: master
KConfig Commit: 8fee06f8fd8a
SOF Branch: stable-v2.2
SOF Commit: 64416ba51354
Screenshots or console output
[ 6309919.280516] ( 5.989583) c0 memory src/lib/alloc.c:578 INFO 128 Bytes blocks ID:1 base 0xbe059000
[ 6309925.165933] ( 5.885417) c0 memory src/lib/alloc.c:582 INFO Number of Blocks: total 96 used 14 free 82
[ 6309931.155516] ( 5.989583) c0 memory src/lib/alloc.c:578 INFO 256 Bytes blocks ID:2 base 0xbe05c000
[ 6309936.988849] ( 5.833333) c0 memory src/lib/alloc.c:582 INFO Number of Blocks: total 112 used 23 free 89
[ 6309942.978432] ( 5.989583) c0 memory src/lib/alloc.c:578 INFO 512 Bytes blocks ID:3 base 0xbe063000
[ 6309948.811765] ( 5.833333) c0 memory src/lib/alloc.c:582 INFO Number of Blocks: total 12 used 4 free 8
[ 6355311.570379] ( 45362.757812) c0 dma-trace src/trace/trace.c:141 INFO Suppressed 3 similar messages: %d Bytes blocks ID:%d base 0x%x
[ 6355317.143296] ( 5.572917) c0 dma-trace src/trace/trace.c:141 INFO Suppressed 3 similar messages: Number of Blocks: total %d used %d free %d
[ 6355321.830796] ( 4.687500) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60050000
[ 6355329.434962] ( 7.604167) c0 pipe 9.16 ....../pipeline-stream.c:270 INFO pipe trigger cmd 0
[ 6356034.903684] ( 705.468750) c0 ll-schedule ./schedule/ll_schedule.c:159 INFO task complete 0xbe059380 pipe-task
[ 6356041.101600] ( 6.197917) c0 ll-schedule ./schedule/ll_schedule.c:162 INFO num_tasks 4 total_num_tasks 4
[ 6356063.132849] ( 22.031250) c0 host 11.17 ....../pipeline-stream.c:196 INFO perf comp_copy peak plat 226 cpu 224
[ 6356284.799507] ( 221.666656) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60030000
[ 6356292.716174] ( 7.916667) c0 pipe 9.16 ......./pipeline-graph.c:357 INFO pipe reset
[ 6356336.309922] ( 43.593750) c0 pipe 5.3 ......./pipeline-graph.c:333 WARN pipeline_comp_reset(): component is in active state, try to stop it
[ 6356345.997422] ( 9.687500) c0 pipe 5.3 ......./pipeline-graph.c:336 ERROR pipeline_comp_reset(): failed to recover
[ 6356610.893244] ( 264.895813) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60050000
[ 6356619.486994] ( 8.593750) c0 pipe 11.21 ....../pipeline-stream.c:270 INFO pipe trigger cmd 0
[ 6357031.570311] ( 412.083313) c0 ssp-dai 1.5 /drivers/intel/ssp/ssp.c:1140 INFO ssp_trigger() cmd 0
[ 6357063.653643] ( 32.083332) c0 ssp-dai 1.5 /drivers/intel/ssp/ssp.c:1108 INFO ssp_stop(), TX stop
[ 6357071.205726] ( 7.552083) c0 dw-dma src/drivers/dw/dma.c:413 INFO dw_dma_stop(): dma 0 channel 0 stop
[ 6357085.111975] ( 13.906249) c0 ll-schedule ./schedule/ll_schedule.c:159 INFO task complete 0xbe059680 pipe-task
[ 6357090.736975] ( 5.625000) c0 ll-schedule ./schedule/ll_schedule.c:162 INFO num_tasks 3 total_num_tasks 3
[ 6357099.695308] ( 8.958333) c0 ll-schedule ./schedule/ll_schedule.c:159 INFO task complete 0xbe059400 pipe-task
[ 6357105.372391] ( 5.677083) c0 ll-schedule ./schedule/ll_schedule.c:162 INFO num_tasks 2 total_num_tasks 2
[ 6357324.591133] ( 219.218735) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60030000
[ 6357332.247382] ( 7.656250) c0 pipe 11.21 ......./pipeline-graph.c:357 INFO pipe reset
[ 6357380.684880] ( 48.437496) c0 dai 5.2 src/audio/dai.c:671 INFO dai_reset()
[ 6357391.257797] ( 10.572916) c0 dw-dma src/drivers/dw/dma.c:260 INFO dw_dma_channel_put(): dma 0 channel 0 put
[ 6357729.799450] ( 338.541656) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x30020000
The exact same multi-pipeline tests failed in exactly the same way in the same configuration in the next day's daily test run (test run 41235).
EDIT: and again in 41277. 100% deterministic.
@kv2019i In our daily CI, this issue is still reproducible on the stable-v2.2 (IPC3) branch but is not seen on the main branch (IPC4). Does PR https://github.com/thesofproject/sof/pull/9136 need to land on stable-v2.2 as well?
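For reference, a minimal sketch of how that backport could be trialled locally, assuming the fix from PR 9136 is a small set of commits; `<fix-commit-sha>` is a placeholder for the commit(s) merged by that PR, and this is not a claim that the pick applies cleanly or is the agreed fix:

```bash
# Sketch: trial a cherry-pick of the PR 9136 fix onto stable-v2.2.
# <fix-commit-sha> is a placeholder for the commit(s) merged by that PR.
git clone https://github.com/thesofproject/sof.git
cd sof
git checkout -b try-9136-on-2.2 origin/stable-v2.2
git cherry-pick -x <fix-commit-sha>   # resolve conflicts if the branches have diverged
```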
SOF error logs
2024-07-07 18:26:51 UTC [REMOTE_INFO] nlines=80166 /home/ubuntu/sof-test/logs/multiple-pipeline/2024-07-07-18:21:01-27918/slogger.txt
+ grep -B 2 -A 1 -i --word-regexp -e ERR -e ERROR -e '' -e OSError /home/ubuntu/sof-test/logs/multiple-pipeline/2024-07-07-18:21:01-27918/slogger.txt
[ 11247799.292636] ( 22.239582) c2 pipe 7.40 ......./pipeline-graph.c:357 INFO pipe reset
[ 11247895.750965] ( 96.458328) c2 pipe 1.3 ......./pipeline-graph.c:333 WARN pipeline_comp_reset(): component is in active state, try to stop it
[ 11247921.375964] ( 25.624998) c2 pipe 1.3 ......./pipeline-graph.c:336 ERROR pipeline_comp_reset(): failed to recover
[ 11248349.240531] ( 427.864563) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60050000
--
[ 66761752.555460] ( 22.239582) c2 pipe 7.40 ......./pipeline-graph.c:357 INFO pipe reset
[ 66761848.961706] ( 96.406242) c2 pipe 1.3 ......./pipeline-graph.c:333 WARN pipeline_comp_reset(): component is in active state, try to stop it
[ 66761874.638788] ( 25.677082) c2 pipe 1.3 ......./pipeline-graph.c:336 ERROR pipeline_comp_reset(): failed to recover
[ 66762136.513778] ( 261.875000) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60050000
--
[ 80619819.504790] ( 22.239582) c2 pipe 7.40 ......./pipeline-graph.c:357 INFO pipe reset
[ 80619916.744369] ( 97.239578) c2 pipe 1.3 ......./pipeline-graph.c:333 WARN pipeline_comp_reset(): component is in active state, try to stop it
[ 80619942.421451] ( 25.677082) c2 pipe 1.3 ......./pipeline-graph.c:336 ERROR pipeline_comp_reset(): failed to recover
[ 80620367.525601] ( 425.104156) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60050000
--
[ 108391922.724142] ( 21.927082) c2 pipe 7.40 ......./pipeline-graph.c:357 INFO pipe reset
[ 108392019.494971] ( 96.770828) c2 pipe 1.3 ......./pipeline-graph.c:333 WARN pipeline_comp_reset(): component is in active state, try to stop it
[ 108392045.328304] ( 25.833332) c2 pipe 1.3 ......./pipeline-graph.c:336 ERROR pipeline_comp_reset(): failed to recover
[ 108392541.109534] ( 495.781219) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60050000
--
[ 275067793.757288] ( 22.083332) c2 pipe 7.40 ......./pipeline-graph.c:357 INFO pipe reset
[ 275067890.319784] ( 96.562492) c2 pipe 1.3 ......./pipeline-graph.c:333 WARN pipeline_comp_reset(): component is in active state, try to stop it
[ 275067916.101033] ( 25.781248) c2 pipe 1.3 ......./pipeline-graph.c:336 ERROR pipeline_comp_reset(): failed to recover
[ 275068354.330182] ( 438.229156) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60050000
--
[ 344515716.883092] ( 22.031250) c2 pipe 7.40 ......./pipeline-graph.c:357 INFO pipe reset
[ 344515813.810172] ( 96.927078) c2 pipe 1.3 ......./pipeline-graph.c:333 WARN pipeline_comp_reset(): component is in active state, try to stop it
[ 344515839.591421] ( 25.781248) c2 pipe 1.3 ......./pipeline-graph.c:336 ERROR pipeline_comp_reset(): failed to recover
[ 344516252.455987] ( 412.864563) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60050000
With #9139, this is still one of the two failures hitting several of our models every day in our stable 2.2 IPC3 test plan. Like clockwork.
#9135 seems to affect only NOCODEC configurations; #9139 is not limited to NOCODEC.
Impact is only on the no_codec topology, so P3 is the correct priority.
Maybe we can "reduce" these two NOCODEC topologies a little bit to silence the constant and distracting failure noise?
> Maybe we can "reduce" these two NOCODEC topologies a little bit to silence the constant and distracting failure noise?
Yes - for stable 2.2 we should only be testing kernel updates on real user configurations. Let's disable rather than delete nocodec, as it could still be useful as a debugging aid.
SKIP submitted:
- https://github.com/thesofproject/sof-test/pull/1229
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.