sof
sof copied to clipboard
[BUG][ADLP] ipc tx timed out for 0x60060000(TRIG_PAUSE)when testing multiple-pause-resume on ADLP Chrome device
Describe the bug CI detected this issue in today's daily test ID:12812. but failed to reproduced it again. IPC timed out when TRIG_PAUSE and there're some XRUNs detected in DMA trace.
[ 4182.403813] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: trigger stream 99 dir 1 cmd 3
[ 4182.403847] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x60060000: GLB_STREAM_MSG: TRIG_PAUSE
[ 4182.911354] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx timed out for 0x60060000 (msg/reply size: 12/12)
[ 4182.911388] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: preventing DSP entering D3 state to preserve context
[ 4182.911396] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump start ]------------
[ 4182.911421] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: hda irq intsts 0x00000000 intlctl 0xc0000083 rirb 00
[ 4182.911434] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 4182.911455] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: error: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
[ 4182.911464] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump end ]------------
[ 4182.911473] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 4182.911480] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: IPC timeout
[ 4182.911488] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (6)
[ 4182.911508] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: status: fw entered - code 00000005
[ 4182.911646] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: unexpected fault 0x00000000 trace 0x00004000
[ 4182.911658] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------
DMA-trace
[ 2482386.099276] ( 23.229166) c1 dmic-dai 2.0 ....../intel/dmic/dmic.c:418 INFO dmic_stop(), dmic_active_fifos_mask = 0x1
[ 2482423.026357] ( 28.281248) c1 zll-schedule src/schedule/zephyr_ll.c:69 INFO task complete 0xbe093a00 pipe-task [ 2482394.745108] ( 8.645833) c0 hda-dma ..../intel/hda/hda-dma.c:948 ERROR hda_dma_link_check_xrun(): underrun detected
[ 2482443.234690] ( 20.208332) c1 zll-schedule src/schedule/zephyr_ll.c:71 INFO num_tasks 2 total_num_tasks 4
[ 2482462.661772] ( 19.427082) c1 zll-schedule src/schedule/zephyr_ll.c:69 INFO task complete 0x9e087808 dmic-work <59c87728-d8f9-42f6-b89d-5870a87b0e1e>
[ 2482479.745105] ( 17.083332) c1 zll-schedule src/schedule/zephyr_ll.c:71 INFO num_tasks 1 total_num_tasks 3
[ 2482500.213854] ( 20.468750) c1 zll-schedule ......../zephyr_domain.c:210 INFO zephyr_domain_unregister domain->type 1 domain->clk 4
[ 2491369.744752] ( 8869.531250) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60060000
[ 2491403.807251] ( 34.062500) c0 pipe 8.43 ....../pipeline-stream.c:270 INFO pipe trigger cmd 2
[ 2492383.703045] ( 979.895813) c0 zll-schedule src/schedule/zephyr_ll.c:69 INFO task complete 0xbe093f80 pipe-task
[ 2492402.817627] ( 19.114582) c0 zll-schedule src/schedule/zephyr_ll.c:71 INFO num_tasks 2 total_num_tasks 2
[ 2492844.119693] ( 441.302063) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x80010000
[ 2492873.338442] ( 29.218748) c0 ipc src/ipc/ipc3/handler.c:595 INFO ipc: dai 3.3 -> config
[ 2492894.484275] ( 21.145832) c0 dai src/lib/dai.c:166 INFO dai_get type 3 index 3 new sref 2
[ 2492915.161357] ( 20.677082) c0 hda-dai 3.3 /drivers/intel/hda/hda.c:42 INFO hda_set_config(): channels 2 rate 48000
[ 2492935.421773] ( 20.260416) c0 dai src/lib/dai.c:190 INFO dai_put type 3 index 3 new sref 1
[ 2492955.890522] ( 20.468750) c0 ipc src/ipc/ipc3/dai.c:185 INFO ipc_comp_dai_config() dai type = 3 index = 3
[ 2492996.567604] ( 40.677082) c0 dai 8.42 src/ipc/ipc3/dai.c:279 INFO dai_config() dai type = 3 index = 3 dd 0x9e089680
[ 2493123.755099] ( 127.187492) c1 ipc src/ipc/ipc3/handler.c:595 INFO ipc: dai 3.3 -> config
[ 2493144.223848] ( 20.468750) c1 ipc src/ipc/ipc3/dai.c:185 INFO ipc_comp_dai_config() dai type = 3 index = 3
[ 2494861.255030] ( 1717.031128) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60070000
[ 2494895.213362] ( 33.958332) c0 pipe 8.43 ....../pipeline-stream.c:270 INFO pipe trigger cmd 8
[ 2494923.702944] ( 28.489582) c0 zll-schedule src/schedule/zephyr_ll.c:289 INFO task add 0xbe093f80 pipe-task priority 0 flags 0x0
[ 2528776.149515] ( 140.312500) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60070000
[ 2528870.784928] ( 94.635414) c1 pipe 10.51 ....../pipeline-stream.c:270 INFO pipe trigger cmd 8
[ 2528907.034927] ( 36.250000) c1 zll-schedule src/schedule/zephyr_ll.c:289 INFO task add 0xbe093a00 pipe-task
[ 2512452.452247] ( 17528.750000) c0 hda-dma ..../intel/hda/hda-dma.c:948 ERROR hda_dma_link_check_xrun(): underrun detected
[ 2514490.525083] ( 2038.072876) c0 hda-dma ..../intel/hda/hda-dma.c:948 ERROR hda_dma_link_check_xrun(): underrun detected
[ 2516491.983337] ( 2001.458252) c0 hda-dma ..../intel/hda/hda-dma.c:948 ERROR hda_dma_link_check_xrun(): underrun detected
[ 2528635.837021] ( 12143.853516) c0 hda-dma ..../intel/hda/hda-dma.c:948 ERROR hda_dma_link_check_xrun(): underrun detected priority 0 flags 0x0
[ 2528945.524509] ( 38.489582) c1 zll-schedule ......../zephyr_domain.c:180 INFO zephyr_domain_register domain->type 1 domain->clk 4 domain->ticks_per_ms 38400 period 1000
To Reproduce
~/sof-test/test-case/multiple-pause-resume.sh -r 50
Reproduction Rate This issue is reproducible. I reproduced this issue once in 5 rounds of tests.
Environment Kernel Branch: topic/sof-dev Kernel Commit: 0b48f05d38c9
SOF Branch: main SOF Commit: 9611d126ba5d
Zephyr Commit: 70dbf7e695d2
TPLG:sof-adl-max98390-rt5682.tplg
Platform: ADLP Chrome device Gimble-I2S with Zephyr.
@keqiaozhang can you turn on Zephyr lock debugging and try again (Zephyr KCONFIG). @lyakh I assume this is correct method to enable Zephyr lock debug ?
Before trying Zephyr lock debugging. I did more tests this afternoon.
First this issue is reproducible, I tested with https://github.com/thesofproject/sof/commit/9611d126ba5d4939c8a5a841e10663aac2b89e41 and hit this issue twice in 10 rounds of multiple-pause-resume.sh -r 50
.
Then I tested this cases ten times with the old commit https://github.com/thesofproject/sof/commit/f3ec2e590a1e99c6574883bac3095d01e52f473c, no reproductions.
It seems that this issue was introduced by https://github.com/thesofproject/sof/commit/9611d126ba5d4939c8a5a841e10663aac2b89e41.
But this does not rule out that this issue has nothing to do with zephyr, because zephyr has been updated accordingly. I will check the last stable zephyr commit with sof https://github.com/thesofproject/sof/commit/f3ec2e590a1e99c6574883bac3095d01e52f473c next.
It seems that this issue was introduced by 9611d12.
This is an optimization that make the pipeline run faster so we could be exposing a race.
@andrula-song can you fix the following
source = buffer_acquire(source);
sink = buffer_acquire(sink);
comp_get_copy_limits_frame_aligned(source, sink, cl);
buffer_release(source); <<<<<< sink
buffer_release(sink); <<<<<<<< source
}
Locks should be acquired and released symmetrically. @keqiaozhang can you try again with the suggested change and then again with the lock debug.
@lgirdwood @keqiaozhang Use new PR https://github.com/thesofproject/sof/pull/5857 to track this bug.
@lgirdwood and @andrula-song
Confirmed that PR https://github.com/thesofproject/sof/pull/5857 can fix this issue. 15 rounds of test passed(~/sof-test/test-case/multiple-pause-resume.sh -r 50
). It took about 4 and a half hours.
@keqiaozhang can you turn on Zephyr lock debugging and try again (Zephyr KCONFIG). @lyakh I assume this is correct method to enable Zephyr lock debug ?
@keqiaozhang @lgirdwood yes, just enable CONFIG_ASSERT
then CONFIG_SPIN_VALIDATE
will be enabled automatically too
@lgirdwood and @andrula-song
Confirmed that PR #5857 can fix this issue. 15 rounds of test passed(
~/sof-test/test-case/multiple-pause-resume.sh -r 50
). It took about 4 and a half hours.
@lgirdwood @andrula-song Interesting. Locking and unlocking only takes place if the object is shared, which in this case should mean, if the buffer is connecting two pipelines, running on different cores. And AFAICS sof-tgl-sdw-max98373-rt5682.m4 has everything on core 0. So, this must be something different...
Bad news, this issue still can be reproduced with the tip main branch. https://github.com/thesofproject/sof/commit/a7b2b29cc649bee7d4b86c686db98632346c497b. It seems that https://github.com/thesofproject/sof/pull/5857 doesn't really fix the problem.
Bad news, this issue still can be reproduced with the tip main branch. a7b2b29. It seems that #5857 doesn't really fix the problem.
exactly. As my previous comment says - this doesn't appear related to me.
I'd propose a P2 priority just to help order bugfixing work. I tried to locally reproduce this and it seems this takes multiple hours to trigger, so it seems P2. Please increase/decrease based on rate of occurence.
Found another IPC timed out issue for 0x60040000 (TRIG_START) when testing multiple-pipeline-playback. Test report ID:http://sof-ci.sh.intel.com/#/result/planresultdetail/13207 [Edit]: I tried to reproduce it manually, 1000+ iterations passed, no reproductions so far.
dmesg (dump from journalctl):
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x60010000: GLB_STREAM_MSG: PCM_PARAMS
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x1a0]=0x2034001e successful
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: stream dir 0, posn mailbox offset is 791136
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x60040000: GLB_STREAM_MSG: TRIG_START
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx timed out for 0x60040000 (msg/reply size: 12/12)
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: preventing DSP entering D3 state to preserve context
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump start ]------------
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: hda irq intsts 0xc0000380 intlctl 0x40000381 rirb 01
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: dsp irq ppsts 0x80000000 adspis 0x00000001
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: error: host status 0x80000000 dsp status 0x90020000 mask 0x00000001
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump end ]------------
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: IPC timeout
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (6)
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: status: fw entered - code 00000005
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: unexpected fault 0x00000000 trace 0x00004000
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: HDMI1: ASoC: trigger FE cmd: 1 failed: -110
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: trigger stream 2 dir 0 cmd 0
Jun 08 23:49:27 sh-adlp-gmb-i2s-02 kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: trace IO error
logger
[ 331946747.851288] ( 49.895832) c0 ssp-dai 1.1 /drivers/intel/ssp/ssp.c:61 WARN ssp_empty_tx_fifo() warning: timeout
[ 331946768.059621] ( 20.208332) c0 ssp-dai 1.1 /drivers/intel/ssp/ssp.c:1043 INFO ssp_stop(), TX stop
[ 331946789.517953] ( 21.458332) c0 ssp-dai 1.1 /drivers/intel/ssp/ssp.c:1056 INFO ssp_stop(): SSE clear SSP1
[ 331946809.830452] ( 20.312500) c0 ssp-dai 1.1 /drivers/intel/ssp/ssp.c:903 INFO ssp_post_stop releasing BCLK clocks for SSP1...
[ 331946831.549201] ( 21.718750) c0 ssp-dai 1.1 /drivers/intel/ssp/ssp.c:908 INFO ssp_post_stop releasing MCLK clocks for SSP1...
[ 331946855.090867] ( 23.541666) c0 dw-dma src/drivers/dw/dma.c:413 INFO dw_dma_stop(): dma 1 channel 0 stop
[ 331946889.570032] ( 34.479164) c0 zll-schedule src/schedule/zephyr_ll.c:69 INFO task complete 0xbe093a80 pipe-task
[ 331946697.955457] ( 1922.083252) c0 wait src/lib/wait.c:53 ERROR poll timeout reg 488008 mask 63 val 0 us 937
[ 331946908.424198] ( 18.854166) c0 zll-schedule src/schedule/zephyr_ll.c:71 INFO num_tasks 3 total_num_tasks 3
[ 331947595.507504] ( 687.083313) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60050000
[ 331947638.945003] ( 43.437500) c0 pipe 2.13 ....../pipeline-stream.c:270 INFO pipe trigger cmd 0
[ 331948757.069958] ( 1118.125000) c0 ssp-dai 1.0 /drivers/intel/ssp/ssp.c:1081 INFO ssp_trigger() cmd 0
[ 331950722.382380] ( 44.218750) c0 ssp-dai 1.0 /drivers/intel/ssp/ssp.c:61 WARN ssp_empty_tx_fifo() warning: timeout
[ 331950742.174046] ( 19.791666) c0 ssp-dai 1.0 /drivers/intel/ssp/ssp.c:1043 INFO ssp_stop(), TX stop
[ 331950764.257378] ( 22.083332) c0 ssp-dai 1.0 /drivers/intel/ssp/ssp.c:908 INFO ssp_post_stop releasing MCLK clocks for SSP0...
[ 331950787.955294] ( 23.697916) c0 dw-dma src/drivers/dw/dma.c:413 INFO dw_dma_stop(): dma 0 channel 0 stop
[ 331950820.924043] ( 32.968750) c0 zll-schedule src/schedule/zephyr_ll.c:69 INFO task complete 0xbe094880 pipe-task
[ 331950678.163632] ( 1921.093628) c0 wait src/lib/wait.c:53 ERROR poll timeout reg 487496 mask 63 val 0 us 937
@keqiaozhang It would be good to get output of "sudo cat /sys/kernel/debug/sof/etrace" (or run "sudo cavstool.py -l" during the test). The sof-logger output doesn't show what happens around the error and it is possible the DSP hits a fatal error (and sof-logger stops working when this happens). The "etrace" file would show what has happened.
Here are the logs I captured for IPC timed out 0x60040000 (TRIG_START):
etrace $ sudo cat /sys/kernel/debug/sof/etrace
▒▒▒[00000000] <inf> main: SOF on intel_adsp_cavs25
555507 zephyr_ll_scheduler_init(): unsupported domain 2
[00000000] <inf> main: SOF initialized
572775 zephyr_ll_scheduler_init(): unsupported domain 2
592643: SHM: FW ABI 0x3016001 DBG ABI 0x5003000 tags SOF:v2.0-rc1-955-g21d22a6d4ea6 zephyr:8f16b0599955 src hash 0xfaeb6234 (ldc hash 0xfaeb6234)
1039814 zephyr_ll_scheduler_init(): unsupported domain 2
438237263 poll timeout reg 487496 mask 63 val 0 us 937
438312285 poll timeout reg 488008 mask 63 val 0 us 937
722157920 poll timeout reg 488008 mask 63 val 0 us 937
722199908 hda_dma_link_check_xrun(): underrun detected
722272850 poll timeout reg 487496 mask 63 val 0 us 937
722277480 hda_dma_link_check_xrun(): underrun detected
722314584 hda_dma_link_check_xrun(): underrun detected
991056265 poll timeout reg 488008 mask 63 val 0 us 937
991061781 hda_dma_link_check_xrun(): underrun detected
991096536 hda_dma_link_check_xrun(): underrun detected
991135292 poll timeout reg 487496 mask 63 val 0 us 937
991139781 hda_dma_link_check_xrun(): underrun detected
991143045 hda_dma_link_check_xrun(): underrun detected
1259860593 poll timeout reg 487496 mask 63 val 0 us 937
1259935526 poll timeout reg 488008 mask 63 val 0 us 937
1528683893 poll timeout reg 488008 mask 63 val 0 us 937
1528760350 poll timeout reg 487496 mask 63 val 0 us 937
1797288526 poll timeout reg 487496 mask 63 val 0 us 937
1797401818 poll timeout reg 488008 mask 63 val 0 us 937
2067960138 poll timeout reg 487496 mask 63 val 0 us 937
2068035093 poll timeout reg 488008 mask 63 val 0 us 937
2068040470 hda_dma_link_check_xrun(): underrun detected
2337518876 poll timeout reg 487496 mask 63 val 0 us 937
2337593547 poll timeout reg 488008 mask 63 val 0 us 937
2606458224 poll timeout reg 488008 mask 63 val 0 us 937
2606470194 hda_dma_link_check_xrun(): underrun detected
2606534785 poll timeout reg 487496 mask 63 val 0 us 937
2875238105 poll timeout reg 487496 mask 63 val 0 us 937
2875350630 poll timeout reg 488008 mask 63 val 0 us 937
3143832475 poll timeout reg 488008 mask 63 val 0 us 937
3143837974 hda_dma_link_check_xrun(): underrun detected
3143948081 poll timeout reg 487496 mask 63 val 0 us 937
3413246663 poll timeout reg 487496 mask 63 val 0 us 937
3413397473 poll timeout reg 488008 mask 63 val 0 us 937
3681806496 poll timeout reg 488008 mask 63 val 0 us 937
3681922122 poll timeout reg 487496 mask 63 val 0 us 937
3950713626 poll timeout reg 487496 mask 63 val 0 us 937
3950865390 poll timeout reg 488008 mask 63 val 0 us 937
4219879558 poll timeout reg 488008 mask 63 val 0 us 937
4219955954 poll timeout reg 487496 mask 63 val 0 us 937
4488633176 poll timeout reg 487496 mask 63 val 0 us 937
4488746579 poll timeout reg 488008 mask 63 val 0 us 937
4757640492 poll timeout reg 487496 mask 63 val 0 us 937
4757793088 poll timeout reg 488008 mask 63 val 0 us 937
5026394453 poll timeout reg 487496 mask 63 val 0 us 937
5026507795 poll timeout reg 488008 mask 63 val 0 us 937
5295676779 poll timeout reg 487496 mask 63 val 0 us 937
5295751683 poll timeout reg 488008 mask 63 val 0 us 937
5295757066 hda_dma_link_check_xrun(): underrun detected
5295759552 hda_dma_link_check_xrun(): underrun detected
5564155890 poll timeout reg 487496 mask 63 val 0 us 937
5564305934 poll timeout reg 488008 mask 63 val 0 us 937
5832757954 poll timeout reg 488008 mask 63 val 0 us 937
5832834305 poll timeout reg 487496 mask 63 val 0 us 937
5832839041 hda_dma_link_check_xrun(): underrun detected
5832841534 hda_dma_link_check_xrun(): underrun detected
5832875752 hda_dma_link_check_xrun(): underrun detected
6101920912 poll timeout reg 487496 mask 63 val 0 us 937
6101994228 poll timeout reg 488008 mask 63 val 0 us 937
6102001294 hda_dma_link_check_xrun(): underrun detected
6102036361 hda_dma_link_check_xrun(): underrun detected
6370945139 poll timeout reg 487496 mask 63 val 0 us 937
6371020046 poll timeout reg 488008 mask 63 val 0 us 937
6639079648 poll timeout reg 488008 mask 63 val 0 us 937
6639194355 poll timeout reg 487496 mask 63 val 0 us 937
6908972178 poll timeout reg 487496 mask 63 val 0 us 937
6909083906 poll timeout reg 488008 mask 63 val 0 us 937
6909089282 hda_dma_link_check_xrun(): underrun detected
6909091766 hda_dma_link_check_xrun(): underrun detected
6909124058 hda_dma_link_check_xrun(): underrun detected
7178377356 poll timeout reg 487496 mask 63 val 0 us 937
7178452293 poll timeout reg 488008 mask 63 val 0 us 937
7447737467 poll timeout reg 488008 mask 63 val 0 us 937
7447750588 hda_dma_link_check_xrun(): underrun detected
7447813951 poll timeout reg 487496 mask 63 val 0 us 937
7716915847 poll timeout reg 487496 mask 63 val 0 us 937
7716990729 poll timeout reg 488008 mask 63 val 0 us 937
7985559356 poll timeout reg 488008 mask 63 val 0 us 937
7985712637 poll timeout reg 487496 mask 63 val 0 us 937
8254499175 poll timeout reg 488008 mask 63 val 0 us 937
8254614070 poll timeout reg 487496 mask 63 val 0 us 937
8523977062 poll timeout reg 487496 mask 63 val 0 us 937
8524090422 poll timeout reg 488008 mask 63 val 0 us 937
8793218207 poll timeout reg 488008 mask 63 val 0 us 937
8793223695 hda_dma_link_check_xrun(): underrun detected
8793258457 hda_dma_link_check_xrun(): underrun detected
8793297141 poll timeout reg 487496 mask 63 val 0 us 937
8793301656 hda_dma_link_check_xrun(): underrun detected
8793304896 hda_dma_link_check_xrun(): underrun detected
8793335268 hda_dma_link_check_xrun(): underrun detected
8866505391 poll timeout reg 488008 mask 63 val 0 us 937
9062465442 poll timeout reg 487496 mask 63 val 0 us 937
9062470143 hda_dma_link_check_xrun(): underrun detected
9062472624 hda_dma_link_check_xrun(): underrun detected
9062506212 hda_dma_link_check_xrun(): underrun detected
cavstool
sudo cavstool.py -l
INFO:cavs-fw:Ignore the interface sit0 which is not activated.
INFO:cavs-fw:Use interface enx000ec67984c8, IP address: 10.239.185.58
INFO:cavs-fw:Existing driver "snd_sof_pci_intel_tgl" found
INFO:cavs-fw:Mapped PCI bar 0 of length 16384 bytes.
INFO:cavs-fw:Selected output stream 7 (GCAP = 0x1009701)
INFO:cavs-fw:Mapped PCI bar 4 of length 1048576 bytes.
INFO:cavs-fw:Detected cAVS 1.8+ hardware
INFO:cavs-fw:Waiting for firmware handoff, FW_STATUS = 0x5
[00000000] <inf> main: SOF on intel_adsp_cavs25
555507 zephyr_ll_scheduler_init(): unsupported domain 2
[00000000] <inf> main: SOF initialized
572775 zephyr_ll_scheduler_init(): unsupported domain 2
592643: SHM: FW ABI 0x3016001 DBG ABI 0x5003000 tags SOF:v2.0-rc1-955-g21d22a6d4ea6 zephyr:8f16b0599955 src hash 0xfaeb6234 (ldc hash 0xfaeb6234)
1039814 zephyr_ll_scheduler_init(): unsupported domain 2
438237263 poll timeout reg 487496 mask 63 val 0 us 937
438312285 poll timeout reg 488008 mask 63 val 0 us 937
722157920 poll timeout reg 488008 mask 63 val 0 us 937
722199908 hda_dma_link_check_xrun(): underrun detected
722272850 poll timeout reg 487496 mask 63 val 0 us 937
722277480 hda_dma_link_check_xrun(): underrun detected
722314584 hda_dma_link_check_xrun(): underrun detected
991056265 poll timeout reg 488008 mask 63 val 0 us 937
991061781 hda_dma_link_check_xrun(): underrun detected
991096536 hda_dma_link_check_xrun(): underrun detected
991135292 poll timeout reg 487496 mask 63 val 0 us 937
991139781 hda_dma_link_check_xrun(): underrun detected
991143045 hda_dma_link_check_xrun(): underrun detected
1259860593 poll timeout reg 487496 mask 63 val 0 us 937
1259935526 poll timeout reg 488008 mask 63 val 0 us 937
1528683893 poll timeout reg 488008 mask 63 val 0 us 937
1528760350 poll timeout reg 487496 mask 63 val 0 us 937
1797288526 poll timeout reg 487496 mask 63 val 0 us 937
1797401818 poll timeout reg 488008 mask 63 val 0 us 937
2067960138 poll timeout reg 487496 mask 63 val 0 us 937
2068035093 poll timeout reg 488008 mask 63 val 0 us 937
2068040470 hda_dma_link_check_xrun(): underrun detected
2337518876 poll timeout reg 487496 mask 63 val 0 us 937
2337593547 poll timeout reg 488008 mask 63 val 0 us 937
2606458224 poll timeout reg 488008 mask 63 val 0 us 937
2606470194 hda_dma_link_check_xrun(): underrun detected
2606534785 poll timeout reg 487496 mask 63 val 0 us 937
2875238105 poll timeout reg 487496 mask 63 val 0 us 937
2875350630 poll timeout reg 488008 mask 63 val 0 us 937
3143832475 poll timeout reg 488008 mask 63 val 0 us 937
3143837974 hda_dma_link_check_xrun(): underrun detected
3143948081 poll timeout reg 487496 mask 63 val 0 us 937
3413246663 poll timeout reg 487496 mask 63 val 0 us 937
3413397473 poll timeout reg 488008 mask 63 val 0 us 937
3681806496 poll timeout reg 488008 mask 63 val 0 us 937
3681922122 poll timeout reg 487496 mask 63 val 0 us 937
3950713626 poll timeout reg 487496 mask 63 val 0 us 937
3950865390 poll timeout reg 488008 mask 63 val 0 us 937
4219879558 poll timeout reg 488008 mask 63 val 0 us 937
4219955954 poll timeout reg 487496 mask 63 val 0 us 937
4488633176 poll timeout reg 487496 mask 63 val 0 us 937
4488746579 poll timeout reg 488008 mask 63 val 0 us 937
4757640492 poll timeout reg 487496 mask 63 val 0 us 937
4757793088 poll timeout reg 488008 mask 63 val 0 us 937
5026394453 poll timeout reg 487496 mask 63 val 0 us 937
5026507795 poll timeout reg 488008 mask 63 val 0 us 937
5295676779 poll timeout reg 487496 mask 63 val 0 us 937
5295751683 poll timeout reg 488008 mask 63 val 0 us 937
5295757066 hda_dma_link_check_xrun(): underrun detected
5295759552 hda_dma_link_check_xrun(): underrun detected
5564155890 poll timeout reg 487496 mask 63 val 0 us 937
5564305934 poll timeout reg 488008 mask 63 val 0 us 937
5832757954 poll timeout reg 488008 mask 63 val 0 us 937
5832834305 poll timeout reg 487496 mask 63 val 0 us 937
5832839041 hda_dma_link_check_xrun(): underrun detected
5832841534 hda_dma_link_check_xrun(): underrun detected
5832875752 hda_dma_link_check_xrun(): underrun detected
6101920912 poll timeout reg 487496 mask 63 val 0 us 937
6101994228 poll timeout reg 488008 mask 63 val 0 us 937
dmesg
[ 295.393836] sof-audio-pci-intel-tgl 0000:00:1f.3: DAI config 0 matches pcm hw params
[ 295.393837] sof-audio-pci-intel-tgl 0000:00:1f.3: DAI config 0 matches pcm hw params
[ 295.393838] sof-audio-pci-intel-tgl 0000:00:1f.3: rate_min: 48000 rate_max: 48000
[ 295.393838] sof-audio-pci-intel-tgl 0000:00:1f.3: channels_min: 2 channels_max: 2
[ 295.886571] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx timed out for 0x60040000 (msg/reply size: 12/12)
[ 295.886588] sof-audio-pci-intel-tgl 0000:00:1f.3: preventing DSP entering D3 state to preserve context
[ 295.886593] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump start ]------------
[ 295.886612] sof-audio-pci-intel-tgl 0000:00:1f.3: hda irq intsts 0x00000000 intlctl 0xc0000081 rirb 00
[ 295.886619] sof-audio-pci-intel-tgl 0000:00:1f.3: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 295.886635] sof-audio-pci-intel-tgl 0000:00:1f.3: error: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
[ 295.886640] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump end ]------------
[ 295.886644] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 295.886647] sof-audio-pci-intel-tgl 0000:00:1f.3: IPC timeout
[ 295.886651] sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (6)
[ 295.886670] sof-audio-pci-intel-tgl 0000:00:1f.3: status: fw entered - code 00000005
[ 295.886855] sof-audio-pci-intel-tgl 0000:00:1f.3: unexpected fault 0x00000000 trace 0x00004000
[ 295.886862] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------
[ 295.886905] sof-audio-pci-intel-tgl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
[ 295.886916] Speakers: ASoC: trigger FE cmd: 1 failed: -110
[ 295.886933] sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: trigger stream 0 dir 0 cmd 0
[ 295.886950] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x60050000: GLB_STREAM_MSG: TRIG_STOP
[ 295.887009] sof-audio-pci-intel-tgl 0000:00:1f.3: trace IO error
Thanks @keqiaozhang . So there's no fatal errors from the OS in etrace, so this is something in the audio/IPC logic.
(or run "sudo cavstool.py -l" during the test).
There is https://github.com/thesofproject/sof-test/pull/897 but I just confirmed cavstool.py
is not compatible with D3. Off-topic here, discuss in https://github.com/thesofproject/sof-test/pull/897
Another IPC timed out issue for 0x80010000 (CONFIG) when testing multiple-pause-resume on ADLP_GMB_I2S is found in daily test (ID: 13547):
dmesg
[ 3813.513459] kernel: snd_sof:ipc3_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x60060000: GLB_STREAM_MSG: TRIG_PAUSE
[ 3813.514176] kernel: snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x20140000 successful
[ 3813.514183] kernel: snd_sof_intel_hda_common:ipc3_hda_dai_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: cmd=3 dai iDisp1 Pin direction 0
[ 3813.514199] kernel: snd_sof:ipc3_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x80010000: GLB_DAI_MSG: CONFIG
[ 3813.536598] kernel: snd_sof:sof_pcm_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: trigger stream 99 dir 1 cmd 3
[ 3814.016155] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx timed out for 0x80010000 (msg/reply size: 216/12)
[ 3814.016178] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: preventing DSP entering D3 state to preserve context
[ 3814.016187] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump start ]------------
[ 3814.016209] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: hda irq intsts 0x00000000 intlctl 0xc0000003 rirb 00
[ 3814.016219] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 3814.016237] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: error: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
[ 3814.016244] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump end ]------------
[ 3814.016248] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 3814.016252] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: IPC timeout
[ 3814.016257] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (6)
[ 3814.016273] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: status: fw entered - code 00000005
[ 3814.016409] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: unexpected fault 0x00000000 trace 0x00004000
[ 3814.016416] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------
[ 3814.016448] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: Failed to set dai config for iDisp1
[ 3814.016454] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: hda_dai_config_pause_push_ipc: DAI config failed for widget HDA0.OUT
[ 3814.016462] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ASoC: error at soc_dai_trigger on iDisp1 Pin: -110
[ 3814.016477] kernel: HDMI1: ASoC: dpcm_be_dai_trigger() failed at iDisp1 (-110)
[ 3814.016484] kernel: HDMI1: ASoC: trigger FE cmd: 3 failed: -110
[ 3814.016503] kernel: snd_sof:sof_pcm_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: trigger stream 2 dir 0 cmd 4
[ 3814.016517] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: trace IO error
CI detected this issue on ADLP_RVP_NOCODEC: http://sof-ci.sh.intel.com/#/result/planresultdetail/14549?model=ADLP_RVP_NOCODEC&testcase=multiple-pause-resume-50 etrace
[ 377869510.401481] ( 10116008.000000) c0 dma-trace src/lib/wait.c:53 ERROR poll timeout reg 488520 mask 63 val 0 us 937
[ 381653734.105277] ( 3784223.750000) c2 dma-trace src/drivers/dw/dma.c:1128 ERROR dw_dma_get_data_size(): xrun detected
[ 381653760.667775] ( 26.562498) c2 dma-trace src/audio/dai-legacy.c:905 ERROR dai_report_xrun(): underrun due to no data available
[ 381653782.126108] ( 21.458332) c2 dma-trace src/audio/dai-legacy.c:787 ERROR comp_underrun(): dev->comp.id = 2, source->avail = 384, copy_bytes = 0
[ 391953610.987663] ( 10299829.000000) c0 dma-trace src/drivers/dw/dma.c:1128 ERROR dw_dma_get_data_size(): xrun detected
[ 391953641.300162] ( 30.312498) c0 dma-trace src/audio/dai-legacy.c:905 ERROR dai_report_xrun(): underrun due to no data available
[ 391953667.914744] ( 26.614582) c0 dma-trace src/audio/dai-legacy.c:787 ERROR comp_underrun(): dev->comp.id = 26, source->avail = 0, copy_bytes = 0
[ 391953749.685574] ( 81.770828) c0 dma-trace ....../pipeline-stream.c:143 ERROR pipeline_copy(): ret = -61, start->comp.id = 26, dir = 1
[ 391953777.966823] ( 28.281248) c0 dma-trace ..../pipeline-schedule.c:208 ERROR pipeline_task(): xrun recovery failed! pipeline is stopped.
dmesg
[ 3735.078933] kernel: snd_sof:sof_pcm_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: trigger stream 3 dir 0 cmd 3
[ 3735.078973] kernel: snd_sof:ipc3_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x60060000: GLB_STREAM_MSG: TRIG_PAUSE
[ 3735.150165] kernel: snd_sof:sof_pcm_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: trigger stream 2 dir 0 cmd 3
[ 3735.586295] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx timed out for 0x60060000 (msg/reply size: 12/12)
[ 3735.586322] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: preventing DSP entering D3 state to preserve context
[ 3735.586331] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump start ]------------
[ 3735.586349] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: hda irq intsts 0x00000000 intlctl 0xc0000181 rirb 00
[ 3735.586361] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 3735.586376] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: error: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
[ 3735.586385] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump end ]------------
[ 3735.586391] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 3735.586396] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: IPC timeout
[ 3735.586401] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (6)
[ 3735.586414] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: 0x00000005: module: ROM, state: FW_ENTERED, running
[ 3735.586557] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: unexpected fault 0x00000000 trace 0x00004000
[ 3735.586566] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------
[ 3735.586639] kernel: snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x180]=0x20240000 successful
[ 3735.586660] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
[ 3735.586676] kernel: Port0 Deep Buffer: ASoC: trigger FE cmd: 3 failed: -110
CI detected this issue on ADLP_RVP_NOCODEC: http://sof-ci.sh.intel.com/#/result/planresultdetail/14549?model=ADLP_RVP_NOCODEC&testcase=multiple-pause-resume-50
A similar failure on the same platform again in daily test Nr. 14634
Found another IPC timed out issue for 0x60040000 (TRIG_START) when testing multiple-pipeline-playback. Test report ID:http://sof-ci.sh.intel.com/#/result/planresultdetail/13207
@keqiaozhang sorry, I'm looking at 13207 and not seeing those IPC timeouts that you pasted here, are you sure it wasn't a different test? In that test I do see a multiple-pipeline-playback TIMEOUT failure on ADLP_GMB_I2S_ZEPHYR, but no IPC timeouts. However, there's also a check-sof-logger failure there with an IPC timeout for DMA_PARAMS_EXT, i.e. a failure to load the firmware.
@fredoh9 have you seen this issue in recent tests? If no, I think we can close this one.
For past month I couldn't find same issue with stable-v2.2 branch.