linux icon indicating copy to clipboard operation
linux copied to clipboard

[BUG] delayed mtrace collection

Open aiChaoSONG opened this issue 1 year ago • 8 comments

Some mtrace in the test case are expanded to next test case, this is spotted in daily test.

for example, in internal daily test 28474, on any test config(model), the timestamp in mtrace for check-playback-all-formats.sh test case is started from 0, but for its next test case (check-capture-all-formats.sh), the timestamp is the continuation of the previous test case, which is unexpected.

This issue is also found in PR tests.

End of mtrace for check-playback-all-formats.sh on MTLP_RVP_SDW

[    9.588290] <inf> dma: dma_put: dma_put(), dma = 0x400ed670, sref = 0
[    9.588923] <inf> ipc: ipc_cmd: rx	: 0x12010000|0x0
[    9.589055] <inf> dma: dma_put: dma_put(), dma = 0x400ed580, sref = Terminated

Start of mtrace for check-capture-all-formats.sh on MTLP_RVP_SDW

01b0U
[    9.583648] <inf> ll_schedule: zephyr_ll_task_done: num_tasks 1 total_num_tasks 1
[    9.583660] <inf> ll_schedule: zephyr_domain_unregister: zephyr_domain_unregister domain->type 1 domain->clk 3
[    9.586620] <inf> ipc: ipc_cmd: rx	: 0x13000002|0x1
[    9.586636] <inf> pipe: pipeline_trigger: pipe:1 0x0 pipe trigger cmd 0
[    9.586658] <inf> dma_intel_adsp_gpdma: intel_adsp_gpdma_power_off: intel_adsp_gpdma_power_off: dma dma@7c000 power off
[    9.586753] <inf> pipe: pipeline_trigger: pipe:0 0x0 pipe trigger cmd 0
[    9.586788] <wrn> copier: copier_comp_trigger: comp:0 0x40000 dai is not ready
[    9.587661] <inf> ipc: ipc_cmd: rx	: 0x46000002|0x3
[    9.588271] <inf> ipc: ipc_cmd: rx	: 0x12000000|0x0
[    9.588290] <inf> dma: dma_put: dma_put(), dma = 0x400ed670, sref = 0
[    9.588923] <inf> ipc: ipc_cmd: rx	: 0x12010000|0x0
[    9.589055] <inf> dma: dma_put: dma_put(), dma = 0x400ed580, sref = 0
[   13.314338] <inf> ipc: ipc_cmd: rx	: 0x44000000|0x3060004c
[   13.314771] <inf> ipc: ipc_cmd: rx	: 0x47000000|0x0
[    0.142388] <inf> ipc: ipc_cmd: rx	: 0x44000000|0x31400008
[    0.142955] <inf> ipc: ipc_cmd: rx	: 0x44000000|0x3060004c

cc:

  • https://github.com/thesofproject/sof/issues/5521
  • https://github.com/thesofproject/sof-test/issues/1046

aiChaoSONG avatar Jul 03 '23 08:07 aiChaoSONG

Thanks @aiChaoSONG good catch.

mtrace-reader.py is stopped and restarted between each test and redirected to a new log file every time. So at first sight this does not look like a bug in sof-test but an issue in the logging system itself.

An unsolved mtrace flushing issue was mentioned in:

  • https://github.com/thesofproject/sof/pull/7726

@kv2019i, @plbossart , @lyakh , @RanderWang any insight?

marc-hb avatar Jul 05 '23 15:07 marc-hb

@marc-hb The timestamp is generated by Zephyr OS. If the DSP is not powered off between test cases, the timestamp will continue and will not be reset to zero. So this is expected behaviour.

kv2019i avatar Jul 12 '23 09:07 kv2019i

I suspect there is one/many of following happening:

  • DSP crashed in a previous test case and kernel is running with CONFIG_SND_SOC_SOF_DEBUG_RETAIN_DSP_CONTEXT=y so DSP is not powered off anymore (until next system reboot)
  • DSP does not reach runtime-pm-suspended state between test cases

kv2019i avatar Jul 12 '23 09:07 kv2019i

DSP crashed in a previous test case and kernel is running with >CONFIG_SND_SOC_SOF_DEBUG_RETAIN_DSP_CONTEXT=y so DSP is not powered off anymore

It should not be the case, because this is also see in PASSED test case, just check any test case in our daily test.

DSP does not reach runtime-pm-suspended state between test cases

This is possible, the test interval between test case is 5 seconds, and in the kernel, the time for DSP to enter D3 is also 5s, let me try to make the test interval bigger.

aiChaoSONG avatar Jul 13 '23 02:07 aiChaoSONG

the time for DSP to enter D3 is also 5s, let me try to make the test interval bigger.

I think having the latter always longer than the former is a good idea in CI.

It should not be enforced on personal devices though :-)

marc-hb avatar Jul 13 '23 04:07 marc-hb

I have made the test case interval 10 seconds, which is far enough for DSP to enter D3, however, I still see the issue. check the internal report with ID:28982.

aiChaoSONG avatar Jul 13 '23 06:07 aiChaoSONG

I suspect there is one/many of following happening:

DSP crashed in a previous test case and kernel is running with CONFIG_SND_SOC_SOF_DEBUG_RETAIN_DSP_CONTEXT=y so DSP is not powered off anymore (until next system reboot)

This sounds like a https://github.com/thesofproject/linux issue

DSP does not reach runtime-pm-suspended state between test cases

This sounds like a https://github.com/thesofproject/sof issue

Either way this is not a sof-test issue. Transferring in 5..., 4..., 3...

marc-hb avatar Sep 28 '23 21:09 marc-hb

I have made the test case interval 10 seconds, which is far enough for DSP to enter D3, however, I still see the issue.

Maybe because of sof_pci_debug=0x1? (#1087)

marc-hb avatar Oct 04 '23 18:10 marc-hb