[LNL] SoundWire: pm_runtime resume timed out - interrupts lost
Spotted by chance on jf-lnlm-rvp-sdw-1 while testing something totally unrelated:
https://sof-ci.01.org/softestpr/PR1178/build324/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=check-sof-logger (internal run ID 40150)
2024-04-22 22:40:16 UTC [ERROR] Caught kernel log error
===========================>>
[ 173.964280] kernel: rt715-sdca sdw:0:1:025d:0714:01: rt715_dev_resume: Enumeration not complete, timed out
[ 173.979644] kernel: rt715-sdca sdw:0:1:025d:0714:01: intel_resume_child_device: pm_runtime_resume failed: -110
[ 173.979746] kernel: soundwire sdw-master-0-1: sdw_intel_exit: intel_resume_child_device failed: -110
(Spotted while testing unrelated https://github.com/thesofproject/sof-test/pull/1178)
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: soundwire_cadence:cdns_fill_msg_resp: soundwire_intel soundwire_intel.link.2: Msg ignored for Slave 0
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_program_device_num: soundwire sdw-master-0-2: No more devices to enumerate
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: soundwire_cadence:cdns_update_slave_status_work: soundwire_intel soundwire_intel.link.2: Slave status change: 0x21
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_modify_slave_status: rt1316-sdca sdw:0:2:025d:1316:01: signaling enumeration completion for Slave 1
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_slave_set_frequency: rt1316-sdca sdw:0:2:025d:1316:01: Configured bus base 1, scale 3, mclk 19200000, curr_freq 4800000
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: snd_soc_rt1316_sdw:rt1316_io_init: rt1316-sdca sdw:0:3:025d:1316:01: rt1316_io_init hw_init complete
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_handle_slave_status: rt1316-sdca sdw:0:3:025d:1316:01: signaling initialization completion for Slave 1
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: snd_soc_rt1316_sdw:rt1316_io_init: rt1316-sdca sdw:0:2:025d:1316:01: rt1316_io_init hw_init complete
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_handle_slave_status: rt1316-sdca sdw:0:2:025d:1316:01: signaling initialization completion for Slave 1
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x44000000|0x3060004c: MOD_LARGE_CONFIG_SET [data size: 76]
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x64000000|0x3060004c: MOD_LARGE_CONFIG_SET
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x44000000|0x3060004c: MOD_LARGE_CONFIG_SET [data size: 76]
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x47000000|0x0: MOD_SET_DX [data size: 8]
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x67000000|0x0: MOD_SET_DX
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x47000000|0x0: MOD_SET_DX [data size: 8]
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: snd_soc_rt711_sdca:rt711_sdca_calibration: rt711-sdca sdw:0:0:025d:0711:01: rt711_sdca_calibration calibration complete, ret=0
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: snd_soc_rt711_sdca:rt711_sdca_jack_init: rt711-sdca sdw:0:0:025d:0711:01: in rt711_sdca_jack_init enable
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: snd_soc_rt711_sdca:rt711_sdca_io_init: rt711-sdca sdw:0:0:025d:0711:01: rt711_sdca_io_init hw_init complete
Apr 22 22:40:00 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_handle_slave_status: rt711-sdca sdw:0:0:025d:0711:01: signaling initialization completion for Slave 6
Apr 22 22:40:01 jf-lnlm-rvp-sdw-1 kernel: soundwire_cadence:cdns_update_slave_status_work: soundwire_intel soundwire_intel.link.0: Slave status change: 0x4000000
Apr 22 22:40:01 jf-lnlm-rvp-sdw-1 kernel: snd_soc_rt711_sdca:rt711_sdca_interrupt_callback: rt711-sdca sdw:0:0:025d:0711:01: rt711_sdca_interrupt_callback control_port_stat=4, sdca_cascade=1
Apr 22 22:40:01 jf-lnlm-rvp-sdw-1 kernel: soundwire_cadence:cdns_update_slave_status_work: soundwire_intel soundwire_intel.link.0: Slave status change: 0x2000000
Apr 22 22:40:04 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_bus_wait_for_clk_prep_deprep: soundwire sdw-master-0-3: clock stop prepare done slave:15
Apr 22 22:40:04 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_bus_wait_for_clk_prep_deprep: soundwire sdw-master-0-2: clock stop prepare done slave:15
Apr 22 22:40:04 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_bus_wait_for_clk_prep_deprep: soundwire sdw-master-0-0: clock stop prepare done slave:15
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: rt715-sdca sdw:0:1:025d:0714:01: rt715_dev_resume: Enumeration not complete, timed out
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_show_ping_status: soundwire sdw-master-0-1: PING status: 0x1
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: rt715-sdca sdw:0:1:025d:0714:01: intel_resume_child_device: pm_runtime_resume failed: -110
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire sdw-master-0-1: sdw_intel_exit: intel_resume_child_device failed: -110
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_intel:intel_shim_vs_set_clock_source: soundwire_intel soundwire_intel.link.2: clock source 0 LVSCTL 0x0
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_modify_slave_status: rt1316-sdca sdw:0:2:025d:1316:01: initializing enumeration and init completion for Slave 1
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_cadence:cdns_init_clock_ctrl: soundwire_intel soundwire_intel.link.2: mclk 19200000 max 4800000 row 50 col 4
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_cadence:cdns_update_slave_status_work: soundwire_intel soundwire_intel.link.2: Slave status change: 0x2
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_handle_slave_status: soundwire sdw-master-0-2: Slave attached, programming device number
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_extract_slave_id: soundwire sdw-master-0-2: SDW Slave Addr: 31025d131601
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_extract_slave_id: soundwire sdw-master-0-2: SDW Slave class_id 0x01, mfg_id 0x025d, part_id 0x1316, unique_id 0x1, version 0x3
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_assign_device_num: soundwire sdw-master-0-2: Slave already registered, reusing dev_num:1
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_cadence:cdns_fill_msg_resp: soundwire_intel soundwire_intel.link.2: Msg ignored for Slave 0
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_program_device_num: soundwire sdw-master-0-2: No more devices to enumerate
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_cadence:cdns_update_slave_status_work: soundwire_intel soundwire_intel.link.2: Slave status change: 0x21
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_modify_slave_status: rt1316-sdca sdw:0:2:025d:1316:01: signaling enumeration completion for Slave 1
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_slave_set_frequency: rt1316-sdca sdw:0:2:025d:1316:01: Configured bus base 1, scale 3, mclk 19200000, curr_freq 4800000
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: snd_soc_rt1316_sdw:rt1316_io_init: rt1316-sdca sdw:0:2:025d:1316:01: rt1316_io_init hw_init complete
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_handle_slave_status: rt1316-sdca sdw:0:2:025d:1316:01: signaling initialization completion for Slave 1
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_intel:intel_shim_vs_set_clock_source: soundwire_intel soundwire_intel.link.3: clock source 0 LVSCTL 0x0
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_modify_slave_status: rt1316-sdca sdw:0:3:025d:1316:01: initializing enumeration and init completion for Slave 1
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_cadence:cdns_init_clock_ctrl: soundwire_intel soundwire_intel.link.3: mclk 19200000 max 4800000 row 50 col 4
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_cadence:cdns_update_slave_status_work: soundwire_intel soundwire_intel.link.3: Slave status change: 0x2
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_handle_slave_status: soundwire sdw-master-0-3: Slave attached, programming device number
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_extract_slave_id: soundwire sdw-master-0-3: SDW Slave Addr: 30025d131601
A
Looks like a hardware/board issue to me
The last log for the rt714 is this https://sof-ci.ostc.intel.com/#/result/planresultdetail/40150?model=LNLM_SDW_AIOC&testcase=verify-kernel-boot-log
[ 5.655780] kernel: soundwire_cadence:cdns_update_slave_status_work: soundwire_intel soundwire_intel.link.1: Slave status change: 0x20000001
This means the device was assigned an address and was no longer seen as device 0. But then nothing happened after that.
@bardliao what do you think?
@plbossart Device number has been assigned. And there is no trace in rt715_sdca_io_init(). I think it is probably why we don't see the next action in the log.
[ 5.654922] kernel: soundwire_cadence:cdns_update_slave_status_work: soundwire_intel soundwire_intel.link.1: Slave status change: 0x2
[ 5.654946] kernel: soundwire_bus:sdw_handle_slave_status: soundwire sdw-master-0-1: Slave attached, programming device number
...
[ 5.655694] kernel: soundwire_bus:sdw_modify_slave_status: rt711-sdca sdw:0:0:025d:0711:01: signaling enumeration completion for Slave 6
To me, it is more like a codec driver error. See change below. We need to wait for initialization_complete instead of enumeration_complete when the codec get resume.
diff --git a/sound/soc/codecs/rt715-sdca-sdw.c b/sound/soc/codecs/rt715-sdca-sdw.c
index d3fb02e8f31a..c8dabb9b16b5 100644
--- a/sound/soc/codecs/rt715-sdca-sdw.c
+++ b/sound/soc/codecs/rt715-sdca-sdw.c
@@ -234,10 +234,10 @@ static int __maybe_unused rt715_dev_resume(struct device *dev)
if (!slave->unattach_request)
goto regmap_sync;
- time = wait_for_completion_timeout(&slave->enumeration_complete,
+ time = wait_for_completion_timeout(&slave->initialization_complete,
msecs_to_jiffies(RT715_PROBE_TIMEOUT));
if (!time) {
- dev_err(&slave->dev, "%s: Enumeration not complete, timed out\n", __func__);
+ dev_err(&slave->dev, "%s: Initialization not complete, timed out\n", __func__);
sdw_show_ping_status(slave->bus, true);
return -ETIMEDOUT;
I saw that confusion @bardliao but if the enumeration didn't complete by the deadline, then the initialization would not complete either?
Seen again today on a different device, this time it's ba-lnlm-rvp-sdw-01
https://sof-ci.01.org/softestpr/PR1075/build329/devicetest/index.html (internal run 40213)
I saw that confusion @bardliao but if the enumeration didn't complete by the deadline, then the initialization would not complete either?
Fair point. Indeed, I can't find soundwire sdw-master-0-1: Slave attached, programming device number from https://github.com/thesofproject/linux/issues/4943#issuecomment-2071101627. And it happened again with different DUTs/codecs. https://github.com/thesofproject/linux/issues/4943#issuecomment-2073642246. Can it be AC timing issue again?
@bardliao possibly an AC timing problem indeed. With PR https://github.com/thesofproject/linux/pull/4915, we modified the default for DOAIS from 0x1 to 0x3 (hardware default). It's worth checking if reverting this change 'fixes' the issue.
reverting the DOAIS setting does not trigger any obvious change, so we'll have to track where and when this issue re-occurs, if at all.
similar problem reported on LNL RVP:
https://sof-ci.01.org/linuxpr/PR4857/build2721/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=check-kmod-load-unload
[ 1964.576041] kernel: rt711-sdca sdw:0:0:025d:0711:01: rt711_sdca_dev_resume: Initialization not complete, timed out
[ 1964.591372] kernel: soundwire_bus:sdw_show_ping_status: soundwire sdw-master-0-0: PING status: 0x1
[ 1964.591437] kernel: rt711-sdca sdw:0:0:025d:0711:01: intel_resume_child_device: pm_runtime_resume failed: -110
[ 1964.591584] kernel: soundwire sdw-master-0-0: sdw_intel_exit: intel_resume_child_device failed: -110
It looks like the RT711 is attached on link0 but somehow we never deal with it. That's starting to look like an interrupt that's not detected.
It was the same pattern in the initial bug report above but this time for link1.
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: rt715-sdca sdw:0:1:025d:0714:01: rt715_dev_resume: Enumeration not complete, timed out
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire_bus:sdw_show_ping_status: soundwire sdw-master-0-1: PING status: 0x1
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: rt715-sdca sdw:0:1:025d:0714:01: intel_resume_child_device: pm_runtime_resume failed: -110
Apr 22 22:40:06 jf-lnlm-rvp-sdw-1 kernel: soundwire sdw-master-0-1: sdw_intel_exit: intel_resume_child_device failed: -110
@bardliao FYI. My two hypotheses are a) this comes from a bad setup with delayed interrupt enable b) there's an edge interrupt from two separate links at the same time and we miss one of the two.
The mechanism for interrupts is that we have ONE interrupt, and then we deal with all the links with this loop
list_for_each_entry(link, &ctx->link_list, list)
sdw_cdns_irq(irq, link->cdns);
If we have an interrupt on link2, we could very well miss an interrupt on link0 if it happens AFTER we deal with link0 in the list walk. If we look at the reports so far we seem to miss the interrupt for link0 and link1, in all cases we don't seem to have problems dealing with the amplifiers on link2-3.
The mechanism for interrupts is that we have ONE interrupt, and then we deal with all the links with this loop
list_for_each_entry(link, &ctx->link_list, list) sdw_cdns_irq(irq, link->cdns);If we have an interrupt on link2, we could very well miss an interrupt on link0 if it happens AFTER we deal with link0 in the list walk. If we look at the reports so far we seem to miss the interrupt for link0 and link1, in all cases we don't seem to have problems dealing with the amplifiers on link2-3.
Good finding. And it explains why the issue only seen with AIOC. However, I am curious why we only see the issue on LNL?
@bardliao I am not sure this theory is correct, if it is that would mean the solution we implemented years ago was never robust... For LNL we start the links earlier, and IIRC we don't power-up all the links at once as we do for earlier generations. That could explain differences in timing during the enumeration stage.
@bardliao I am not sure this theory is correct, if it is that would mean the solution we implemented years ago was never robust... For LNL we start the links earlier, and IIRC we don't power-up all the links at once as we do for earlier generations. That could explain differences in timing during the enumeration stage.
Hope we can find something with https://github.com/thesofproject/linux/pull/4976. But if it is the case, how should we fix it?
If I am correct, we can loop multiple times to reduce the possibility of such issues, but it would not be a 100% robust fix. If we still miss an interrupt, we could add a recovery procedure when the resume timeout happens, so that we force the sdw_interrupt thread to be revisited again. That would be really ugly.
Question 1 is probably, is the interrupt definitely edge triggered? And secondly, does it have to be? I would assume it is here and there is no option to switch it to level triggered, which is generally much easier to get right. The normal handling for overlapping edge triggered IRQs works roughly like this:
- IRQ comes in, the IRQ is left enabled, the state is set to handling, and the handler is started.
- If another IRQ comes in, we see the state is already handling so the IRQ is masked and the state is set to pending.
- When the first IRQ handler completes, we see the state is pending so the state is set back to handling, the IRQ is unmasked and the handler is re-entered.
Assuming one has the IRQ stuff all setup correctly this should be handled by the IRQ core.
@charleskeepax we have ONE interrupt, but it's a combined status for IPC, streams, SoundWire and wakes. The solution we used so far was the same as Windows and it's been quite reliable since CML. What is possible is that with the change to use the HDaudio multi-link stuff, we handle things differently and the combination of events across links is possibly questionable. It's not really that the interrupt generation changed.
I have had very occasional enumeration failures with cs42l43 on the TGL UpX, that has been on my list to investigate for a quite a while. This is probably a good candidate for a cause. If I follow the code correctly in sound/soc/sof/intel/hda.c, it is masking the IRQ in the handler, and unmasking it at the bottom of the thread. That is a pretty questionable thing to be doing with an edge triggered IRQ, will depend on the exact details of the hardware but generally that would be broken as you lose any IRQs that happen during the time the handler is running. Also keep in mind this could be a result of there being more IRQs. The less IRQs there are the less likely you are to lose some, so maybe one of those other changes increased the IRQ volume, or just changed the time such that two IRQs were more likely to happen close to each other.
But first step would really be to check the hardware behaviour, we still haven't had any luck getting docs for the Cadence IP, although I assume this bit falls outside of that anyway.
It's not a Cadence problem.
we did change the timing on LNL, now the links are started serially whereas in previous solutions we have to power them up at the same time. I think the serial part creates a case where a link has an alert while handling another, it would be less likely in the simultaneous case since all devices would be handled at the same time.
- 1.1 IRQ comes in, the IRQ is left enabled, 1.2 the state is set to handling, ...
- If another IRQ comes in, we see the state is already handling so the IRQ is masked and the state is set to pending.
What happens if 2) comes between 1.1 and 1.2?
- ... so 3.2 the state is set back to handling, 3.3 the IRQ is unmasked and...
Looks like there is another short window between 3.2 and 3.3 for another race here...
The normal handling for overlapping edge triggered IRQs works roughly like this ... Assuming one has the IRQ stuff all setup correctly this should be handled by the IRQ core.
The generic issue should indeed be a "solved problem": do you have a pointer to a good and not "rough" reference documentation? Unless edge-triggered interrupts are broken by design :-D
- 1.1 IRQ comes in, the IRQ is left enabled, 1.2 the state is set to handling, ...
- If another IRQ comes in, we see the state is already handling so the IRQ is masked and the state is set to pending.
What happens if 2) comes between 1.1 and 1.2?
I believe the state variable would need to be either handled as an atomic or be guarded with a spin lock. I think the IRQ code uses a spin lock if memory serves but will try to find time to have a look tomorrow.
- ... so 3.2 the state is set back to handling, 3.3 the IRQ is unmasked and...
Looks like there is another short window between 3.2 and 3.3 for another race here...
Not sure this one actually counts since until the IRQ is unmasked you know no new handlers will kick off but depends a bit on how you order the code. Probably safest to wrap all access to the state in a spin lock.
The normal handling for overlapping edge triggered IRQs works roughly like this ... Assuming one has the IRQ stuff all setup correctly this should be handled by the IRQ core.
The generic issue should indeed be a "solved problem": do you have a pointer to a good and not "rough" reference documentation? Unless edge-triggered interrupts are broken by design :-D
They are not broken by design, but they sure are much harder to get right than level triggered IRQs. Partly why it's a good idea to outsource the handling to the IRQ subsystem. I think I had a couple emails from Thomas Gleixner saved somewhere I will try to dig out.
So yeah checking the code (handle_edge_irq in kernel/irq/chip.c), it does indeed protect things with a spinlock.
The IRQ docs contains a description of the edge IRQ flow:
https://www.kernel.org/doc/html/v4.12/core-api/genericirq.html#default-edge-irq-flow-handler
Here is a thread on the mailing list where someone was attempting to do something similar with an edge IRQ masking it in the handler, unmasking it in the thread and Thomas explaining the issues with that:
https://lore.kernel.org/all/alpine.LFD.2.02.1210121440160.2779@ionos/#R
@marc-hb EDIT: in this newer page, the formatting is not broken: https://www.kernel.org/doc/html/v6.8/core-api/genericirq.html#default-edge-irq-flow-handler (thanks!)
We have confirmation from our hardware folks that all IRQ handling is level-based. So we have to look for other explanations for the behavior we're seeing.
Nice, that does generally make life much easier and the current code masking in the handler and unmasking at the end of the thread should be totally safe with a level IRQ. Since you should just fall right back into the handler again. It does make it much less clear what is going on though :-)
Looks similar but with rt711_sdca_dev_resume() instead:
https://sof-ci.01.org/sofpr/PR9115/build4618/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=check-kmod-load-unload-after-playback
[ 2015.947252] kernel: soundwire_bus:sdw_handle_slave_status: rt1316-sdca sdw:0:3:025d:1316:01: signaling initialization completion for Slave 1
[ 2015.975402] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x44000000|0x3060004c: MOD_LARGE_CONFIG_SET [data size: 76]
[ 2015.975955] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x64000000|0x3060004c: MOD_LARGE_CONFIG_SET
[ 2015.975977] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x44000000|0x3060004c: MOD_LARGE_CONFIG_SET [data size: 76]
[ 2015.975981] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x47000000|0x0: MOD_SET_DX [data size: 8]
[ 2016.136402] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x67000000|0x0: MOD_SET_DX
[ 2016.136700] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x47000000|0x0: MOD_SET_DX [data size: 8]
[ 2019.699760] kernel: soundwire_bus:sdw_bus_wait_for_clk_prep_deprep: soundwire sdw-master-0-3: clock stop prepare done slave:15
[ 2019.699820] kernel: soundwire_bus:sdw_bus_wait_for_clk_prep_deprep: soundwire sdw-master-0-2: clock stop prepare done slave:15
[ 2019.700110] kernel: soundwire_bus:sdw_bus_wait_for_clk_prep_deprep: soundwire sdw-master-0-1: clock stop prepare done slave:15
[ 2021.409075] kernel: rt711-sdca sdw:0:0:025d:0711:01: rt711_sdca_dev_resume: Initialization not complete, timed out
[ 2021.424492] kernel: soundwire_bus:sdw_show_ping_status: soundwire sdw-master-0-0: PING status: 0x1
[ 2021.424536] kernel: rt711-sdca sdw:0:0:025d:0711:01: intel_resume_child_device: pm_runtime_resume failed: -110
[ 2021.424658] kernel: soundwire sdw-master-0-0: sdw_intel_exit: intel_resume_child_device failed: -110
[ 2021.424994] kernel: soundwire_intel:intel_shim_vs_set_clock_source: soundwire_intel soundwire_intel.link.1: clock source 0 LVSCTL 0x0
[ 2021.425067] kernel: soundwire_bus:sdw_modify_slave_status: rt715-sdca sdw:0:1:025d:0714:01: initializing enumeration and init completion for Slave 7
[ 2021.425088] kernel: soundwire_cadence:cdns_init_clock_ctrl: soundwire_intel soundwire_intel.link.1: mclk 19200000 max 4800000 row 50 col 4
[ 2021.426733] kernel: soundwire_cadence:cdns_update_slave_status_work: soundwire_intel soundwire_intel.link.1: Slave status change: 0x2
[ 2021.426899] kernel: soundwire_bus:sdw_handle_slave_status: soundwire sdw-master-0-1: Slave attached, programming device number
[
EDIT: another one https://sof-ci.01.org/sofpr/PR9159/build4831/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=multiple-pipeline-all
https://sof-ci.01.org/sofpr/PR9159/build4831/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=check-capture-all-formats
https://sof-ci.01.org/sofpr/PR9159/build4831/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=check-kmod-load-unload
Another one reported in https://github.com/thesofproject/sof/issues/9122 https://sof-ci.01.org/sofpr/PR9115/build4618/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=check-kmod-load-unload-after-playback
I haven't seen this error reported in the last two weeks. @marc-hb @fredoh9 do we have a mechanism to look for strings in test logs. A match on "PING status" would be enough
We can scan test results into internal storage server directly. Need to narrow scan scope to avoid taking long time. I can try this this afternoon and let you know.