[LNL]Suspend resume with playback or capture will Fail or TIMEOUT
Suspend resume with playback or capture will Fail or TIMEOUT
Steps to reproduce
- Use NOCODEC/HDA/SDW issue sporadically observed on all devices. below cmd are for nocodec
- TPLG=/lib/firmware/intel/development/sof-lnl-nocodec.tplg MODEL=LNLM_RVP_NOCODEC SOF_TEST_INTERVAL=5 ~/sof-test/test-case/check-suspend-resume-with-audio.sh -l 5 -m playback or TPLG=/lib/firmware/intel/development/sof-lnl-nocodec.tplg MODEL=LNLM_RVP_NOCODEC SOF_TEST_INTERVAL=5 ~/sof-test/test-case/check-suspend-resume-with-audio.sh -l 5 -m capture
The test will either fail or hang
Logs console_logs.txt dmesg_logs.txt
Configs Linux Commit: d25788054d59 KConfig Branch: master KConfig Commit: c3171afedc63 SOF Branch: stable-v2.10 SOF Commit: b15f1f1a3238
cc:
- #5109
Failure logs are a bit unclear w.r.t. where things start to go south. The last messages to FW seem sane, cores are powered down, finally SET_DX is sent without errors. But then there's a generic kernel pm -EBUSY error for late_suspend (one example, Intel test run 43010):
[ 408.720709] kernel: snd_sof_intel_hda_common:hda_dsp_state_log: sof-audio-pci-intel-lnl 0000:00:1f.3: Current DSP power state: D3
[ 408.720720] kernel: snd_sof:sof_set_fw_state: sof-audio-pci-intel-lnl 0000:00:1f.3: fw_state change: 7 -> 0
[ 412.467352] kernel: PM: late suspend of devices failed
Running more tests to get insights on the failures.
Digging further into scenario #5085 addresses, it seems following combination of events happens:
- deep buffer PCM enters D0i3x state
- xrun happens just at suspend, so arecord is doing xrun handling (PCM is stopped, prepared and started again after resume)
- as the PCM is in stopped (due to the xrun), SNDRV_PCM_TRIGGER_SUSPEND is not sent to DAI, and the NULL deref can be hit
Example log
[ 882.643397] DEBUG: cmd=1 dai SSP0 Pin direction 0
# KV: deep buffer PCM31 started
[ 882.648298] snd_sof:sof_pcm_trigger: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: trigger stream 31 dir 0 cmd 1
[ 882.648307] snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-lnl 0000:00:1f.3: trigger cmd: 1 state: 4
[ 882.648315] snd_sof:sof_ipc4_set_pipeline_state: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc4 set pipeline instance 0 state 3
[ 882.648322] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x13000003|0x0: GLB_SET_PIPELINE_STATE
[ 882.648759] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x33000000|0x0: GLB_SET_PIPELINE_STATE
[ 882.648788] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x13000003|0x0: GLB_SET_PIPELINE_STATE
[ 882.648795] snd_sof:sof_ipc4_set_pipeline_state: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc4 set pipeline instance 0 state 4
[ 882.648800] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x13000004|0x0: GLB_SET_PIPELINE_STATE
[ 882.649963] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x33000000|0x0: GLB_SET_PIPELINE_STATE
[ 882.649985] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x13000004|0x0: GLB_SET_PIPELINE_STATE
[ 882.649997] snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x1e0]=0x2014001e successful
[ 884.689120] asix 3-3:1.0 enx34298f712491: Link is Up - 100Mbps/Full - flow control rx/tx
# KV: opportunistic D0iX is attempt at 887sec
[ 887.690497] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x48000000|0x12: MOD_SET_D0IX
[ 887.691066] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x68000000|0x12: MOD_SET_D0IX
[ 887.691109] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x48000000|0x12: MOD_SET_D0IX
[ 887.691117] snd_sof_intel_hda_common:hda_dsp_state_log: sof-audio-pci-intel-lnl 0000:00:1f.3: Current DSP power state: D0I3
# KV: suspend process is started soon after
[ 887.983323] PM: suspend entry (s2idle)
[ 887.985028] Filesystems sync: 0.001 seconds
# KV: user-space see xruns and stops the PCM as part of xrun handlng, user-space not yet frozen
[ 888.121816] snd_sof:sof_pcm_trigger: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: trigger stream 31 dir 0 cmd 0
[ 888.121825] snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-lnl 0000:00:1f.3: trigger cmd: 0 state: 3
[ 888.121830] snd_sof:sof_ipc4_set_pipeline_state: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc4 set pipeline instance 0 state 3
[ 888.121915] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x48000000|0x4: MOD_SET_D0IX
[ 888.122275] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x68000000|0x4: MOD_SET_D0IX
[ 888.122288] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x48000000|0x4: MOD_SET_D0IX
[ 888.122290] snd_sof_intel_hda_common:hda_dsp_state_log: sof-audio-pci-intel-lnl 0000:00:1f.3: Current DSP power state: D0I0
[...]
[ 889.222280] snd_sof:sof_widget_free_unlocked: sof-audio-pci-intel-lnl 0000:00:1f.3: widget pipeline.2 freed
[ 889.222285] snd_sof:sof_widget_free_unlocked: sof-audio-pci-intel-lnl 0000:00:1f.3: widget dai-copier.SSP.NoCodec-0.playback freed
# KV: much later hda_dai_suspend() is called with hext_stream non-NULL and this will kill the kernel
[ 889.262695] DEBUG: dai suspend 13a0f000
PR #5085 will avoid the kernel crash, but the xrun handling concurrently with system suspend is not successful and the streaming later terminates to:
[ 893.639609] snd_sof_intel_hda_common:hda_dai_trigger: sof-audio-pci-intel-lnl 0000:00:1f.3: cmd=1 dai SSP0 Pin direction 0
[ 893.639621] sof-audio-pci-intel-lnl 0000:00:1f.3: ASoC: error at soc_dai_trigger on SSP0 Pin: -22
[ 893.648704] Deepbuffer Port0: ASoC: trigger FE cmd: 1 failed: -22
wow, that's quite a corner case: an xrun while suspend, no one saw that coming.
the part that I am not following is that the xrun seems to be reproducible, which suggests some sort of D0i3 problem in the first place.
Also if the PCM device is stopped before a system suspend, it's not clear to me what userspace will do on resume? Would it go on with a prepare/trigger? or would there be an additional stage of hw_free/hw_params?
@plbossart I'll try to see if this is completely reproducible, but indeed the xrun seems to start the chain of failures. The user-space will continue xrun handling as soon as resume is complete:
# PCM31 is prepared just before user-space is suspended
[ 889.160611] snd_sof:sof_widget_free_unlocked: sof-audio-pci-intel-lnl 0000:00:1f.3: widget pipeline.2 freed
[ 889.160621] snd_sof:sof_widget_free_unlocked: sof-audio-pci-intel-lnl 0000:00:1f.3: widget dai-copier.SSP.NoCodec-0.playback freed
[ 889.160638] snd_sof:sof_pcm_prepare: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: prepare stream 31 dir 0
[ 889.160641] snd_sof:sof_pcm_hw_params: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: hw params stream 31 dir 0
# then we enter suspend
[ 889.311599] ACPI: EC: interrupt blocked
[ 893.102613] ACPI: EC: interrupt unblocked
# resume starts, DSP is booted
[ 893.207454] snd_sof:snd_sof_run_firmware: sof-audio-pci-intel-lnl 0000:00:1f.3: firmware boot complete
[ 893.207465] snd_sof:sof_set_fw_state: sof-audio-pci-intel-lnl 0000:00:1f.3: fw_state change: 6 -> 7
# system resume completes in kernel and resumes user-space
[ 893.616716] OOM killer enabled.
[ 893.616721] Restarting tasks ...
# upon resume prepare is called AGAIN on the PCM31 (this is normal after system suspend)
[ 893.618603] snd_sof:sof_pcm_prepare: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: prepare stream 31 dir 0
[ 893.618613] snd_sof:sof_pcm_hw_params: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: hw params stream 31 dir 0
# to my eyes, the DSP pipeline setup goes just fine, no errors from the IPCs
# then FE PCM START trigger fals and I see no errors from SOF
[ 893.639609] snd_sof_intel_hda_common:hda_dai_trigger: sof-audio-pci-intel-lnl 0000:00:1f.3: cmd=1 dai SSP0 Pin direction 0
[ 893.639621] sof-audio-pci-intel-lnl 0000:00:1f.3: ASoC: error at soc_dai_trigger on SSP0 Pin: -22
[ 893.648704] Deepbuffer Port0: ASoC: trigger FE cmd: 1 failed: -22
# then we just see user-space tearing down the PCM
In user-space console, this shows as:
rtcwake: wakeup from "mem" using /dev/rtc0 at Wed Jun 26 19:20:20 2024
underrun!!! (at least 1028.938 ms long)
2024-06-26 19:20:20 UTC Sub-Test: [COMMAND] sleep for 5
aplay: pcm_write:2127: write error: Input/output error
So opens are at least why the xrun occurs in the first place (deep buffer, D0i3 mode entry, ...), and second is the why the prepare is called twice and is the flow correct here (I have a hunch the double-prepare confuses the driver state somewhere and thus we get the -22 from FE trigger on the resume).
Some refinements:
- hit the condition (=suspend with substream present) with D0ix disabled (streams marked not compatible with D0ix)
- hit the condition once without a preceding xrun
For the latter, I can't really explain why this happens yet. There's a trigger to stop the PCM right after suspend starts but nothing printed out by aplay so this is coming from ALSA core:
[47625.798898] PM: suspend entry (s2idle)
[47625.800453] Filesystems sync: 0.001 seconds
[47626.294808] snd_sof:sof_pcm_trigger: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: trigger stream 0 dir 0 cmd 0
It seems the failure sequence is related to a system hickup in system process. Whenever SOF see unexpected sequence (no SUSPEND action for the DAI), there is unexpected 2sec gap in the system suspend process. This often generates an xrun, but exact sequence depends on the length of the gap:
# success case, completes within 10ms
[69489.712206] Filesystems sync: 0.001 seconds
[69489.717798] Freezing user space processes
[69489.719699] Freezing user space processes completed (elapsed 0.001 seconds)
# success case:
[69500.740018] Filesystems sync: 0.001 seconds
[69500.745193] Freezing user space processes
[69500.747031] Freezing user space processes completed (elapsed 0.001 seconds)
# success case:
[69511.757591] Filesystems sync: 0.001 seconds
[69511.761816] Freezing user space processes
[69511.763580] Freezing user space processes completed (elapsed 0.001 seconds)
# FAILURE case (without #5085 , kernel would crash here)
[69522.814773] Filesystems sync: 0.001 seconds
# note the 2second gap! user-space will stop the PCM after xrun and this is done before user-space is frozen
[69524.304214] Freezing user space processes
[69524.305612] Freezing user space processes completed (elapsed 0.001 seconds)
With the kernel fix in place, we now see similar occurences in CI with suspend taking longer than expected and following error in log:
kernel:
[ 334.703012] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x13000003|0x0: GLB_SET_PIPELINE_STATE
[ 335.206800] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc timed out for 0x13000003|0x0
[... kernel error messages]
[ 335.207407] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x46000002|0x3: MOD_UNBIND
FW:
[ 334.702861] <inf> ipc: ipc_cmd: rx : 0x13000003|0x0
[ 334.702875] <inf> pipe: pipeline_trigger: pipe:0 0x0 pipe trigger cmd 2
[ 334.702953] <inf> ll_schedule: zephyr_ll_task_done: task complete 0xa0119e00 0x20210U
[ 334.702956] <inf> ll_schedule: zephyr_ll_task_done: num_tasks 2 total_num_tasks 2
[ 335.207238] <inf> ipc: ipc_cmd: rx : 0x46000002|0x3
So the SET_PIPELNE response is not seen by host. This only happens when the system suspend process has the unusual delay, so something unusual going on on the host system side. I'll keep the bug on driver side until we get more insight on what happens with the suspend.
So the SET_PIPELNE response is not seen by host.
Is it possible that FW dead before it sent the response? Like, it dead when it process the MOD_UNBIND command.
@ssavati are you able to rerun test now this PR above has merged, if all good pls close. Thanks !
@ssavati are you able to rerun test now this PR above has merged, if all good pls close. Thanks !
Sure will run some functional and stress testcase related to Suspend resume
@lgirdwood still observed failures. for more visit CI #/result/planresultdetail/43738
Ack @ssavati @lgirdwood I think this is expected. The kernel PR fixes the kernel crash, but there are still failure. The whole system doesn't hit panic anymore, but we hit a failure nevertheless (see comment https://github.com/thesofproject/linux/issues/5080#issuecomment-2194909579 ).
@kv2019i @ssavati what's the repro rate of the suspend delay failure ? The 2 second delay looks like we are blocked on pending IO completing. PCM core will copy to DMA buffers, but this may atomically block on audio HW ptr being updated too ?
Passing to you for check @ranj063 . One case seen in daily run (43842), not seen in the previous daily.
@kv2019i I see this in the latest failure and another one a few days ago. after the previous successful D3 entry
[ 7868.464052] kernel: snd_sof:snd_sof_pci_update_bits_unlocked: sof-audio-pci-intel-lnl 0000:00:1f.3: Debug PCIW: 00000010 at 00000044
[ 7868.465195] kernel: snd_sof_intel_hda_common:hda_dsp_state_log: sof-audio-pci-intel-lnl 0000:00:1f.3: Current DSP power state: D3
[ 7868.465202] kernel: snd_sof:sof_set_fw_state: sof-audio-pci-intel-lnl 0000:00:1f.3: fw_state change: 7 -> 0
[ 7872.220211] kernel: PM: late suspend of devices failed
I wonder whats causing the late suspend to fail
Spotted this test failure in today's run, not sure whether it's related
43923?model=LNLM_SDW_AIOC&testcase=check-suspend-resume-with-capture-5
[ 4823.610270] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x33000000|0x0: GLB_SET_PIPELINE_STATE
[ 4823.610535] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x13010004|0x0: GLB_SET_PIPELINE_STATE
[ 4823.611330] kernel: snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x80]=0x14001e successful
[ 4823.614095] kernel: snd_soc_rt1316_sdw:rt1316_io_init: rt1316-sdca sdw:0:2:025d:1316:01: rt1316_io_init hw_init complete
[ 4823.614185] kernel: soundwire_bus:sdw_handle_slave_status: rt1316-sdca sdw:0:2:025d:1316:01: signaling initialization completion for Slave 1
[ 4823.614899] kernel: snd_soc_rt1316_sdw:rt1316_io_init: rt1316-sdca sdw:0:3:025d:1316:01: rt1316_io_init hw_init complete
[ 4823.614913] kernel: soundwire_bus:sdw_handle_slave_status: rt1316-sdca sdw:0:3:025d:1316:01: signaling initialization completion for Slave 1
[ 4823.615022] kernel: snd_sof_intel_hda_common:hda_dai_trigger: soundwire_intel soundwire_intel.link.0: cmd=1 dai SDW0 Pin3 direction 1
[ 4823.615050] kernel: soundwire_intel soundwire_intel.link.0: ASoC: error at soc_dai_trigger on SDW0 Pin3: -22
[ 4823.617591] kernel: Jack In: ASoC: trigger FE cmd: 1 failed: -22
[ 4823.617767] kernel: snd_sof:sof_pcm_trigger: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: trigger stream 1 dir 1 cmd 0
[ 4823.617787] kernel: snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-lnl 0000:00:1f.3: trigger cmd: 0 state: 3
[ 4823.617804] kernel: snd_sof:sof_ipc4_set_pipeline_state: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc4 set pipeline instance 1 state 3
[ 4823.617821] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x13010003|0x0: GLB_SET_PIPELINE_STATE
[ 4823.619171] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x33000000|0x0: GLB_SET_PIPELINE_STATE
[ 4823.619189] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x13010003|0x0:
Spotted this test failure in today's run, not sure whether it's related
I think we've got 2 different types of failures. But the trigger failure seems to be with suspend-resume-with-capture. I'm looking into this
Spotted this test failure in today's run, not sure whether it's related
43923?model=LNLM_SDW_AIOC&testcase=check-suspend-resume-with-capture-5
[ 4823.610270] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x33000000|0x0: GLB_SET_PIPELINE_STATE [ 4823.610535] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x13010004|0x0: GLB_SET_PIPELINE_STATE [ 4823.611330] kernel: snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-lnl 0000:00:1f.3: FW Poll Status: reg[0x80]=0x14001e successful [ 4823.614095] kernel: snd_soc_rt1316_sdw:rt1316_io_init: rt1316-sdca sdw:0:2:025d:1316:01: rt1316_io_init hw_init complete [ 4823.614185] kernel: soundwire_bus:sdw_handle_slave_status: rt1316-sdca sdw:0:2:025d:1316:01: signaling initialization completion for Slave 1 [ 4823.614899] kernel: snd_soc_rt1316_sdw:rt1316_io_init: rt1316-sdca sdw:0:3:025d:1316:01: rt1316_io_init hw_init complete [ 4823.614913] kernel: soundwire_bus:sdw_handle_slave_status: rt1316-sdca sdw:0:3:025d:1316:01: signaling initialization completion for Slave 1 [ 4823.615022] kernel: snd_sof_intel_hda_common:hda_dai_trigger: soundwire_intel soundwire_intel.link.0: cmd=1 dai SDW0 Pin3 direction 1 [ 4823.615050] kernel: soundwire_intel soundwire_intel.link.0: ASoC: error at soc_dai_trigger on SDW0 Pin3: -22 [ 4823.617591] kernel: Jack In: ASoC: trigger FE cmd: 1 failed: -22 [ 4823.617767] kernel: snd_sof:sof_pcm_trigger: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: trigger stream 1 dir 1 cmd 0 [ 4823.617787] kernel: snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-lnl 0000:00:1f.3: trigger cmd: 0 state: 3 [ 4823.617804] kernel: snd_sof:sof_ipc4_set_pipeline_state: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc4 set pipeline instance 1 state 3 [ 4823.617821] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx : 0x13010003|0x0: GLB_SET_PIPELINE_STATE [ 4823.619171] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x33000000|0x0: GLB_SET_PIPELINE_STATE [ 4823.619189] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x13010003|0x0:
ok I now know what's causing the trigger failure after resume. So in the passing case, all pipelines (FE & BE) are started before the system suspend is initiated. In this case, we receive the suspend trigger for both the FE and BE DAI and all pipelines are suspended. And during resume, we receive the prepare callback for both the FE and BE DAI's and things are normal.
But in the failing case, what happens in the nocodec case is that the system suspend is initiated as soon the FE DAI hw_params is done but before the FE/BE DAI trigger is called and pipelines are started. Since the trigger start never happened, we never get the suspend trigger either. And during system resume, somehow we receive the prepare callback only for the FE DAI and not for the BE DAI and this is what causes the failure when the BE pipelines are triggered during BE DAI trigger. So the question is if the BE DAI suspend trigger was never called, why is the hext_stream NULL during resume. What causes it to be freed? And why dont we receive the BE DAI prepare callback during resume in this case? Still trying to figure out the answers for these questions
So here's my theory on whats happening. So when the system suspend happens after the bE DAI prepare but before BE DAI trigger, the BE DAI state is PREPARE and the suspend trigger does nothing. When the system resumes, the BE DAI prepare also does nothing also because the state is already PREPARE.
One way to fix this is set the dpcm state of the BE DAI to suspended even if it were is never started but in prepare state. I'm testing this right now.
@ranj063 this looks like a gap in the DPCM state machines
case SNDRV_PCM_TRIGGER_SUSPEND:
if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START)
goto next;
be->dpcm[stream].be_start--;
if (be->dpcm[stream].be_start != 0)
goto next;
ret = soc_pcm_trigger(be_substream, cmd);
if (ret) {
be->dpcm[stream].be_start++;
goto next;
}
be->dpcm[stream].state = SND_SOC_DPCM_STATE_SUSPEND;
break;
the if() test will not handle any case where the stream is not started.
@plbossart @ranj063 It seems we call assign_hext_stream() and release_hext_stream() unbalanced. hda_link_dma_hw_params() -> assign_hext_stream() is called in hda_dai_hw_params_data(). However, hda_link_dma_cleanup() -> release_hext_stream() is called in hda_dai_hw_free(), hda_dai_suspend(), and SNDRV_PCM_TRIGGER_SUSPEND. Will it cause any issue?
I am not sure if it's 'unbalanced'.
The hda_link_dma_hw_params() is called in multiple cases, for a regular hw_params but also for a prepare which can be a follow-up to the hw_params, or a way to deal with xruns or a resume.
Likewise we have multiple exit paths, this can be a regular stop/hw_free but we can also have the suspend-while-paused and the suspend-while-running case.
@ranj063 I also wonder if the call to snd_sof_dsp_hw_params_upon_resume() will not cleanup everything, even if the TRIGGER_SUSPEND does nothing.
@ranj063 I also wonder if the call to snd_sof_dsp_hw_params_upon_resume() will not cleanup everything, even if the TRIGGER_SUSPEND does nothing.
@plbossart i am still trying to confirm if this cleans up the dma data or not. My suspicion is that it ends up releasing the hext_stream because we do not check for the dpcm state here. Maybe one possibility is to restrict it for paused state only. But then I'm not sure if its a good thing to leave a stream in the PREPARE state during suspend.
Today's failure in case that's useful: 4012?model=LNLM_SDW_AIOC&testcase=check-suspend-resume-with-playback-5
@ssavati a fix for this issue has been merged to sof-dev. Please let us know if the issue can be closed now.
Daily 44091?model=LNLM_RVP_NOCODEC&testcase=check-suspend-resume-with-playback-5 still failed with some aplay: pcm_write:2127: write error: Input/output error error. It ran with daily commit 0dff6ac871fa which includes what I guess is the tentative fix d8347dd9bd42
BTW:
- https://github.com/thesofproject/sof/issues/8761
Daily 44091?model=LNLM_RVP_NOCODEC&testcase=check-suspend-resume-with-playback-5 still failed with some
aplay: pcm_write:2127: write error: Input/output errorerror. It ran with daily commit 0dff6ac which includes what I guess is the tentative fix d8347dd
ok so the kernel fix is for handling the trigger failure error but now we've got a different problem with an underrun occurring in the DAI in the firmware after resuming
[ 7846.830920]
This sounds like... progress?
Do you want to file a new bug or keep using this one?
This sounds like... progress?
Do you want to file a new bug or keep using this one?
I'm thinking this might be a firmware bug at this point. Let me analyze it a bit more before we decide to create a SOF bug