SOF driver should report maximum DMA burst size to user-space via an ALSA interface
SOF sound devices can support large audio buffers, allowing audio playback/capture to continue without waking up the host CPU and memory as often, thus saving power.
For ALSA applications, this shows up as bursty updates to the ALSA hw_ptr. Most applications can cope with this without issues, but for applications that assume hw_ptr advances linearly, this creates problems [1, 2]. Probably the most mainstream case is PipeWire when it decides how much data it needs to fill before starting a stream (or after an xrun). If the amount of data is smaller than or close to the size of the maximum DMA burst, xruns or even xrun loops (repeated failures to start with too little data) can be hit, as the audio device consumes all audio data before PipeWire has a chance to provide more.
The amount of buffering depends on the hardware and the DSP topology/configuration. For a typical SOF PCM, maximum burst size is a few milliseconds, but there are DSP topologies where the maximum burst size can be in tens of milliseconds.
There is currently no established interface in ALSA for drivers to communicate the maximum burst size. Existing interfaces are surveyed in [2]. snd_pcm_hw_params_is_block_transfer() is probably the closest match, but it does not provide enough information about the size of the burst.
The proposal is to extend the ALSA driver and alsa-lib interfaces to make this information available to applications like PipeWire.
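To make this concrete, one possible shape on the alsa-lib side would be a getter in the style of the existing hw_params queries. This is purely a sketch; snd_pcm_hw_params_get_max_burst_size() does not exist today and the name is made up:

```c
#include <alsa/asoundlib.h>

/* Hypothetical new alsa-lib query, modelled on the existing hw_params
 * getters. It would return the largest number of frames hw_ptr can
 * advance in a single DMA burst for the chosen hw_params. NOT an
 * existing API, illustration only. */
int snd_pcm_hw_params_get_max_burst_size(const snd_pcm_hw_params_t *params,
                                         snd_pcm_uframes_t *frames);
```

The value would only be meaningful once hw_params have been fixed, since the burst size depends on the configuration.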
Alternatives to consider:
- SOF could declare SNDRV_PCM_INFO_BATCH. SOF has not set this flag because it a) would disable features in applications like PipeWire, and b) SOF can deliver an accurate hw_ptr location (even during bursts).
- Add a new info flag SNDRV_PCM_INFO_BURSTY_DMA. This would overlap with the existing BLOCK_TRANSFER/BATCH flags and still not provide enough information for applications to fully understand driver behaviour (see the query sketch after this list).
- Add an attribute to UCM -- easier to add, but the burst size can vary based on topology and hardware, so maintaining correct information in UCM will be a challenge.
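For reference, this is how an application can query the existing flags mentioned above; both are real alsa-lib calls, but they only yield a yes/no answer, nothing about the burst size (the device name "hw:0,0" is just an example):

```c
#include <stdio.h>
#include <alsa/asoundlib.h>

int main(void)
{
	snd_pcm_t *pcm;
	snd_pcm_hw_params_t *params;

	if (snd_pcm_open(&pcm, "hw:0,0", SND_PCM_STREAM_PLAYBACK, 0) < 0)
		return 1;

	snd_pcm_hw_params_alloca(&params);
	snd_pcm_hw_params_any(pcm, params);

	/* Boolean properties only -- neither says how large a burst is. */
	printf("BATCH: %d\n", snd_pcm_hw_params_is_batch(params));
	printf("BLOCK_TRANSFER: %d\n", snd_pcm_hw_params_is_block_transfer(params));

	snd_pcm_close(pcm);
	return 0;
}
```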
References:
- [1] Pipewire bug hit when this information is not available https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4489
- [2] Meta-bug on alsa-conformance-test issues that triggered the creation of this ticket https://github.com/thesofproject/sof/issues/8717
FYI @ujfalusi @lgirdwood @ford-prefect
@kv2019i, fwiw, PW/PA have an exception for USB audio, which declares SNDRV_PCM_INFO_BATCH. They acknowledge the flag but still enable the features that are blocked for other PCM devices. We could in theory do the same for SOF, but SNDRV_PCM_INFO_BATCH says that the pointer resolution is limited to the period size, and that is not true for SOF: we can report an accurate DMA position (and with IPC4 we can also report delay), we just have jumpy DMA.
@ujfalusi fwiw, the exception for USB devices is not present in PipeWire.
@ford-prefect, I think this does just that, no? https://gitlab.freedesktop.org/pipewire/pipewire/-/blob/master/spa/plugins/alsa/acp/alsa-util.c?ref_type=heads#L255
@ujfalusi I believe that code is only used while probing (in fact, only while probing the Pro Audio profile). The code for actually using the PCM is at https://gitlab.freedesktop.org/pipewire/pipewire/-/blob/master/spa/plugins/alsa/alsa-pcm.c#L2025 (ironically, in Pro Audio mode, we don't use timer-based scheduling at all, which means the headroom/BATCH flag doesn't kick in)
@ford-prefect, interesting. When I tried, out of curiosity, to set BATCH for SOF, PW disabled tsched for non-Pro Audio as well.
I changed this check to an unconditional `if (1)` and tsched was back, along with the other niceties that set the deadline a bit further from the hw_ptr.
But I don't know the PW (or PA) code at all, I was just blindly hacking...
@ujfalusi that's odd, I don't see any reason BATCH should cause tsched to be disabled (verified with my USB device here). Might be worth an upstream bug (with some PIPEWIRE_DEBUG=3 logs and pw-dump output).
USB is special cased to allow tsched even if it reports BATCH.
In my case, it uses tsched even if I disable the is_usb check in alsa-util.c -- which is what I expect based on the code. Just to sanity check, how do you verify that PW is using tsched vs. not?
Hrm, re-reading the code, I'm not sure anymore, but it looks like tsched is always enabled and can only be disabled with a config parameter: state->disable_tsched via api.alsa.disable-tsched
The other issue with BATCH is that it causes period wakeups not to be disabled, which we don't want -- at least that is how I read it... https://gitlab.freedesktop.org/pipewire/pipewire/-/blob/master/spa/plugins/alsa/alsa-pcm.c?ref_type=heads#L2306
I'm a bit confused by pa_alsa_set_hw_params(), but that looks to be a distraction.
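For reference, the period-wakeup disabling under discussion maps to this real alsa-lib call (it only succeeds if the driver sets SNDRV_PCM_INFO_NO_PERIOD_WAKEUP); a fragment assuming `pcm` and `params` are already allocated and configured:

```c
/* Ask the driver to not generate period interrupts; tsched-style
 * applications then rely on timers instead of IRQs for wakeups. */
int err = snd_pcm_hw_params_set_period_wakeup(pcm, params, 0);
if (err < 0)
	fprintf(stderr, "cannot disable period wakeups: %s\n", snd_strerror(err));
```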
That's mostly correct: tsched is disabled for pro-audio devices, or if explicitly configured. And yes, you also read the thing about not disabling period interrupts correctly -- I'm not sure that's actually required, but maybe @wtay knows?
What do I know... I kept the interrupts enabled because I thought they are used to update the ringbuffer read and write pointers.
@wtay ah! yes, I do think we've seen that in the past in PulseAudio.
Shouldn't apps look at the 'delay' and not just the hw_ptr? This information is already reported by the 'status' ALSA API. Is there a need to invent something new?
Ack - apps should use delay for an accurate presentation position, since the delay in FW will change based on DSP FW scheduling, pipeline design, use case, and the audio clock domains being used across the stream.
https://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m___status.html#ga1fdce3985e64f66385a5805da1110f18
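For completeness, reading the delay via the status API looks roughly like this (real alsa-lib calls; assumes an already-running `pcm` handle):

```c
snd_pcm_status_t *status;
snd_pcm_sframes_t delay;

snd_pcm_status_alloca(&status);
if (snd_pcm_status(pcm, status) == 0) {
	/* frames between the application write position and the
	 * sample currently being played out */
	delay = snd_pcm_status_get_delay(status);
	printf("delay: %ld frames\n", (long)delay);
}
```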
@plbossart @lgirdwood I think the problem is different. snd_pcm_delay() tells how long it will take for samples written now to be played out, and it indeed should be correct even when buffering and DMA bursts happen in between. It should be used when syncing audio playback, e.g. to video rendering for lip-sync.
snd_pcm_delay() is the right tool for e.g. alsa-conformance-test when it tries to observe and verify the playback speed; for that, snd_pcm_delay() should be used, not just hw_ptr (as the latter only reflects moving data from host memory to DSP memory).
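A minimal sketch of that kind of measurement, assuming a running playback `pcm` handle and a `frames_written` counter maintained by the application:

```c
/* Presentation position = frames written minus frames still queued.
 * Tracking this against wall-clock time yields the effective playback
 * rate, independent of how jumpy hw_ptr itself is. */
snd_pcm_sframes_t delay;

if (snd_pcm_delay(pcm, &delay) == 0) {
	snd_pcm_sframes_t presented = frames_written - delay;
	printf("frames presented so far: %ld\n", (long)presented);
}
```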
With pipewire, I think the problem is different ( https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4489 ):
- the ALSA buffer is empty, avail == buffer_size
- pipewire writes X samples to ALSA, starts the stream and goes to sleep for Y msec
- the ALSA device does a burst transfer of the X samples in less time than Y, and the stream hits an XRUN immediately after start
- how can pipewire know this and adjust X/Y accordingly? (i.e. use a larger api.alsa.headroom)
I don't see snd_pcm_delay() as useful for this problem.
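To make the failure mode concrete, here is a minimal sketch of the sequence above using only standard alsa-lib calls; the device name, X = 10ms and Y = 20ms are illustrative assumptions, not values PipeWire actually uses:

```c
#include <stdio.h>
#include <unistd.h>
#include <alsa/asoundlib.h>

int main(void)
{
	snd_pcm_t *pcm;
	short buf[480 * 2] = { 0 };	/* X = 10ms of 48kHz stereo S16 silence */

	if (snd_pcm_open(&pcm, "hw:0,0", SND_PCM_STREAM_PLAYBACK, 0) < 0)
		return 1;
	if (snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
			       SND_PCM_ACCESS_RW_INTERLEAVED,
			       2, 48000, 1, 500000) < 0)	/* 500ms buffer */
		return 1;

	snd_pcm_writei(pcm, buf, 480);	/* write X samples */
	snd_pcm_start(pcm);
	usleep(20 * 1000);		/* sleep Y = 20ms */

	/* If the device moved all X samples in one DMA burst shorter
	 * than Y, the stream is already in SND_PCM_STATE_XRUN here. */
	printf("state after sleep: %s\n", snd_pcm_state_name(snd_pcm_state(pcm)));

	snd_pcm_close(pcm);
	return 0;
}
```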
For apps, it would be simplest for drivers to report the maximum burst length in audio frames, i.e. a concept matching "api.alsa.headroom" in PipeWire. E.g. the DSP could report a 5ms max burst size for a given PCM hw_params set (a value known after hw_params/sw_params are set). PipeWire would use this to tune the "api.alsa.headroom" value and keep more samples written to ALSA.
This would still work correctly if we ship a DSP topology with very large buffers, like a 100msec audio buffer with the DSP configured to access DDR only when 25msec is left on the DSP side. To an app like PipeWire, this shows up as an ALSA device that requires a large ALSA buffer_size (which we can already express in ALSA) and a very jumpy hw_ptr (which we cannot). But if the max burst size (100msec in this case) is reported to the app, it could handle this kind of case as well without special-casing the particular ALSA driver type.
A fixed property would also be easier to use: snd_pcm_delay() is notoriously hard to use correctly in apps.
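If the driver reported the value, consuming it on the app side could be as simple as the fragment below. snd_pcm_hw_params_get_max_burst_size() is the hypothetical call sketched at the top of this ticket, and the `headroom` variable loosely models "api.alsa.headroom":

```c
/* Sketch: raise the startup fill level to at least one max burst so
 * the device cannot drain everything before the first app wakeup. */
snd_pcm_uframes_t max_burst = 0;

if (snd_pcm_hw_params_get_max_burst_size(hw_params, &max_burst) == 0 &&
    headroom < max_burst)
	headroom = max_burst;
```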
struct snd_pcm_hw_params has a

```c
snd_pcm_uframes_t fifo_size; /* R: chip FIFO size in frames */
```

alsa-lib provides snd_pcm_hw_params_get_fifo_size(), yet I cannot find any use of it in user space or in the kernel (apart from copying it from here to there and from there to here).
I cannot find reliable information on what the fifo_size is :o
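Querying it is straightforward, for what it's worth; snd_pcm_hw_params_get_fifo_size() is a real alsa-lib call that returns the size directly rather than via a pointer. Whether drivers fill it in meaningfully is another matter:

```c
/* valid once the hw_params container has been filled in */
int fifo = snd_pcm_hw_params_get_fifo_size(hw_params);

if (fifo >= 0)
	printf("fifo_size: %d frames\n", fifo);
```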
The PCM delay is for sure not usable in this case, as @kv2019i already analyzed.
But fifo_size does not map directly to how the host-facing DMA works: we can have a big FIFO in hardware but pace the DMA in shorter bursts. It could imply the worst case, since it is likely expected that the FIFO is filled on start and the buffer size and start threshold must not be shorter than that; but from there on, it is really up to some unknown driver/hardware logic. For example, with SOF IPC4 we have a 2ms buffer on the host side for playback with 1ms DMA bursts; with a 100ms host buffer the bursts are probably around 90ms apart, certainly not 50ms.
We can force the buffer/period size in the driver, but that does not convey how much data must be there to avoid an overrun. So we need a start fill level (how much data must be in the ALSA buffer before trigger start to avoid an overrun) and a burst size (the size of the steps in which the DMA will move data), as sketched below.
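To summarise, these are the two per-stream numbers a driver would need to expose; a pure sketch in frames, not an existing ALSA structure:

```c
/* Sketch only: both values are known once hw_params are fixed. */
struct pcm_burst_info {
	snd_pcm_uframes_t start_fill;	/* min data in the ALSA buffer before
					 * trigger start to avoid an overrun */
	snd_pcm_uframes_t max_burst;	/* largest single step hw_ptr can take */
};
```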