dmaengine: dw-axi-dmac: Allow client-chosen width
For devices where transfer lengths are not known upfront, there is a danger when the destination is wider than the source that partial words can be lost at the end of a transfer. Ideally the controller would be able to flush the residue, but it can't - it's not even possible to tell that there is any.
Instead, allow the client driver to avoid the problem by setting a smaller width.
@P33M The cause of the recent UART woes was the loss of this code fragment in the DMAC driver:
/* Prevent partial access units getting lost */
if (mem_width > reg_width)
mem_width = reg_width;
The partial access units in question are the bytes that have been received from the UART but not yet grouped into a word. If the UART driver decides that sufficient time has passed that any residue should be pushed to the tty framework, this partial word is inaccessible. And unlike the transmit case, there is no way of knowing what those bytes were.
Work around the problem by honouring a non-zero memory destination width (unless the DMAC driver knows it must be more restrictive).
FYI, we've noticed a regression with recent kernels (after 6.6.47+rpt-rpi-2712) whereby a simple poll() on a uart returns immediately, resulting in read() returning EAGAIN, such that a trivial reader consumes 100% of one core. Scratching our heads quite a bit--the default reason is not to blame the kernel...
This pull request appears to prevent that behavior, but without a test case we're concerned it will re-appear in the future. It would be great if this could be considered.
We've disabled UART DMA in the most recent kernels as a sure, safe workaround while we consider this some more. Are you still seeing poll() problems with 6.6.51 etc.?
Are you still seeing poll() problems with 6.6.51 etc.?
OK: Linux xxx 6.6.47+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.47-1+rpt1 (2024-09-02) aarch64 GNU/Linux
Broken: Linux xxx 6.6.51+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt2 (2024-10-01) aarch64 GNU/Linux Linux xxx 6.6.51+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64 GNU/Linux
OK: Linux xxx 6.6.51-v8-16k+ #1 SMP PREEMPT Thu Sep 19 18:05:24 UTC 2024 aarch64 GNU/Linux
I'll try and get a small testcase together
Interesting - on 6.6.51+rpt-rpi-2712, what does cat /sys/kernel/debug/dmaengine/summary report?
cat /sys/kernel/debug/dmaengine/summary
I'm a colleague of @kraln
On the non-working 6.6.51+rpt-rpi-2712 we get:
dma0 (1000010000.dma): number of channels: 5
dma1 (1000010600.dma): number of channels: 5 dma1chan0 | 107d004000.spi:tx dma1chan1 | 107d004000.spi:rx dma1chan2 | 107c701400.hdmi:audio-rx dma1chan3 | 107c706400.hdmi:audio-rx
dma2 (1f00188000.dma): number of channels: 8 dma2chan0 | 1f00050000.spi:rx dma2chan1 | 1f00050000.spi:tx
On the working kernel 6.6.51-v8-16k+, we get:
dma0 (1000010000.dma): number of channels: 5
dma1 (1000010600.dma): number of channels: 5 dma1chan0 | 107d004000.spi:tx dma1chan1 | 107d004000.spi:rx dma1chan2 | 107c701400.hdmi:audio-rx dma1chan3 | 107c706400.hdmi:audio-rx
dma2 (1f00188000.dma): number of channels: 8 dma2chan0 | 1f00050000.spi:rx dma2chan1 | 1f00050000.spi:tx dma2chan2 | 1f00030000.serial:tx dma2chan3 | 1f00030000.serial:rx
Are you absolutely certain that you got those two sets of results the right way round? I expected the working system not to have any 1f00030000.serial:rx entries because UART DMA has been disabled.
There is a second question over why the two are different, which looks like an out-of-date dtb file. Note that you could try copying /boot/firmware/bcm2712-rpi-5-b.dtb from working to non-working, and rebooting.
quick update: yes, the above dmaengine/summary output is the correct way around.
however, it seems our issue was only triggered by disabling DMA but looking at the traces, it seems poll/read work fine in both cases and its userland code thats buggy. subtle bug that did not show with DMA, as large data chungs were returned by read. but now, with disabled DMA, it just returns 16byte chunks, and userlands faulty assumptions break..
we'll debug some more but i think our problems are unrelated to the original issue...
quick update: yes, the above dmaengine/summary output is the correct way around.
however, it seems our issue was only triggered by disabling DMA but looking at the traces, it seems poll/read work fine in both cases and its userland code thats buggy. subtle bug that did not show with DMA, as large data chungs were returned by read. but now, with disabled DMA, it just returns 16byte chunks, and userlands faulty assumptions break..
we'll debug some more but i think our problems are unrelated to the original issue...
ok, after some testing, we're sure that while kernel behaviour has changed, it is still correct. and the userland code must be fixed.
OK, thanks for the confirmation.
I2S (via convoluted setup in sound/soc/soc-generic-dmaengine-pcm.c) sets only the peripheral's buswidth. I assume that because tx/rx is effectively continuous, the trapped residue for rx doesn't matter?
That was my thinking. And the fact that I2S data rate is potentially so much higher than the UARTs that we don't want to pay the penalty of the narrow transfers.