linux icon indicating copy to clipboard operation
linux copied to clipboard

dmaengine: dw-axi-dmac: Allow client-chosen width

Open pelwell opened this issue 1 year ago • 10 comments

For devices where transfer lengths are not known upfront, there is a danger when the destination is wider than the source that partial words can be lost at the end of a transfer. Ideally the controller would be able to flush the residue, but it can't - it's not even possible to tell that there is any.

Instead, allow the client driver to avoid the problem by setting a smaller width.

pelwell avatar Sep 19 '24 17:09 pelwell

@P33M The cause of the recent UART woes was the loss of this code fragment in the DMAC driver:

               /* Prevent partial access units getting lost */
               if (mem_width > reg_width)
                       mem_width = reg_width;

The partial access units in question are the bytes that have been received from the UART but not yet grouped into a word. If the UART driver decides that sufficient time has passed that any residue should be pushed to the tty framework, this partial word is inaccessible. And unlike the transmit case, there is no way of knowing what those bytes were.

Work around the problem by honouring a non-zero memory destination width (unless the DMAC driver knows it must be more restrictive).

pelwell avatar Sep 19 '24 17:09 pelwell

FYI, we've noticed a regression with recent kernels (after 6.6.47+rpt-rpi-2712) whereby a simple poll() on a uart returns immediately, resulting in read() returning EAGAIN, such that a trivial reader consumes 100% of one core. Scratching our heads quite a bit--the default reason is not to blame the kernel...

This pull request appears to prevent that behavior, but without a test case we're concerned it will re-appear in the future. It would be great if this could be considered.

kraln avatar Oct 10 '24 09:10 kraln

We've disabled UART DMA in the most recent kernels as a sure, safe workaround while we consider this some more. Are you still seeing poll() problems with 6.6.51 etc.?

pelwell avatar Oct 10 '24 10:10 pelwell

Are you still seeing poll() problems with 6.6.51 etc.?

OK: Linux xxx 6.6.47+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.47-1+rpt1 (2024-09-02) aarch64 GNU/Linux

Broken: Linux xxx 6.6.51+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt2 (2024-10-01) aarch64 GNU/Linux Linux xxx 6.6.51+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64 GNU/Linux

OK: Linux xxx 6.6.51-v8-16k+ #1 SMP PREEMPT Thu Sep 19 18:05:24 UTC 2024 aarch64 GNU/Linux

I'll try and get a small testcase together

kraln avatar Oct 10 '24 10:10 kraln

Interesting - on 6.6.51+rpt-rpi-2712, what does cat /sys/kernel/debug/dmaengine/summary report?

pelwell avatar Oct 10 '24 10:10 pelwell

cat /sys/kernel/debug/dmaengine/summary

I'm a colleague of @kraln

On the non-working 6.6.51+rpt-rpi-2712 we get:

dma0 (1000010000.dma): number of channels: 5

dma1 (1000010600.dma): number of channels: 5 dma1chan0 | 107d004000.spi:tx dma1chan1 | 107d004000.spi:rx dma1chan2 | 107c701400.hdmi:audio-rx dma1chan3 | 107c706400.hdmi:audio-rx

dma2 (1f00188000.dma): number of channels: 8 dma2chan0 | 1f00050000.spi:rx dma2chan1 | 1f00050000.spi:tx

On the working kernel 6.6.51-v8-16k+, we get:

dma0 (1000010000.dma): number of channels: 5

dma1 (1000010600.dma): number of channels: 5 dma1chan0 | 107d004000.spi:tx dma1chan1 | 107d004000.spi:rx dma1chan2 | 107c701400.hdmi:audio-rx dma1chan3 | 107c706400.hdmi:audio-rx

dma2 (1f00188000.dma): number of channels: 8 dma2chan0 | 1f00050000.spi:rx dma2chan1 | 1f00050000.spi:tx dma2chan2 | 1f00030000.serial:tx dma2chan3 | 1f00030000.serial:rx

honx avatar Oct 10 '24 10:10 honx

Are you absolutely certain that you got those two sets of results the right way round? I expected the working system not to have any 1f00030000.serial:rx entries because UART DMA has been disabled.

There is a second question over why the two are different, which looks like an out-of-date dtb file. Note that you could try copying /boot/firmware/bcm2712-rpi-5-b.dtb from working to non-working, and rebooting.

pelwell avatar Oct 10 '24 10:10 pelwell

quick update: yes, the above dmaengine/summary output is the correct way around.

however, it seems our issue was only triggered by disabling DMA but looking at the traces, it seems poll/read work fine in both cases and its userland code thats buggy. subtle bug that did not show with DMA, as large data chungs were returned by read. but now, with disabled DMA, it just returns 16byte chunks, and userlands faulty assumptions break..

we'll debug some more but i think our problems are unrelated to the original issue...

honx avatar Oct 10 '24 11:10 honx

quick update: yes, the above dmaengine/summary output is the correct way around.

however, it seems our issue was only triggered by disabling DMA but looking at the traces, it seems poll/read work fine in both cases and its userland code thats buggy. subtle bug that did not show with DMA, as large data chungs were returned by read. but now, with disabled DMA, it just returns 16byte chunks, and userlands faulty assumptions break..

we'll debug some more but i think our problems are unrelated to the original issue...

ok, after some testing, we're sure that while kernel behaviour has changed, it is still correct. and the userland code must be fixed.

honx avatar Oct 10 '24 18:10 honx

OK, thanks for the confirmation.

pelwell avatar Oct 10 '24 18:10 pelwell

I2S (via convoluted setup in sound/soc/soc-generic-dmaengine-pcm.c) sets only the peripheral's buswidth. I assume that because tx/rx is effectively continuous, the trapped residue for rx doesn't matter?

P33M avatar Nov 08 '24 10:11 P33M

That was my thinking. And the fact that I2S data rate is potentially so much higher than the UARTs that we don't want to pay the penalty of the narrow transfers.

pelwell avatar Nov 08 '24 10:11 pelwell