
STM32 DMA double-buffering

Open Dirbaio opened this issue 2 years ago • 3 comments

We want to add some form of first-class double-buffering support, to allow endless streaming of data.

Example use cases

  • Streaming samples from ADC
  • Streaming samples to DAC
  • Streaming to/from I2S/SAI

Requirements

  1. Support double-buffering.
  2. The gap between transfers must be as small as possible (ideally none; at most the latency of an IRQ. The latency of waking an async task is too much.)
  3. There must not be UB even if IRQs are delayed arbitrarily long (the DMA must not wrap around and start overwriting the slice the user code is touching).

How to do this?

Satisfying the requirements is tricky. Requirement 3 essentially means we can't use DMA modes that "wrap around by default". For example, with a circular buffer you might do this:

Start a read into a buffer in circular mode
Loop {
    Wait for the half-transfer interrupt (HTIE): the 1st half is filled
    Hand the 1st half to the user, they process it
    Wait for the transfer-complete interrupt (TCIE): the 2nd half is filled
    Hand the 2nd half to the user, they process it
}

However, if the user takes too long to process the 1st half, the DMA might wrap around and overwrite it from under them -> UB.
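To make the hazard concrete, here is a small software sketch in which a simulated circular-mode DMA keeps writing after the buffer is full and wraps back over the first half. CircularDma, tick and HALF are hypothetical names and nothing touches real hardware. Because this model is safe Rust it only prints copies of the data; with a real DMA the user would be holding a slice into the first half while the hardware writes to it, which is the UB described above.

```rust
// Hypothetical software model of circular-mode DMA: no real registers are
// touched. The "DMA" writes one sample per tick into a buffer made of two
// halves and wraps to index 0 at the end, whether or not user code is done.

const HALF: usize = 4;

struct CircularDma {
    buf: [u32; 2 * HALF],
    pos: usize,       // next index the DMA will write
    next_sample: u32, // value of the next incoming sample
}

impl CircularDma {
    fn new() -> Self {
        Self { buf: [0; 2 * HALF], pos: 0, next_sample: 0 }
    }

    /// Let the DMA transfer `n` more samples. Circular mode never stops on
    /// its own: after the last element it wraps around and keeps writing.
    fn tick(&mut self, n: usize) {
        for _ in 0..n {
            self.buf[self.pos] = self.next_sample;
            self.next_sample += 1;
            self.pos = (self.pos + 1) % self.buf.len();
        }
    }
}

fn main() {
    let mut dma = CircularDma::new();

    // First half full: the half-transfer interrupt would fire here and the
    // first half would be handed to the user.
    dma.tick(HALF);
    println!("first half when handed over: {:?}", &dma.buf[..HALF]); // [0, 1, 2, 3]

    // The user is slow: the DMA fills the second half, wraps around and
    // starts overwriting the first half that the user is still reading.
    dma.tick(HALF + 2);
    println!("first half while user reads: {:?}", &dma.buf[..HALF]); // [8, 9, 2, 3]
}
```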

Unfortunately, I believe it's "fundamentally impossible" to wrap DMA circular mode in a safe Rust API :'(

The way we use DMA has to be something like: "start writing to buf1, queue a write to buf2. When you're done with buf1 or buf2, tell me, but DO NOT wrap around back to buf1 until I tell you to do so". Then, if user code takes too long, the DMA just stops (and maybe loses data), but there's no UB.

Idea 1: use M0AR/M1AR

There are some interesting ideas around on how to use M0AR/M1AR for this: write a "poison" address (like 0xFFFF_FFFF) to the next buffer's address register to get the DMA to error and stop, then overwrite the poison with the real address when it's safe to continue.

I'm not sure if this actually works in practice, or, if it does, whether it avoids UB in all cases.
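A minimal software sketch of the poison-address idea, assuming the behaviour described above: in double-buffer mode the DMA alternates between the two address registers, and switching to a register that holds the poison address makes it error and stop. DoubleBufferDma, complete_buffer, hand_to_user, give_back and the buffer addresses are hypothetical; no real registers are accessed. Note that the model assumes the IRQ always runs (and poisons the register) before the DMA finishes the other buffer, and whether that holds when IRQs are delayed arbitrarily long is exactly the open question.

```rust
// Hypothetical model of the "poison address" idea for double-buffer mode.
// mar[0]/mar[1] stand for M0AR/M1AR and `ct` for the CT (current target)
// bit. This is a simulation, not a driver.

const POISON: u32 = 0xFFFF_FFFF;

struct DoubleBufferDma {
    mar: [u32; 2], // models M0AR / M1AR
    ct: usize,     // models the CT bit: which register the DMA is using
    running: bool,
}

impl DoubleBufferDma {
    /// Hardware side: the current buffer is full. Switch to the other address
    /// register; if it holds the poison address, the stream errors and stops.
    /// Returns the address of the buffer that was just filled.
    fn complete_buffer(&mut self) -> u32 {
        assert!(self.running, "stream already stopped");
        let done = self.mar[self.ct];
        self.ct ^= 1;
        if self.mar[self.ct] == POISON {
            self.running = false; // models the error disabling the stream
        }
        done
    }

    /// IRQ side: the buffer in `slot` now belongs to user code, so poison
    /// its address before the DMA can switch back to it.
    fn hand_to_user(&mut self, slot: usize) {
        self.mar[slot] = POISON;
    }

    /// User code is done with the buffer: restore its real address.
    fn give_back(&mut self, slot: usize, addr: u32) {
        self.mar[slot] = addr;
    }
}

fn main() {
    const BUF1: u32 = 0x2000_0000; // illustrative RAM addresses
    const BUF2: u32 = 0x2000_1000;
    let mut dma = DoubleBufferDma { mar: [BUF1, BUF2], ct: 0, running: true };

    // buf1 fills; the DMA moves on to buf2 and the IRQ hands buf1 to the user.
    assert_eq!(dma.complete_buffer(), BUF1);
    dma.hand_to_user(0);

    // Fast user: buf1 comes back before buf2 is full, so streaming continues.
    dma.give_back(0, BUF1);
    assert_eq!(dma.complete_buffer(), BUF2);
    dma.hand_to_user(1);
    assert!(dma.running);

    // Slow user: buf2 is not returned before buf1 fills again. Switching back
    // to M1AR hits the poison, so the DMA stops instead of overwriting buf2.
    assert_eq!(dma.complete_buffer(), BUF1);
    assert!(!dma.running);
}
```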

Disadvantages:

  • It only works when the two bufs have the same length. The hardware has 2 address registers but only 1 length register :(
  • It only works on chips with M0AR/M1AR (F2, F4, F7, H7, L5).

Idea 2: transfer queuing

Add a way to queue transfers to the Channel trait. You start one transfer and queue the next. When a transfer finishes, the IRQ handler starts the next transfer if one is queued. The DMA stops if there's no queued transfer.

This allows code (e.g. the ADC HAL) to:

  • Start transfer to buf1
  • Queue transfer to buf2
  • When buf1 is filled, hand it to user code, then queue it again
  • When buf2 is filled, hand it to user code, then queue it again
  • Repeat

If user code is slow or IRQs are delayed, DMA loses data but there's no UB.
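The queuing scheme can be sketched as a small state machine. Transfer, Channel, start, queue and on_transfer_complete are hypothetical names for illustration, not embassy's actual API, and a buffer is represented only by an id instead of a pointer and length.

```rust
// Hypothetical model of Idea 2 (transfer queuing); not embassy's API.

#[derive(Clone, Copy, Debug, PartialEq)]
struct Transfer {
    buf_id: usize, // stands in for (buffer pointer, length)
}

#[derive(Default)]
struct Channel {
    active: Option<Transfer>, // transfer the DMA is currently running
    queued: Option<Transfer>, // at most one transfer waiting behind it
}

impl Channel {
    fn start(&mut self, t: Transfer) {
        self.active = Some(t);
    }

    /// Queue the next transfer; fails if one is already queued.
    fn queue(&mut self, t: Transfer) -> Result<(), Transfer> {
        if self.queued.is_some() {
            return Err(t);
        }
        self.queued = Some(t);
        Ok(())
    }

    /// Transfer-complete IRQ: immediately start the queued transfer, if any,
    /// and report which buffer was filled. With nothing queued the DMA just
    /// stops: samples are lost, but no buffer is overwritten.
    fn on_transfer_complete(&mut self) -> Option<usize> {
        let done = self.active.take()?;
        self.active = self.queued.take();
        Some(done.buf_id)
    }

    fn is_running(&self) -> bool {
        self.active.is_some()
    }
}

fn main() {
    let mut ch = Channel::default();
    ch.start(Transfer { buf_id: 1 });
    ch.queue(Transfer { buf_id: 2 }).unwrap();

    // buf1 filled: the DMA moves on to buf2; user code gets buf1 and re-queues it.
    assert_eq!(ch.on_transfer_complete(), Some(1));
    ch.queue(Transfer { buf_id: 1 }).unwrap();

    // buf2 filled: the DMA moves on to buf1, but user code is slow with buf2.
    assert_eq!(ch.on_transfer_complete(), Some(2));

    // buf1 filled again with nothing queued: the channel stops.
    assert_eq!(ch.on_transfer_complete(), Some(1));
    assert!(!ch.is_running());
}
```

In real hardware, the gap between two transfers is the time from the DMA finishing a buffer to the IRQ handler starting the queued one, which is why this scheme can never be gap-free.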

Disadvantages:

  • The time gap between transfers is the IRQ latency, so it's not zero.

Original discussion in Matrix

Dirbaio, Apr 07 '22 22:04

Further discussion revealed more information on double buffering with DMA/BDMA on different families and peripheral versions:

  • Families with BDMA v1/v2 (often called DMA in the reference manual) cannot support double buffering, as they lack hardware support for it. This means it cannot be supported in hardware on these families: F1, L1, F0, F3, L0, L4, G0, G4, WB, WL.
  • Families that have BDMA v3 (H7 and L5) support Double Buffered Mode; H7 supports it on both DMA and BDMA.
  • F2, F4 and F7 support Double Buffering in their DMA peripheral.
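For reference, the list above could be encoded in a lookup such as the following. The function hw_double_buffer is hypothetical (it is not part of embassy) and only reflects the families named in this discussion; anything else maps to None rather than a guess.

```rust
// Hypothetical helper encoding the family list above; not part of embassy.

/// Does this STM32 family have hardware double-buffer support (needed for
/// Idea 1)? Some(true)/Some(false) per the list above, None if not covered.
fn hw_double_buffer(family: &str) -> Option<bool> {
    match family {
        // DMA on F2/F4/F7; BDMA v3 on H7 and L5 (H7 also on its DMA).
        "F2" | "F4" | "F7" | "H7" | "L5" => Some(true),
        // BDMA v1/v2 only: no hardware double buffering.
        "F1" | "L1" | "F0" | "F3" | "L0" | "L4" | "G0" | "G4" | "WB" | "WL" => Some(false),
        // Not covered by this discussion.
        _ => None,
    }
}

fn main() {
    for family in ["F4", "G4", "U5"] {
        println!("{family}: {:?}", hw_double_buffer(family));
    }
}
```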

matoushybl, Apr 11 '22 21:04

From the discussion on Matrix (formatted):

Idea 1 - fast, sound, only F2, F4, F7, H7, L5: the preferred option if the hardware permits it.

Idea 2 - slow, sound, all chips: would be easier to implement/understand/maintain than Idea 1, but the IRQ latency is not negligible.
It should be fine for audio on I2S/DAC: e.g. with a 180 MHz core and 48 kHz sampling, you would have ~3750 CPU cycles (180 MHz / 48 kHz) for the IRQ, which should be enough, assuming there are no long critical sections and the IRQ priority is high. But in a high-frequency ADC sampling application it will be noticeable. There's at least one use case where it doesn't work at all: transferring pictures from DCMI.
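The cycle budget above is simple division. A sketch of the arithmetic, assuming the peripheral raises one DMA request per sample and has no FIFO to absorb a late restart, so the IRQ has roughly one sample period to start the queued transfer before data is lost. The 1 MS/s ADC rate is a hypothetical example, not a figure from the discussion.

```rust
/// CPU cycles in one sample period: roughly the window the transfer-complete
/// IRQ has to start the queued transfer before a sample is lost.
fn cycles_per_sample(core_hz: u64, sample_hz: u64) -> u64 {
    core_hz / sample_hz
}

fn main() {
    // Audio case from the discussion: 180 MHz core, 48 kHz sampling.
    println!("audio: {} cycles", cycles_per_sample(180_000_000, 48_000)); // 3750
    // A hypothetical 1 MS/s ADC stream on the same core leaves far less.
    println!("adc:   {} cycles", cycles_per_sample(180_000_000, 1_000_000)); // 180
}
```

If each sample period carries several DMA requests (for example left and right channels on I2S), the budget per request shrinks accordingly.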

Idea 3 - fast, unsound on overrun, all chips: use DMA circular mode with a single buffer. On overrun, either panic, or stop the DMA from the IRQ and make the task return an "OverrunError". The second option (stopping the DMA and returning an error to the user) is a bit more risky: it is technically still unsound, because by the time the IRQ fires, the overrun (and therefore UB) has already happened. This would allow us to get streaming DMA for ADC/whatever working on ALL chips, and then maybe we can later apply Idea 1 on the chips that support it.
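The overrun check for Idea 3 can be sketched as bookkeeping in the HT/TC interrupt: remember which half is lent to user code, and treat any completion event while a half is still lent as an overrun. Half, CircularReader, on_half_complete, release and OverrunError are hypothetical names, and the model assumes every HT/TC interrupt is observed (a real driver would also have to cope with missed ones).

```rust
// Hypothetical model of Idea 3's overrun detection on a circular buffer
// split into two halves. A simulation only; no hardware access.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Half {
    First,
    Second,
}

#[derive(Debug, PartialEq)]
struct OverrunError;

struct CircularReader {
    user_holds: Option<Half>, // half currently lent to user code
    dma_running: bool,
}

impl CircularReader {
    /// Called from the HT (first half full) or TC (second half full) IRQ.
    fn on_half_complete(&mut self, filled: Half) -> Result<Half, OverrunError> {
        // With only two halves, if user code still holds a half when the next
        // completion fires, the DMA has moved on to that half (or has already
        // rewritten it). Stopping now cannot undo that, which is why this
        // option is technically still unsound.
        if self.user_holds.is_some() {
            self.dma_running = false;
            return Err(OverrunError);
        }
        self.user_holds = Some(filled);
        Ok(filled)
    }

    /// User code is done with the half it was given.
    fn release(&mut self) {
        self.user_holds = None;
    }
}

fn main() {
    let mut r = CircularReader { user_holds: None, dma_running: true };

    assert_eq!(r.on_half_complete(Half::First), Ok(Half::First)); // HT
    r.release(); // first half processed in time
    assert_eq!(r.on_half_complete(Half::Second), Ok(Half::Second)); // TC

    // User code is still busy with the second half at the next HT: overrun.
    assert_eq!(r.on_half_complete(Half::First), Err(OverrunError));
    assert!(!r.dma_running);
}
```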

AntoineMugnier, Jul 25 '22 22:07

After the previous discussion, we have decided to implement at least Ideas 3 and 1, and maybe 2. Suggested ordering of the development tasks: Idea 3 => Idea 1 => Idea 2.

I'm starting work on Idea 3.

AntoineMugnier, Aug 01 '22 19:08