embedded-hal Explore DMA-based APIs built on top of `&'static mut` references

trafficstars

Summary of this thread as of 2018-04-02

Current status of the DMA API design:

I think we have a clear idea of what the API should provide:

It MUST

be memory safe, obviously
expose a device agnostic API, so we can use it to build DMA based HAL traits
support one-shot and circular transfers
support mem to peripheral and peripheral to mem transfers
not require dynamic memory allocation, i.e. allocator support should be optional

It SHOULD support these features:

Canceling an ongoing transfer
Memory to memory transfers
Peeking into the part of the buffer that the DMA has already written to
Data widths other than 8-bit
Using a part / range of the buffer; the API still needs to take ownership of the whole buffer in this scenario

This is out of scope:

Configuration stuff like enabling burst mode or constraining usage of a channel to certain peripherals. The API for this functionality is up to the HAL implementer to provide / design.

What we know so far from existing implementations:

Ownership and stable addresses are a must for memory safety
- (remember that mem::forget is safe in Rust)
- The ongoing transfer value must take ownership of the buffer and the DMA channel
- storing [T; N] in the Transfer struct doesn't work because the address is not stable -- it changes when the Transfer struct is moved
- &'static mut fulfills these two requirements but so do Box, Vec and other heap allocated collections. See owning_ref::StableAddress for a more complete list.

These concepts need to be abstracted over (using traits or structs):
- An ongoing DMA transfer
- A DMA channel, singletons make sense here
- Buffers compatible with DMA transfers. cf. StableAddress. Also the element type must be restricted -- e.g. a transfer on &'static mut [Socket] doesn't make sense.

Unresolved questions:

Alignment requirements. Does a transfer on a [u16] buffer require the buffer to be 16-bit aligned? The answer probably depends on the target device.

Attempts at the problem:

Memory safe DMA transfers, a blog post that explores using &'static mut references to achieve memory safe DMA transfers
stm32f103xx-hal, PoC implementations of the idea present in the blog post. It contains implementations of one-shot and circular DMA transfers

myeisha's DMA framework. Supports several heap allocated collections like Box and Vec.

What the title says. This blog post describes the approach to building such APIs. The last part of the post covers platform agnostic traits that could be suitable for inclusion in embedded-hal.

This issue is for collecting feedback on the proposed approach and deciding on what should land in embedded-hal.

Feb 09 '18 10:02 japaric

Great stuff. I like how you're tackling DMA. However I feel there's one very important piece missing in the embedded-hal world: The ability to tie interrupts to HAL periperals.

You nicely described how DMA gets rid of the need to wait for the transfer to complete on the spot but you still need to spend (potentially) a lot of cycles to check whether your transfer has completed at some point. Whether that is better or not depends on the application, after all you still might do other processing in interrupt handlers while busy waiting for a single transmission to complete.

Feb 09 '18 11:02 therealprof

I should probably add that I view interrupt support as a much more important topic for async IO and DMA mostly as a method to reduce interrupt frequency and contention. DMA without interrupts is not actually that useful.

Feb 09 '18 11:02 therealprof

I still need to do some reading (my DMA knowledge is extremely limited), but it would also be interesting how to model peripheral to peripheral transfers.

For example, if you (for some reason) wanted to have a DMA driven UART passthrough (data comes in on one, goes out on the other), how would this be modelled? The transaction should take ownership of the serial ports and DMA channel, but wouldn't require a buf, as far as I know.

Feb 09 '18 14:02 jamesmunns

@jamesmunns As far as I know STM32 does not support peripheral to peripheral DMA transfers. Do other microcontrollers have such feature?

Feb 09 '18 16:02 protomors

@protomors based on reading this thread, I thought it might be possible, though that might be more by accident than by design. AN4031 section 1.1.4 though seems to exclude p-to-p DMA.

I can find some mentions of peripheral to peripheral DMA being used on other architectures, such as the PIC32, as well as the PSoC3 (8051), but can't really find any compelling examples for Cortex-M based devices.

I don't have a use case (or experience with this), but it looks like this is "not a problem" at the moment.

Feb 09 '18 16:02 jamesmunns

Actually, I just found the PSoC5LP (Cortex-M3 based) does support this, judging by this app note from Cypress, but this still looks like a pretty rare feature (might not be worth defining interfaces/traits for the edge cases).

Feb 09 '18 16:02 jamesmunns

@jamesmunns Plenty of Cortex-M chips have a peripheral to peripheral communication but it's often not called DMA (because it isn't necessarily just that), e.g. the Nordic NRF have it and it's called Programmable Peripheral Interconnect (PPI) there, the ATSAMD call it EVSYS. As I mentioned before an important aspect of DMA is being able to process interrupts generated by the DMA engine and "p-to-p" is about letting one peripheral know that some other peripheral has finished processing, via DMA or not is not of importance. If you have to poll for DMA completion it's pretty much just a waste of resources.

Feb 12 '18 18:02 therealprof

@therealprof

However I feel there's one very important piece missing in the embedded-hal world: The ability to tie interrupts to HAL periperals.

Interrupts are a topic orthogonal to the DMA that deserve their own issue / discussion thread. I have my thoughts on the topic but I won't voice them in this issue.

Also you can already use HAL abstractions with interrupts using RTFM at zero cost.

You nicely described how DMA gets rid of the need to wait for the transfer to complete on the spot but you still need to spend (potentially) a lot of cycles to check whether your transfer has completed at some point.

You don't need interrupt handlers / callbacks to avoid busy waiting for the DMA transfer to complete either (if that's what you are implicitly suggesting). If you use cooperative multitasking (i.e. generators) you can suspend (yield) the task while the transfer is in progress, go do other tasks and/or sleep, and resume the first task only when the transfer is done -- no busy waiting or interrupt handlers involved.

@jamesmunns

I don't know about peripheral to peripheral transfer but the reference implementation is missing DMA functionality like memory to memory transfers, priority levels and burst mode. However, I think that functionality could be provided as methods on the channel types; it just needs to be implemented. Mem2mem transfers sound like they may be useful to have as a trait but I'm not sure if priority and burst mode should have traits.

Feb 13 '18 20:02 japaric

You don't need interrupt handlers / callbacks to avoid busy waiting

What I'm trying to get at here is that there are a few ways to avoid the busy waiting and they have more to do with the task model (reactive / callback-y / interrupt-driven, cooperative, serial / blocking, etc.) than the DMA abstraction. I'd like the DMA abstraction to be general / flexible enough to support all the task models out there with minimal code duplication (I don't know if that's possible at this point though)

Feb 13 '18 20:02 japaric

@japaric

Also you can already use HAL abstractions with interrupts using RTFM at zero cost.

Well, yes, you can. But it's a completely manual and static setup and fully separate from the hal drivers.

no busy waiting or interrupt handlers involved.

That's not what I meant. Even with cooperative approaches you need to periodically check in whether the transfer has finished and depending on how slow the transfer is you might still waste lots of MCU cycles doing that.

What I'm trying to get at here is that there are a few ways to avoid the busy waiting and they have more to do with the task model (reactive / callback-y / interrupt-driven, cooperative, serial / blocking, etc.) than the DMA abstraction. I'd like the DMA abstraction to be general / flexible enough to support all the task models out there with minimal code duplication (I don't know if that's possible at this point though)

Fair enough. I still stand by my point that DMA was introduced to reduce interrupt overhead and contention and is somewhat pointless without being able to use interrupts.

Feb 13 '18 20:02 therealprof

Even with cooperative approaches you need to periodically check in whether the transfer has finished and depending on how slow the transfer is you might still waste lots of MCU cycles doing that.

With a naive implementation, yes. Above, I was refering to a Tokio (epoll / kqueue) like implementation; there a task is only resumed when it can make progress -- this eliminates the continuous polling you mention. I already have an implementation in my head but haven't had time to test it; it uses the interrupt mechanism (NVIC), because otherwise you can't go to sleep, but doesn't involve any (device specific) interrupt handler.

Feb 13 '18 22:02 japaric

With a naive implementation, yes. Above, I was refering to a Tokio (epoll / kqueue) like implementation; there a task is only resumed when it can make progress -- this eliminates the continuous polling you mention.

So how would a task know it can make progress without polling or being notified by an interrupt handler?

it uses the interrupt mechanism (NVIC), because otherwise you can't go to sleep, but doesn't involve any (device specific) interrupt handler.

That's yet another aspect why you want to be able to use interrupts. ;)

I'll be happy to look at your ideas when you have something. Just wanted to throw in that for me personally DMA is way down on the implementation list without a better way to use interrupts with drivers, it's just a ton of work to implement with very little benefit over a stupid/simple register based implementation.

Feb 13 '18 23:02 therealprof

Current status of the DMA API design:

I think we have a clear idea of what the API should provide:

It MUST

be memory safe, obviously
expose a device agnostic API, so we can use it to build DMA based HAL traits
support one-shot and circular transfers
support mem to peripheral and peripheral to mem transfers
not require dynamic memory allocation, i.e. allocator support should be optional

It SHOULD support these features:

Canceling an ongoing transfer
Memory to memory transfers
Peeking into the part of the buffer that the DMA has already written to
Data widths other than 8-bit
Using a part / range of the buffer; the API still needs to take ownership of the whole buffer in this scenario

This is out of scope:

Configuration stuff like enabling burst mode or constraining usage of a channel to certain peripherals. The API for this functionality is up to the HAL implementer to provide / design.

What we know so far from existing implementations:

Ownership and stable addresses are a must for memory safety
- (remember that mem::forget is safe in Rust)
- The ongoing transfer value must take ownership of the buffer and the DMA channel
- storing [T; N] in the Transfer struct doesn't work because the address is not stable -- it changes when the Transfer struct is moved
- &'static mut fulfills these two requirements but so do Box, Vec and other heap allocated collections. See owning_ref::StableAddress for a more complete list.

These concepts need to be abstracted over (using traits or structs):
- An ongoing DMA transfer
- A DMA channel, singletons make sense here
- Buffers compatible with DMA transfers. cf. StableAddress. Also the element type must be restricted -- e.g. a transfer on &'static mut [Socket] doesn't make sense.

Unresolved questions:

Alignment requirements. Does a transfer on a [u16] buffer require the buffer to be 16-bit aligned? The answer probably depends on the target device.

Attempts at the problem:

Memory safe DMA transfers, a blog post that explores using &'static mut references to achieve memory safe DMA transfers
stm32f103xx-hal, PoC implementations of the idea present in the blog post. It contains implementations of one-shot and circular DMA transfers

myeisha's DMA framework. Supports several heap allocated collections like Box and Vec.

Apr 01 '18 23:04 japaric

@japaric IIRC (at least with DMA USB drivers) regardless of word size the buffer has to be 32-bit aligned on the m3's I've used. Almost definitely varies with platfom though.

Apr 02 '18 00:04 ryankurte

I was a bit stuck trying to figure out how to use this for a while with RTFM: https://github.com/japaric/ws2812b/issues/5

I found examples/mpu9250-log.rs useful, especially this trick for what to pass around in resources rather than lots of individual items:

type TX = Option<Either<(TX_BUF, dma1::C4, Tx<USART1>), Transfer<R, TX_BUF, dma1::C4, Tx<USART1>>>>;

I was wishing for a way to reuse a Transfer repeatably (update buffer and 'go again') or get back all the channels and parts to make a new one over the same resources. This does that, but I wonder if there's a nicer way to put more of this in the generic hal rather than user code?

However, this (in the SYS_TICK handler of that example) surprised me:

    let (buf, c, tx) = match r.TX.take().unwrap() {
        Either::Left((buf, c, tx)) => (buf, c, tx),
        Either::Right(trans) => trans.wait(),
    };

wait() is a blocking call, and I had just assumed I couldn't use it! Perhaps it would be fine in a handler for a DMA completion interrupt, where we're already 'sure' it won't block, but this is in a timer and seems like that guarantee doesn't hold. I guess it's just being glossed over for the sake of a simple example, and that the ticks are assumed to be far enough apart that the dma will have completed?

So:

have I misunderstood, and wait() is somehow ok to use generally?
~~is there a non-blocking is_done() or some similar check so the tick handler could return early and come back next time? (say we have larger buffers or serial flow control)~~ Edit: yes there is, duh.
is wait() always ok to use in the corresponding dma completion interrrupt handler?
should proper code for this log example listen for the dma completion, call .wait() there to update the TX, and have the tick handler exit early or otherwise handle an incomplete transfer when the timer expires?

Apr 17 '18 02:04 dcarosone

I spent some time adding DMA support to the stm32l4x6_hal crate, and wanted to add some notes. I attempted to use the Transfer and CircBuffer abstractions with serial ports and RTFM.

Because the Transfer owns its Tx or Rx pin and the buffer, and wait()ing on the Transfer consumes the Transfer (returning the buffer and pin), it's not possible to use them in a task because whatever you pass to the task must be consumed to start or end the transfer. For example, a DMA1_CHANNEL5 task must move the Transfer resource to operate on the buffer contents, which it cannot do. I ended up adding a couple of functions: restart for Transfer and partial_peek for CircBuffer which return slices of the buffer. They work well enough for receiving data, but I haven't found a good solution for transmitting because of the inability to pass Transfers around.

May 05 '18 00:05 thenewwazoo

So, the PS2, which I've been developing on, has a hardware requirement for 16-byte aligned DMA (it ignores the lower four bits of the specified memory address).

For the sake of throwing ideas out, perhaps the implementation could define a zero-sized Alignment type which is #[repr(align(T))] for T-byte minimum alignment, and then the user can either borrow from the aligned crate, or use their own.

I'd rather not reinvent the DMA interface wheel if somebody's going to change its shape.

Nov 06 '18 22:11 Ravenslofty

Because the Transfer owns its Tx or Rx pin and the buffer, and wait()ing on the Transfer consumes the Transfer (returning the buffer and pin), it's not possible to use them in a task because whatever you pass to the task must be consumed to start or end the transfer. For example, a DMA1_CHANNEL5 task must move the Transfer resource to operate on the buffer contents, which it cannot do.

I think one way to do it is to declare your resource Option<Tx<...>> then in your task you .take() the resource, that gets it, and other tasks will just see a none. the comment right above yours is actually a complicated way to do it.

The Option part is for taking the object, and the Either part is because either there is a transfer running and you need to wait() it to get the bit necessary to start a new transfer, or there is currently no transfer running and it just stores the bits and pieces to start a transfer.

Feb 10 '19 19:02 nraynaud

On the topic of this RFC, I see a need for non-static DMA buffer for transactions that will be using the DMA but are blocking.

It might seem counterintuitive, but the byte-by-byte APIs of various MCU devices are sometimes impossible to use at the speed of the physical interface (I suspect it's because of the bus traffic in between the bytes), but there is still no interest in going full async on the API (in particular by blocking we can keep the buffer on the stack, then release it).

Here is a mention of the problem: https://javakys.wordpress.com/2014/09/04/how-to-implement-full-duplex-spi-communication-using-spi-dma-mode-on-stm32f2xx-or-stm32f4xx/

Here is some code for your consideration: https://github.com/nraynaud/stm32f103xx-hal/blob/153b97b682790bc16169868a02c7df97c8c601c6/src/serial.rs#L508

pub trait WriteDmaImmediately<A>: DmaChannel
    where
        A: AsRef<[u8]> + ?Sized,
        Self: core::marker::Sized,
{
    fn write_immediately(self, chan: Self::Dma, buffer: &A) -> (&A, Self::Dma, Self);
}

On a slightly different subject, I have rendered the buffer + ?Sized in my DMA APIs, so that I can pass slices to the DMA transfer.

Feb 10 '19 19:02 nraynaud

@nraynaud if the operation is blocking I think there's no need for taking the receiver, or the buffer, by value. Something like fn write_all(&mut self, buffer: &[u8]) should work fine. Note that you also need the compiler fences in blocking code.

Also, the blog post linked in the issue description is a bit dated by now. The embedonomicon has the latest information on DMA; there have been a few updates: Pin instead of &'static mut (which allows Box, Rc, etc) and the compiler fences have been softened (while preserving correctness).

Feb 11 '19 20:02 japaric

Thanks, I will read all that, and try to use references. It didn't occur to me.

Do you have a documentation/blog post on the compiler fences and when to use them? What about volatile?

Feb 11 '19 20:02 nraynaud

@nraynaud the embedonomicon explains why the compiler fences are needed. volatile ops can't be reordered wrt to each other but non-volatile opss (like operations on buffers) can be reordered wrt to volatile ops; the compiler fences are there to prevent the latter, which can result in compiler misoptimizations.

Feb 11 '19 20:02 japaric

I've tried to rewrite dma transfer according embedonomicon here but haven't tested it yet.

Feb 11 '19 20:02 burrbull

How are these efforts going? I noticed that the last posting on the thread is from Feb 11 2019. I am new to embedded rust and would like to play with some DMA stuff. While I could just manipulate the chip registers directly and cobble something together, there has obviously been much more thought put into this here. I would like to play with what you guys are suggesting. :)

Jul 24 '19 01:07 justacec

@justacec More DMA implementations keep appearing. I've personally worked on two (stm32f7xx-hal, stm32l0xx-hal) over the last weeks.

I'm not aware of any efforts for creating a platform-independent API that could be included into embedded-hal. That would be great to have, but also a significant effort, I believe.

Jul 24 '19 06:07 hannobraun

I submitted a draft PR for the stm32f1xx-hal crate (https://github.com/stm32-rs/stm32f1xx-hal/pull/244) which allows circular DMA buffers to support reading arbitrary lengths of data as it comes in.

Jul 16 '20 09:07 rudihorn

embedded-hal embedded-hal copied to clipboard

Explore DMA-based APIs built on top of `&'static mut` references

Summary of this thread as of 2018-04-02

embedded-hal
embedded-hal copied to clipboard