embedded-hal
embedded-hal copied to clipboard
Explore DMA-based APIs built on top of `&'static mut` references
Summary of this thread as of 2018-04-02
Current status of the DMA API design:
I think we have a clear idea of what the API should provide:
It MUST
- be memory safe, obviously
- expose a device agnostic API, so we can use it to build DMA based HAL traits
- support one-shot and circular transfers
- support mem to peripheral and peripheral to mem transfers
- not require dynamic memory allocation, i.e. allocator support should be optional
It SHOULD support these features:
- Canceling an ongoing transfer
- Memory to memory transfers
- Peeking into the part of the buffer that the DMA has already written to
- Data widths other than 8-bit
- Using a part / range of the buffer; the API still needs to take ownership of the whole buffer in this scenario
This is out of scope:
- Configuration stuff like enabling burst mode or constraining usage of a channel to certain peripherals. The API for this functionality is up to the HAL implementer to provide / design.
What we know so far from existing implementations:
- Ownership and stable addresses are a must for memory safety
- (remember that
mem::forgetis safe in Rust) - The ongoing transfer value must take ownership of the buffer and the DMA channel
- storing
[T; N]in theTransferstruct doesn't work because the address is not stable -- it changes when theTransferstruct is moved &'static mutfulfills these two requirements but so doBox,Vecand other heap allocated collections. Seeowning_ref::StableAddressfor a more complete list.
- (remember that
- These concepts need to be abstracted over (using traits or structs):
- An ongoing DMA transfer
- A DMA channel, singletons make sense here
- Buffers compatible with DMA transfers. cf.
StableAddress. Also the element type must be restricted -- e.g. a transfer on&'static mut [Socket]doesn't make sense.
Unresolved questions:
- Alignment requirements. Does a transfer on a
[u16]buffer require the buffer to be 16-bit aligned? The answer probably depends on the target device.
Attempts at the problem:
-
Memory safe DMA transfers, a blog post that explores using
&'static mutreferences to achieve memory safe DMA transfers -
stm32f103xx-hal, PoC implementations of the idea present in the blog post. It contains implementations of one-shot and circular DMA transfers
- myeisha's DMA framework. Supports several heap
allocated collections like
BoxandVec.
What the title says. This blog post describes the approach to building such APIs. The last part of
the post covers platform agnostic traits that could be suitable for inclusion in embedded-hal.
This issue is for collecting feedback on the proposed approach and deciding on what should land in
embedded-hal.
Great stuff. I like how you're tackling DMA. However I feel there's one very important piece missing in the embedded-hal world: The ability to tie interrupts to HAL periperals.
You nicely described how DMA gets rid of the need to wait for the transfer to complete on the spot but you still need to spend (potentially) a lot of cycles to check whether your transfer has completed at some point. Whether that is better or not depends on the application, after all you still might do other processing in interrupt handlers while busy waiting for a single transmission to complete.
I should probably add that I view interrupt support as a much more important topic for async IO and DMA mostly as a method to reduce interrupt frequency and contention. DMA without interrupts is not actually that useful.
I still need to do some reading (my DMA knowledge is extremely limited), but it would also be interesting how to model peripheral to peripheral transfers.
For example, if you (for some reason) wanted to have a DMA driven UART passthrough (data comes in on one, goes out on the other), how would this be modelled? The transaction should take ownership of the serial ports and DMA channel, but wouldn't require a buf, as far as I know.
@jamesmunns As far as I know STM32 does not support peripheral to peripheral DMA transfers. Do other microcontrollers have such feature?
@protomors based on reading this thread, I thought it might be possible, though that might be more by accident than by design. AN4031 section 1.1.4 though seems to exclude p-to-p DMA.
I can find some mentions of peripheral to peripheral DMA being used on other architectures, such as the PIC32, as well as the PSoC3 (8051), but can't really find any compelling examples for Cortex-M based devices.
I don't have a use case (or experience with this), but it looks like this is "not a problem" at the moment.
Actually, I just found the PSoC5LP (Cortex-M3 based) does support this, judging by this app note from Cypress, but this still looks like a pretty rare feature (might not be worth defining interfaces/traits for the edge cases).
@jamesmunns Plenty of Cortex-M chips have a peripheral to peripheral communication but it's often not called DMA (because it isn't necessarily just that), e.g. the Nordic NRF have it and it's called Programmable Peripheral Interconnect (PPI) there, the ATSAMD call it EVSYS. As I mentioned before an important aspect of DMA is being able to process interrupts generated by the DMA engine and "p-to-p" is about letting one peripheral know that some other peripheral has finished processing, via DMA or not is not of importance. If you have to poll for DMA completion it's pretty much just a waste of resources.
@therealprof
However I feel there's one very important piece missing in the embedded-hal world: The ability to tie interrupts to HAL periperals.
Interrupts are a topic orthogonal to the DMA that deserve their own issue / discussion thread. I have my thoughts on the topic but I won't voice them in this issue.
Also you can already use HAL abstractions with interrupts using RTFM at zero cost.
You nicely described how DMA gets rid of the need to wait for the transfer to complete on the spot but you still need to spend (potentially) a lot of cycles to check whether your transfer has completed at some point.
You don't need interrupt handlers / callbacks to avoid busy waiting for the DMA transfer to complete either (if that's what you are implicitly suggesting). If you use cooperative multitasking (i.e. generators) you can suspend (yield) the task while the transfer is in progress, go do other tasks and/or sleep, and resume the first task only when the transfer is done -- no busy waiting or interrupt handlers involved.
@jamesmunns
I don't know about peripheral to peripheral transfer but the reference implementation is missing DMA functionality like memory to memory transfers, priority levels and burst mode. However, I think that functionality could be provided as methods on the channel types; it just needs to be implemented. Mem2mem transfers sound like they may be useful to have as a trait but I'm not sure if priority and burst mode should have traits.
You don't need interrupt handlers / callbacks to avoid busy waiting
What I'm trying to get at here is that there are a few ways to avoid the busy waiting and they have more to do with the task model (reactive / callback-y / interrupt-driven, cooperative, serial / blocking, etc.) than the DMA abstraction. I'd like the DMA abstraction to be general / flexible enough to support all the task models out there with minimal code duplication (I don't know if that's possible at this point though)
@japaric
Also you can already use HAL abstractions with interrupts using RTFM at zero cost.
Well, yes, you can. But it's a completely manual and static setup and fully separate from the hal drivers.
no busy waiting or interrupt handlers involved.
That's not what I meant. Even with cooperative approaches you need to periodically check in whether the transfer has finished and depending on how slow the transfer is you might still waste lots of MCU cycles doing that.
What I'm trying to get at here is that there are a few ways to avoid the busy waiting and they have more to do with the task model (reactive / callback-y / interrupt-driven, cooperative, serial / blocking, etc.) than the DMA abstraction. I'd like the DMA abstraction to be general / flexible enough to support all the task models out there with minimal code duplication (I don't know if that's possible at this point though)
Fair enough. I still stand by my point that DMA was introduced to reduce interrupt overhead and contention and is somewhat pointless without being able to use interrupts.
Even with cooperative approaches you need to periodically check in whether the transfer has finished and depending on how slow the transfer is you might still waste lots of MCU cycles doing that.
With a naive implementation, yes. Above, I was refering to a Tokio (epoll / kqueue) like implementation; there a task is only resumed when it can make progress -- this eliminates the continuous polling you mention. I already have an implementation in my head but haven't had time to test it; it uses the interrupt mechanism (NVIC), because otherwise you can't go to sleep, but doesn't involve any (device specific) interrupt handler.
With a naive implementation, yes. Above, I was refering to a Tokio (epoll / kqueue) like implementation; there a task is only resumed when it can make progress -- this eliminates the continuous polling you mention.
So how would a task know it can make progress without polling or being notified by an interrupt handler?
it uses the interrupt mechanism (NVIC), because otherwise you can't go to sleep, but doesn't involve any (device specific) interrupt handler.
That's yet another aspect why you want to be able to use interrupts. ;)
I'll be happy to look at your ideas when you have something. Just wanted to throw in that for me personally DMA is way down on the implementation list without a better way to use interrupts with drivers, it's just a ton of work to implement with very little benefit over a stupid/simple register based implementation.
Current status of the DMA API design:
I think we have a clear idea of what the API should provide:
It MUST
- be memory safe, obviously
- expose a device agnostic API, so we can use it to build DMA based HAL traits
- support one-shot and circular transfers
- support mem to peripheral and peripheral to mem transfers
- not require dynamic memory allocation, i.e. allocator support should be optional
It SHOULD support these features:
- Canceling an ongoing transfer
- Memory to memory transfers
- Peeking into the part of the buffer that the DMA has already written to
- Data widths other than 8-bit
- Using a part / range of the buffer; the API still needs to take ownership of the whole buffer in this scenario
This is out of scope:
- Configuration stuff like enabling burst mode or constraining usage of a channel to certain peripherals. The API for this functionality is up to the HAL implementer to provide / design.
What we know so far from existing implementations:
- Ownership and stable addresses are a must for memory safety
- (remember that
mem::forgetis safe in Rust) - The ongoing transfer value must take ownership of the buffer and the DMA channel
- storing
[T; N]in theTransferstruct doesn't work because the address is not stable -- it changes when theTransferstruct is moved &'static mutfulfills these two requirements but so doBox,Vecand other heap allocated collections. Seeowning_ref::StableAddressfor a more complete list.
- (remember that
- These concepts need to be abstracted over (using traits or structs):
- An ongoing DMA transfer
- A DMA channel, singletons make sense here
- Buffers compatible with DMA transfers. cf.
StableAddress. Also the element type must be restricted -- e.g. a transfer on&'static mut [Socket]doesn't make sense.
Unresolved questions:
- Alignment requirements. Does a transfer on a
[u16]buffer require the buffer to be 16-bit aligned? The answer probably depends on the target device.
Attempts at the problem:
-
Memory safe DMA transfers, a blog post that explores using
&'static mutreferences to achieve memory safe DMA transfers -
stm32f103xx-hal, PoC implementations of the idea present in the blog post. It contains implementations of one-shot and circular DMA transfers
- myeisha's DMA framework. Supports several heap
allocated collections like
BoxandVec.
@japaric IIRC (at least with DMA USB drivers) regardless of word size the buffer has to be 32-bit aligned on the m3's I've used. Almost definitely varies with platfom though.
I was a bit stuck trying to figure out how to use this for a while with RTFM: https://github.com/japaric/ws2812b/issues/5
I found examples/mpu9250-log.rs useful, especially this trick for what to pass around in resources rather than lots of individual items:
type TX = Option<Either<(TX_BUF, dma1::C4, Tx<USART1>), Transfer<R, TX_BUF, dma1::C4, Tx<USART1>>>>;
I was wishing for a way to reuse a Transfer repeatably (update buffer and 'go again') or get back all the channels and parts to make a new one over the same resources. This does that, but I wonder if there's a nicer way to put more of this in the generic hal rather than user code?
However, this (in the SYS_TICK handler of that example) surprised me:
let (buf, c, tx) = match r.TX.take().unwrap() {
Either::Left((buf, c, tx)) => (buf, c, tx),
Either::Right(trans) => trans.wait(),
};
wait() is a blocking call, and I had just assumed I couldn't use it! Perhaps it would be fine in a handler for a DMA completion interrupt, where we're already 'sure' it won't block, but this is in a timer and seems like that guarantee doesn't hold. I guess it's just being glossed over for the sake of a simple example, and that the ticks are assumed to be far enough apart that the dma will have completed?
So:
- have I misunderstood, and
wait()is somehow ok to use generally? - ~~is there a non-blocking
is_done()or some similar check so the tick handler could return early and come back next time? (say we have larger buffers or serial flow control)~~ Edit: yes there is, duh. - is
wait()always ok to use in the corresponding dma completion interrrupt handler? - should proper code for this log example listen for the dma completion, call
.wait()there to update the TX, and have the tick handler exit early or otherwise handle an incomplete transfer when the timer expires?
I spent some time adding DMA support to the stm32l4x6_hal crate, and wanted to add some notes. I attempted to use the Transfer and CircBuffer abstractions with serial ports and RTFM.
Because the Transfer owns its Tx or Rx pin and the buffer, and wait()ing on the Transfer consumes the Transfer (returning the buffer and pin), it's not possible to use them in a task because whatever you pass to the task must be consumed to start or end the transfer. For example, a DMA1_CHANNEL5 task must move the Transfer resource to operate on the buffer contents, which it cannot do. I ended up adding a couple of functions: restart for Transfer and partial_peek for CircBuffer which return slices of the buffer. They work well enough for receiving data, but I haven't found a good solution for transmitting because of the inability to pass Transfers around.
So, the PS2, which I've been developing on, has a hardware requirement for 16-byte aligned DMA (it ignores the lower four bits of the specified memory address).
For the sake of throwing ideas out, perhaps the implementation could define a zero-sized Alignment type which is #[repr(align(T))] for T-byte minimum alignment, and then the user can either borrow from the aligned crate, or use their own.
I'd rather not reinvent the DMA interface wheel if somebody's going to change its shape.
Because the Transfer owns its Tx or Rx pin and the buffer, and
wait()ing on the Transfer consumes the Transfer (returning the buffer and pin), it's not possible to use them in a task because whatever you pass to the task must be consumed to start or end the transfer. For example, a DMA1_CHANNEL5 task must move the Transfer resource to operate on the buffer contents, which it cannot do.
I think one way to do it is to declare your resource Option<Tx<...>> then in your task you .take() the resource, that gets it, and other tasks will just see a none.
the comment right above yours is actually a complicated way to do it.
The Option part is for taking the object, and the Either part is because either there is a transfer running and you need to wait() it to get the bit necessary to start a new transfer, or there is currently no transfer running and it just stores the bits and pieces to start a transfer.
On the topic of this RFC, I see a need for non-static DMA buffer for transactions that will be using the DMA but are blocking.
It might seem counterintuitive, but the byte-by-byte APIs of various MCU devices are sometimes impossible to use at the speed of the physical interface (I suspect it's because of the bus traffic in between the bytes), but there is still no interest in going full async on the API (in particular by blocking we can keep the buffer on the stack, then release it).
Here is a mention of the problem: https://javakys.wordpress.com/2014/09/04/how-to-implement-full-duplex-spi-communication-using-spi-dma-mode-on-stm32f2xx-or-stm32f4xx/
Here is some code for your consideration: https://github.com/nraynaud/stm32f103xx-hal/blob/153b97b682790bc16169868a02c7df97c8c601c6/src/serial.rs#L508
pub trait WriteDmaImmediately<A>: DmaChannel
where
A: AsRef<[u8]> + ?Sized,
Self: core::marker::Sized,
{
fn write_immediately(self, chan: Self::Dma, buffer: &A) -> (&A, Self::Dma, Self);
}
On a slightly different subject, I have rendered the buffer + ?Sized in my DMA APIs, so that I can pass slices to the DMA transfer.
@nraynaud if the operation is blocking I think there's no need for taking the receiver, or the buffer, by value. Something like fn write_all(&mut self, buffer: &[u8]) should work fine. Note that you also need the compiler fences in blocking code.
Also, the blog post linked in the issue description is a bit dated by now. The embedonomicon has the latest information on DMA; there have been a few updates: Pin instead of &'static mut (which allows Box, Rc, etc) and the compiler fences have been softened (while preserving correctness).
Thanks, I will read all that, and try to use references. It didn't occur to me.
Do you have a documentation/blog post on the compiler fences and when to use them? What about volatile?
@nraynaud the embedonomicon explains why the compiler fences are needed. volatile ops can't be reordered wrt to each other but non-volatile opss (like operations on buffers) can be reordered wrt to volatile ops; the compiler fences are there to prevent the latter, which can result in compiler misoptimizations.
I've tried to rewrite dma transfer according embedonomicon here but haven't tested it yet.
How are these efforts going? I noticed that the last posting on the thread is from Feb 11 2019. I am new to embedded rust and would like to play with some DMA stuff. While I could just manipulate the chip registers directly and cobble something together, there has obviously been much more thought put into this here. I would like to play with what you guys are suggesting. :)
@justacec More DMA implementations keep appearing. I've personally worked on two (stm32f7xx-hal, stm32l0xx-hal) over the last weeks.
I'm not aware of any efforts for creating a platform-independent API that could be included into embedded-hal. That would be great to have, but also a significant effort, I believe.
I submitted a draft PR for the stm32f1xx-hal crate (https://github.com/stm32-rs/stm32f1xx-hal/pull/244) which allows circular DMA buffers to support reading arbitrary lengths of data as it comes in.