embassy
embassy copied to clipboard
stm32: add driver to use SPI as slave
This is a rough initial implementation of using the STM32 as an SPI slave device to think about possible APIs.
The non-DMA methods taking buffers don't make a ton of sense at the moment. The internal FIFO buffer of the SPI peripheral is 32 bits, which means these methods will never be able to read or write large buffers. The master device I'm testing with is a Raspberry Pi which doesn't clock lower than a few kilohertz, so it's difficult to see exactly what the behavior is e.g. when starting to read late in a transfer. Perhaps these methods should work on single words, allowing also to implement embedded_hal_nb::serial::Read and Write (though these traits don't include a bidirectional transfer).
The asynchronous version is implemented using DMA backed by ring buffers. This will allow implementing some embedded-io-async traits. On the chip I'm testing with (STM32WB55RGVx on a development board) the DMA-backed communication works up to somewhere between 2.5 and 3.0 MHz continously before overruns start happening. This seems a bit slow: the reference manual indicates this chip should be able to reach 24 Mbit/s.
This chip is SPIv2, I haven't tested the other versions.
This further needs some additional constructors configuring rx-only, tx-only, etc.
Closes https://github.com/embassy-rs/embassy/issues/623
I've changed the non-DMA methods to operate on single words with a non-blocking signature using nb, similar to e.g. Uart::nb_read.
I sucessfully tested this branch with an STM32F0R8 device, works very well so far :+1:
I guess I'll test it on a L0. I can report how it went :blush:
I've pulled in the changes from the main branch, but still need to update the API to match some of the changes (such as be087e5d43326178878878bebae96f3b51a3b184). I've also not tested on a board yet.
I've tested this on a stm32h563zi with minor modification to allow for RX only operation, but I don't seem to receive any data though.
--- a/embassy-stm32/src/spi/slave.rs
+++ b/embassy-stm32/src/spi/slave.rs
@@ -115,6 +115,22 @@ impl<'d, T: Instance> SpiSlave<'d, T> {
)
}
+ /// Create a new SPI Slave driver, in RX only mode (only MOSI, no MISO)
+ pub fn new_rxonly(
+ peri: impl Peripheral<P = T> + 'd,
+ sck: impl Peripheral<P = impl SckPin<T>> + 'd,
+ mosi: impl Peripheral<P = impl MosiPin<T>> + 'd,
+ config: Config,
+ ) -> Self
+ {
+ into_ref!(peri, sck, mosi);
+
+ sck.set_as_af(sck.af_num(), AfType::input(Pull::None));
+ mosi.set_as_af(mosi.af_num(), AfType::input(Pull::None));
+
+ Self::new_inner(peri, Some(sck.map_into()), Some(mosi.map_into()), None, None, config)
+ }
+
fn new_inner(
(my changes are here: https://github.com/Luctins/embassy/tree/stm32-spi-device)
Does this addition make sense, or are more extensive changes required?
@Luctins I've not tested this, but you may need to include the chip select pin, e.g.:
https://github.com/embassy-rs/embassy/blob/d7f51529c45c94fd18c118ae124ab23db2c088c1/embassy-stm32/src/spi/slave.rs#L106
I've experimented a bit with making the API match that of the SPI master more closely. The SPI master has the following signature
https://github.com/embassy-rs/embassy/blob/53708386a87d77db9e3fda6d234789ca79cdad41/embassy-stm32/src/spi/mod.rs#L111
In principle this is possible for the SpiSlave as well. There are two aspects to consider.
First, the async mode SPI slave has to read and write in the background with hardware support, requiring the use of ring buffers. The blocking version of the SPI slave does not use ring buffers as it does not use hardware support. Ideally the blocking SPI slave struct does not contain the ring buffer fields at all, otherwise it has unnecessary padding.
Second, the ring buffers for async mode are generic over word size, which may be desirable to expose in the API.
There are some options:
- Use
SpiSlave<'d, M: Mode>, abstracting away the word size by setting it tou8. - Use
SpiSlave<'d, M: Mode, W: Word>with the word size being meaningless for blocking mode. - Use
SpiSlaveBlocking<'d>andSpiSlaveAsync<'d, W: Word>. - Don't support blocking SPI slaves, providing
SpiSlave<'d, W: Word>with only async reading and writing. The usefulness of blocking mode is debatable: the throughput it can support is limited and there are difficulties for the programmer as the slave does not control the SPI clock.
Options 1 and 2 are possible without runtime overhead or unnecessarily padding blocking mode's struct size by adding a private trait with a generic associated type. Note it does add a second private bound on the Mode trait that shows up in the docs.
pub(crate) trait SpiSlaveModeData {
type Data<'d, W: Word>;
}
pub(crate) struct SpiSlaveAsync<'d, W: Word> {
tx_ring_buffer: WritableRingBuffer<'d, W>,
rx_ring_buffer: ReadableRingBuffer<'d, W>,
}
impl SpiSlaveModeData for Async {
type Data<'d, W: Word> = SpiSlaveAsync<'d, W>;
}
impl SpiSlaveModeData for Blocking {
type Data<'d, W: Word> = ();
}
and setting
- pub trait Mode: SealedMode {}
+ pub trait Mode: SealedMode + crate::spi::SpiSlaveModeData {}
This then allows
pub struct SpiSlave<'d, M: Mode, W: Word> {
inner: SpiSlave_<'d>,
mode_data: <M as SpiSlaveModeData>::Data<'d, W>,
}
@Luctins I've not tested this, but you may need to include the chip select pin, e.g.:
https://github.com/embassy-rs/embassy/blob/d7f51529c45c94fd18c118ae124ab23db2c088c1/embassy-stm32/src/spi/slave.rs#L106
Thanks for the suggestion.
The problem is that the intended application for this does not use the CS pin. I've tested this using ST's official drivers and it works, so my bet is misconfiguration.
UPDATE: I got it to work by messing with the configuration registers, I'll try to make a clean patch of what I did and send here.
@tomcur Can you add me as a contributor on your fork so I can add a commit with these changes?
I might be off base here, but as far as I can tell this approach might have a few issues:
- There is no handling of TX buffer underruns. Garbage data from the ring buffer gets written even if it has never been written/it goes ahead of the write position.
- I would expect an underrun to bubble up to the user code somehow?
- TX DMA is enabled immediately, causing the TX FIFO to get filled immediately upon
dma_ringbufferedbeing called.- This could be argued to be working as intended, but it definitely causes issues because of the point above.
I might be off base here, but as far as I can tell this approach might have a few issues:
There is no handling of TX buffer underruns. Garbage data from the ring buffer gets written even if it has never been written/it goes ahead of the write position.
- I would expect an underrun to bubble up to the user code somehow?
TX DMA is enabled immediately, causing the TX FIFO to get filled immediately upon
dma_ringbufferedbeing called.
- This could be argued to be working as intended, but it definitely causes issues because of the point above. @hansihe
From my tests, it seems like setting the comm field in the cfg2 register to vals::Comm::RECEIVER disables the tx part of the spi, because trying to do symmetric operations always stopped after 16 bytes (the TX buffer got full, so tx_ready never returned true).
My (a little crude) implementation is here.
@Luctins Thank you. I also need to dive into the reference manual to ensure all configuration registers are according to spec, the current version of this PR is still rough. I've added you as a contributor to the fork, but perhaps an API should be decided on before adding your changes to the PR to keep overhead of iterating on the code low.
@hansihe In the current version underrun errors bubble up on the next call to one of SpiSlaveRingBuffered::{write, write_exact, transfer_exact} but the errors are called overrun in the code (assuming my understanding of how WritableRingBuffer::{write,write_exact} bubble up the errors is correct, I should have some time soon to test the behavior).
You're right that we should allow the user to prefill the buffer before DMA is started, the DMA ring buffer allows this through WritableRingBuffer::write_immediate. Comparing with other APIs in this repo, the ring buffers provide a start method, the SAI driver also provides a start method, so I think it makes sense to mirror this and add a start method (and write_immediate method) to the SPI driver as well.
@tomcur I see your point, and you're probably right.
~~My main concern would now be supporting other spi versions, because AFAIK my changes are only compatible with spiv3/v4/v5 and I only have access to stm32h5 boards for testing.
I would suggest feature gating those versions for now (not allowing for creation of unidirectional devices) for spiv1/v2 and leaving for someone to implement it in the future if they feel inclined and have the hardware to test it.~~
EDIT: My bad, I hadn't seen your previous comment citing that you have a SPIv2 board.
@tomcur From testing on a STM32G491 that does not seem to be the case. I can double check later today to be 100% sure
Tested this on stm32f072rb and always getting "transfer error: Overrun"
Tested this on an STM32G473 and it works perfectly. Is there a timeline on merging this? Pretty important feature IMO.
Hi all. I'm very interested in moving this PR forward to get SPI slave support on STM32 chips, I agree with Levine that this is an important feature.
I see some issues with this PR (many of which have already been pointed out in this thread, but I'd like to place them in one post):
- DMA/async functions are not the defaults when
newis called, which is the opposite model from the SPI master driver. I don't think any other peripheral in the crate follows this (please correct me if I'm wrong). Additionally thedma_ringbuffered()method to return a different peripheral object is also unlike the other peripherals in the crate. I think the overhead of extra space in the struct to unify the API is worthwhile. - There are other minor API differences with the master driver (for example the read and write buffers are swapped in
transferas compared with the master driver). Another example is a lack of atransfer_in_placefunction. - As far as I can tell, it is not possible to share the DMA SPI slave peripheral between tasks. The DMA buffers in the struct are mutable slices, which cannot be safely given static lifetimes in Rust.
- There is no mechanism to enable & disable TX when writing into the DMA buffer, which makes it impossible to synchronize (or even control) writes. Lets say you need to send a message of 100 bytes to the master, if you attempt to use the current
writeorwrite_exactfunctions, the DMA will chew through and send the entire allocated write buffer.
My proposal is to change this module to match the SPI master interface as closely as possible. This would involve removing the usage of ReadableRingBuffer and calling set_txdmaen & finish_dma on every write (I need to verify that this would still allow for the DMA reads to continue uninterrupted).
I can volunteer myself to lead this effort, but I'm unsure if @tomcur and/or @Luctins are still interested in carrying this forward? I can also open a different PR if that is more convenient. I am also unsure which maintainer would be able to help us get this merged/work through the API decisions, I see that @Dirbaio has most of the git blame in the SPI master so maybe that's a good place to start?
I can test these changes on an STM32L476RG (spiv2), an STM32G071RB (spiv2), and a STM32H563ZI (spiv4). I'm also willing to purchase some more Nucleo boards if no one is available to test spivf1, spiv1 and spiv5.
I'd be happy to continue collaboration on this. The last state is that some API decisions were pending, as @Namr also pointed out.
This would involve removing the usage of
ReadableRingBufferand callingset_txdmaen&finish_dmaon every write (I need to verify that this would still allow for the DMA reads to continue uninterrupted).
I'm not sure this is going to be possible. In this mode, the board has no control over the SPI clock. It needs the hardware support to be able to read in the background.
@Namr I also would like for this to be eventually merged, but I'm currently not sure how much time I'll have to dedicate to this in the near future. I can help with testing at least (I have a STM32H563ZI/spiv4nucleo board and one of my current projects use this cpu extensively). I also support having a similar API for the master and the slave mode to ease development.
It needs the hardware support to be able to read in the background.
@tomcur Can't DMA do this? So the slave device starts 'listening' and storing bytes in a DMA (ring) buffer and processing them when possible (if data is not read for a while overruns may occur). My main point on this would be at least having a similar api without using the nb crate (and preferrably implementing embedded-io traits).
Thanks Tom and Lucas for jumping back in! I want to clarify what I meant by point number 4 on the list above. My hope was that a user would be able to do
bus.write(&buffer).await;
Which would yield the task until the master sends enough clock edges to flush out the contents of buffer. Additionally I was hoping that if no call to write was pending, the bus would always show 0 on the MISO line.
After reading further into the reference sheet (at least for spiv2), I'm realizing this behavior is not exactly encouraged by ST and may not be possible to do generically. ST seems to expect that a slave device using DMA will always have TX data written into the DMA buffer any time a clock comes in from the master. From the reference manual:
It is recommended to enable the SPI slave before the master sends the clock. If not, undesired data transmission might occur. The data register of the slave must already contain data to be sent before starting communication with the master (either on the first edge of the communication clock, or before the end of the ongoing communication if the clock signal is continuous).
On top of this, as fbeutel mentioned earlier in the thread, you won't even receive an interrupt from the DMA subsystem when your data has been written, but rather only when the buffer is half-way empty.
So the behavior I was hoping for doesn't seem to be achievable. Of course you can build this kind of API if your design/protocol has other synchronization primitives (e.g software controlled NSS, a call and response fixed-length protocol, a busy or IRQ line, etc) but I don't think any of that is appropriate for a driver.
On a separate note though, I don't see a reason why we couldn't use the TxDma trait defined on line 1319 of embassy-stm32/src/spi/mod.rs instead of the ringbuffers to make the type signatures more similar. I will give that a try along with the other API changes and report back so we can all see if its preferable.
Hi folks,
I've gotten the patch into a state I'm happy with and I think it's ready to merge back into this PR. My fork of @tomcur's impl can be found here.
This very closely matches the format (and code) of the master driver. It also offers rxonly operation and the choice between hardware and software chip select. There are quite a lot of permutations, and I have a series of 12 test programs that go through and ensure all of them function as expected: https://github.com/Namr/embassy-testbench.
Unfortunately I have only been able to test on a STM32L476RG board, but I am hoping to be able to test on other versions of the spi peripheral soon. If others would be willing to run these tests on different boards that would be really helpful! I am fairly sure the tests that are labelled read_write will not work on spi_v{3,4,5} but I need a dev board to be sure/fix it.
Any feedback would be appreciated. I am hoping this is complete enough to get this PR out of a draft state.
Hi @Namr, thank you for pushing this. It's been a little while since I looked at this code (and at Embassy). Am I right your version performs DMA while the futures (such as SpiSlave<'_, Async>::read) are active, but not in the background? I believe that means with your implementation users can only read/write fixed buffer lengths before requiring a reset of DMA. It works for the master because it is in control of the clock, but it may not be desirable here. It is of course possible I'm missing something.
@tomcur Yes, the async futures in the async spi slave struct variant must be awaited in order for DMA transfer to occur, otherwise the peripheral is turned off. This matches the master driver & STM's HAL for a slave which both also turn off the peripheral after a read/write/transfer of fixed size completes.
Unfortunately it doesn't seem that embedded-hal has a SPI slave interface, so it seems this is the wild west as far as design decisions. Perhaps I should raise an issue and/or PR there before pushing this PR forward?
My issue with the buffer-based constantly DMAing interface is the difficulty of avoiding over/underruns and synchronization. For example, this application would be quite hard to implement. That being said I see the merit for applications where you want to stream out data (e.g a sensor station that sends out readings to a master). Maybe we can have a third struct variant for this interface?
There is a little bit of prior art here: https://github.com/embassy-rs/embassy/blob/4b0e20315bddbb15aaee0e0b96548de405694c7b/embassy-stm32/src/adc/ringbuffered_v2.rs#L106
I agree it would be nicer to have the slave API more closely match the master.
My issue with the buffer-based constantly DMAing interface is the difficulty of avoiding over/underruns and synchronization.
While I agree in principle, I think this is mainly a result of the slave not being in control of the clock. I'm not sure there's a great way around that. A more synchronized API may still be desirable, but a general API probably needs to be able work at high clock speeds, which probably requires constantly-on DMA.
Most SPI devices I have used enforce fixed length messages and synchronize on chip select falling edge in order to work around these issues (e.g SD card over SPI, LoRA SX1261).
The master clock speed shouldn't matter here, as long as you're using DMA both sides should be able to move the fixed length data at the highest speed the peripheral supports. Though I do understand what you mean, synchronization mechanisms can slow the overall transfer process down, even if the clock speed is high.
I will try to sketch out the always on DMA API based on the ADC you linked and then try to write a simple application with it. I'll also take a look at other official API's (e.g raspi pico, ESP32, etc) to see how they handle SPI slaves.
I am also curious if @Dirbaio or another maintainer has strong opinions on the shape of this API?
Most SPI devices I have used enforce fixed length messages and synchronize on chip select falling edge in order to work around these issues (e.g SD card over SPI, LoRA SX1261).
The master clock speed shouldn't matter here, as long as you're using DMA both sides should be able to move the fixed length data at the highest speed the peripheral supports. Though I do understand what you mean, synchronization mechanisms can slow the overall transfer process down, even if the clock speed is high.
I will try to sketch out the always on DMA API based on the ADC you linked and then try to write a simple application with it. I'll also take a look at other official API's (e.g raspi pico, ESP32, etc) to see how they handle SPI slaves.
I am also curious if @Dirbaio or another maintainer has strong opinions on the shape of this API?
Hi @Namr! I have a question about data transfer. In my case, I need to transfer data at a high frequency. However, while the master is sending data to the slave, I noticed a significant reduction in the received frequency on the slave side. For instance, the master transmits at 200Hz, but the slave only receives data at 100-150Hz.
Additionally, I found some issues in the read/write methods. When starting a read operation, the SPE flag is set to true until finish_dma. However, during this period, the SPE flag remains false. Could this be the cause of the data loss ?
@itswenb sorry for a late response. Could you share the code, or at least the SPI snippet you're using to send & receive that data? I'd like to know which mode and features you are using in the crate, and it would be easiest to share all SPI related code you are using. I did not see this behavior in the test code here: https://github.com/Namr/embassy-testbench.
I also did not mean to abandon this PR for over a month. I was talking to @kalkyl on Matrix about the API for the SPI interface. As it turns out, there is already a SPI slave implementation in the embassy-nrf crate. That API is close to mine, but has the difference that the equivalent of the SPE register always remains true. @tomcur would matching this interface be acceptable since its already merged into the main branch?
but has the difference that the equivalent of the SPE register always remains true.
It's been a little while since I looked at stm32, but that would mean RX and/or TX DMA always being on?
@tomcur would matching this interface be acceptable since its already merged into the main branch?
Yes, I think so. My main concern with your current branch of this PR is the inability to receive (and send) unless actively calling the SPI methods.
that would mean RX and/or TX DMA always being on?
Yes, that will be the case. Great! I'll start working on moving the interface to match the nrf one & update/rerun the testbench. I'll send an update on this PR when that is complete.
@itswenb sorry for a late response. Could you share the code, or at least the SPI snippet you're using to send & receive that data? I'd like to know which mode and features you are using in the crate, and it would be easiest to share all SPI related code you are using. I did not see this behavior in the test code here: https://github.com/Namr/embassy-testbench.
I also did not mean to abandon this PR for over a month. I was talking to @kalkyl on Matrix about the API for the SPI interface. As it turns out, there is already a SPI slave implementation in the
embassy-nrfcrate. That API is close to mine, but has the difference that the equivalent of theSPEregister always remainstrue. @tomcur would matching this interface be acceptable since its already merged into the main branch?
@tomcur Sorry, my testing code has been deleted, but it was similar to the example code, with a pack length of 500 and a frequency of 200Hz. By the way, my testing chip was STM32F427RG.
In the end, I create a RingBufferRx based on your code and fixed the issue above. You can check out my implementation in this branch.