imxrt-hal Implementations of blocking embedded_hal traits should never return "Busy"

Implementations of blocking embedded_hal traits should never return Busy. That's a pretty bold claim to make, considering documentation for these traits is virtually non-existent in embedded_hal 0.2, so let me elaborate: Consumer crates of these traits generally assume that their associated functions can be called back-to-back. I.e. either

Functions block until an operation has finished
If a function can't execute, because an operation is in progress, the function blocks until it can execute

Since the error type is opaque to consumer crates, they can't tell if an error is critical or just needs a retry.

The 1.0 release of the embedded_hal crate is going to add documentation to that extent to the spi module. For the I2C case this is still pending.

May 19 '23 23:05 Florob

It's a correct claim. Signaling a driver-specific busy error through the blocking interface is an oversight. I see this affects LPSPI (example) and LPI2C (example); let me know if you've seen others.

I prefer approach 2 when possible. It should be straightforward to change today's "if busy, return error" behaviors to "while busy, wait for not busy."

May 21 '23 22:05 mciantyre

It's a correct claim. Signaling a driver-specific busy error through the blocking interface is an oversight. I see this affects LPSPI (example) and LPI2C (example); let me know if you've seen others.

These are the only ones I've seen.

I prefer approach 2 when possible. It should be straightforward to change today's "if busy, return error" behaviors to "while busy, wait for not busy."

That is what I currently have locally to get LPSPI working. I was wondering if one could do even better, since the code already waits for available FIFO space. That way new operations could be enqueued even if the peripheral is busy. However, I don't really understand how different functions interact sufficiently to tell whether that's possible. E.g. I suppose there must be some reason that operations currently start with clearing the FIFOs.

May 21 '23 23:05 Florob

Clearing the FIFO, particularly the RX FIFO, is important for error recovery. Suppose we're using the LPSPI to transfer (send and receive) data. While we're filling up the TX FIFO in the exchange loop, we observe a transmit error[^1]. Although we've exchanged some data, we return early with an error, leaving received data in the RX FIFO. If we didn't clear the RX FIFO, the next exchange call would return data from the previous, abandoned exchange call.

The implementation clears the TX FIFO just in case the user had been manually issuing commands / data with enqueue_transaction / enqueue_data. (The "if busy, return error" behavior should mean this is moot.) Considerations for the RX FIFO come into play here, too; if the user hasn't called read_data to empty the RX FIFO after those enqueue_* calls, that data remains in the FIFO when they use a higher-level transfer call.

There was an attempt to make the lower-level I/O interface (like enqueue_transaction, enqueue_data, read_data) play well with the embedded-hal interface, so that they could co-exist on the same type. We're free to change that, either by design or documentation. We might need that to seamlessly realize approach 2.

If we only support the embedded-hal interface, we should be able to wait for space in the TX FIFO without checking for busy. That's the happy path. If there's an error, maybe clearing the FIFOs before returning the error would be sufficient? I'll think about this some more...

[^1]: LPSPI transmit errors occur when the TX FIFO underruns. Let's pretend an underrun occurs because an urgent, CPU-bound interrupt takes over just before we put more data in the FIFO.

May 22 '23 20:05 mciantyre

embedded-hal 1.0 is going to require variant 1 for I2C, but either variant for SPI. Relevant text quoted below.

I2C:

# Flushing

Implementations must flush the transfer, ensuring the bus has returned to an idle state before returning.
No pipelining is allowed. Users must be able to shut down the I2C peripheral immediately after a transfer
returns, without any risk of e.g. cutting short a stop condition.

(Implementations must wait until the last ACK bit to report it as an error anyway. Therefore pipelining would only
yield very small time savings, not worth the complexity)

SPI:

To improve performance, [`SpiBus`] implementations are allowed to return before the operation is finished, i.e. when the bus is still not
idle. This allows pipelining SPI transfers with CPU work.

When calling another method when a previous operation is still in progress, implementations can either wait for the previous operation
to finish, or enqueue the new one, but they must not return a "busy" error. Users must be able to do multiple method calls in a row
and have them executed "as if" they were done sequentially, without having to check for "busy" errors.

Jan 03 '24 18:01 Florob

The implementation clears the TX FIFO just in case the user had been manually issuing commands / data with enqueue_transaction / enqueue_data. (The "if busy, return error" behavior should mean this is moot.) Considerations for the RX FIFO come into play here, too; if the user hasn't called read_data to empty the RX FIFO after those enqueue_* calls, that data remains in the FIFO when they use a higher-level transfer call.

FWIW this behaviour in a type that implements the embedded_hal blocking traits is quite unexpected and just cost me quite a bit of time in debugging. I do greatly appreciate the more granular/powerful interface above the embedded_hal basics that imxrt-hal provides, but IMO the basic trait implementations shouldn't have surprises like this.

Apr 25 '24 22:04 djmdjm

I'm sorry for the trouble. Could you share some pseudocode or a call sequence that demonstrates what you were debugging? I'd like to learn more about what went wrong.

I predominately use those granular interfaces in my projects. Through this thread and others, I accept that I'm not giving the embedded-hal traits the attention that folks need.

Apr 26 '24 15:04 mciantyre

I was just doing a bunch of back-to-back spi.write(), something like:

loop {
  let mut d: [u8; 4] = [  0x12, 0x34, 0x56, 0x78 ];
  spi.write(&mut d).ok();
  let mut d: [u8; 4] = [  0x12, 0xde, 0xad, 0x00 ];
  spi.write(&mut d).ok();
}

only the 0x12 word was making it to the bus because the clear_fifos() call in write_no_read() would cancel the in-progress transfer after only a single word was sent. It was extra confusing for me because both attempted transfers had the same preamble...

Apr 27 '24 05:04 djmdjm

Thanks for sharing. I see that, no matter the busy indication, we don't need to clear the FIFOs.

#157 is my attempt at fixing the busy error for the LPSPI driver. The hardware example includes a test for overlapped writes. If folks have time to test that change, it should be backwards compatible. And if anyone wants to contribute their simpler patches to address this issue, I'm happy to prefer those.

Apr 30 '24 18:04 mciantyre

#157 and #165 are both in the 0.5.6 release. I have some user feedback that suggests #157's blocking behavior is helpful for synchronizing external components. #165 follows the embedded-hal 1.0 guidance to block at the end of transactions.

Jul 05 '24 13:07 mciantyre

imxrt-hal imxrt-hal copied to clipboard

Implementations of blocking embedded_hal traits should never return "Busy"

imxrt-hal
imxrt-hal copied to clipboard