STM32CubeF4 icon indicating copy to clipboard operation
STM32CubeF4 copied to clipboard

I2C Slave Driver Race Condition

Open collinmay opened this issue 3 years ago • 4 comments

Describe the set-up

  • The board: STM32F4DISCOVERY with I2C1 connected to Digilent Analog Discovery 2 logic analyzer
  • STM32CubeIDE 1.6.1 build 9958_20210326_1446 (UTC)

Describe the bug If HAL_I2C_SlaveTxCpltCallback arranges for HAL_I2C_Slave_Transmit_IT to be called in quick succession, the driver can get confused by a TXE flag from the hardware and prematurely consume the first byte of the second transfer before the device is ever addressed for a second transfer, causing the subsequent transfer to never complete.

How To Reproduce

  1. The application collects data from sensors and presents it over the I2C bus.
  2. After collecting the first set of data, it calls HAL_I2C_Slave_Transmit_IT (size 3 bytes) and waits for the master to receive the data.
  3. Use a logic analyzer or other bus master to receive the data.
  4. When HAL_I2C_SlaveTxCpltCallback is called, a flag is set.
  5. When the main loop sees that the flag is set, it calls HAL_I2C_Slave_Transmit_IT again (size still 3 bytes) to make the next set of data available for the master to receive.
  6. Use logic analyzer to read 3 bytes.
  7. Third byte is never transmitted.

Additional context Size doesn't matter. I originally observed the bug on 28 bytes, reproduced it on 29, and later reduced it to 3 so the entire transaction would fit better on my logic analyzer display.

This bug seems to be quite timing sensitive and doesn't seem to reproduce in the presence of breakpoints or printf statements, so it's been hard for me to investigate. I added a quick buffer to store various events I thought interesting to trace that I could then read out using the debugger after the bug occurred. This is the sequence of events I observed:

  • HAL_I2C_Slave_Transmit_IT is called with Size = 3
  • Logic analyzer is used as I2C bus master to begin reading out 3 bytes
  • I2C_Slave_ADDR is called
  • I2C_SlaveTransmit_TXE is called
  • I2C_SlaveTransmit_TXE is called
  • I2C_SlaveTransmit_TXE is called
  • I2C_Slave_AF is called which calls into HAL_I2C_SlaveTxCpltCallback. The bug does not reproduce if I set a breakpoint here.
  • In my implementation, the callback actually sets a flag for HAL_I2C_Slave_Transmit_IT to be called later, outside of an ISR. Not sure whether this is relevant or not or whether the bug would reproduce with HAL_I2C_Slave_Transmit_IT called directly from the callback. I can try to test this later.
  • HAL_I2C_Slave_Transmit_IT is called with Size = 3
  • I2C_SlaveTransmit_TXE is called

(it is important to note that I only traced I2C_SlaveTransmit_TXE calls when XferCount != 0, otherwise my trace buffer would fill up)

Note that I did not use the logic analyzer to begin another transaction; the call to I2C_SlaveTransmit_TXE occurs after the second call to HAL_I2C_Slave_Transmit_IT, but before the device is actually addressed. This causes XferCount to be decremented to 2, so when I try to begin another transaction with the logic analyzer, it only receives 2 bytes and is left waiting forever for the third, while the STM enters an interrupt loop with the BTF flag set.

I'm not sure why I2C_SlaveTransmit_TXE is being called late, but I suspect that's what's causing the bug.

collinmay avatar Jun 01 '21 10:06 collinmay

I've created a smaller test project to reproduce the error. It also seems to be important that SYSCLK is increased from the default. This project has the clock configured for 168 MHz.

  1. Extract stm32cubef4_issue76.zip.
  2. Using STM32CubeIDE, go to File > Import > General > Existing Projects into Workspace and import the reproducer.
  3. Attach a logic analyzer or other I2C bus master to PB6 (SDA) and PB7 (SCL).
  4. Launch the reproducer on the STM32F4DISCOVERY board.
  5. Using the logic analyzer or other I2C bus master, read 5 bytes from address 0xf.
  6. Using the logic analyzer or other I2C bus master, attempt to read 5 bytes from address 0xf.

Expected behavior: String "hello" is sent over the bus in step 5 as well as step 6.

Actual behavior: String "hello" ({h68, h65, h6c, h6c, h6f}) is sent over the bus in step 5, image string "ello" ({h65, h6c, h6c, h6f}) is sent in step 6 and bus master is left waiting for 5th byte. image

collinmay avatar Jun 01 '21 22:06 collinmay

We also had troubles with I2C slave on F4 when using HAL_I2C_Slave_Transmit_IT. Sometimes the slave would not finish the transaction and kill the bus forever because it pulls SCL low. I suspect this could be the same issue. As a workaround we are now using a software watchdog which will force reset the i2c peripheral if it detects a lockup, a proper fix would be very much welcomed.

kjsdf7wr3 avatar Aug 04 '21 13:08 kjsdf7wr3

ST Internal Reference: 120630

ASELSTM avatar Jan 11 '22 08:01 ASELSTM

Has there been any update on this bug? I'm experiencing it at the moment and it's making HAL I2C IT impossible to use.

etheory avatar Apr 06 '22 15:04 etheory

hello! I'm having this issue as well with a board I've been developing for >1 year.

It shows that this is 'to release' (as of 8 months ago) and not 'done' on the dashboard-- did this fix make it into one of the newer STM32Cube packages for the F series? Thanks!

davidbramsay avatar Mar 07 '23 16:03 davidbramsay

hello! I'm having this issue as well with a board I've been developing for >1 year.

It shows that this is 'to release' (as of 8 months ago) and not 'done' on the dashboard-- did this fix make it into one of the newer STM32Cube packages for the F series? Thanks!

I actually found out that there is no bug, it's just that the HAL is terribly documented. Here are the things I found:

Start with the slave calling: HAL_I2C_EnableListen_IT

in HAL_I2C_ListenCpltCallback call HAL_I2C_EnableListen_IT so it keeps listening

When a master sends something, the slave will get: HAL_I2C_AddrCallback called.

In there, do: HAL_I2C_Slave_Seq_Receive_IT or HAL_I2C_Slave_Seq_Transmit_IT depending on the setting of TransferDirection, which when finished will call: HAL_I2C_SlaveRxCpltCallback Where you can either call: HAL_I2C_Slave_Seq_Receive_IT to continue receiving from the master, or: __HAL_I2C_GENERATE_NACK or just ignore it entirely.

So the main issue is just not using the correct API to do things.

Until I switched to the above calls, nothing was ever working properly. Now it's as solid as a rock. It's just that the HAL docs are absolutely terrible at explaining how they want you to do things.

etheory avatar Mar 08 '23 11:03 etheory

@etheory Thank you so so much, this comment (and this thread) absolutely saved me hours and hours of debugging!

I was having similar, difficult to track down issues with HAL_I2C_Slave_Transmit_IT killing the bus randomly, now it's looking solid. I've been pulling my hair out for several days on this one. Huge thanks!!

davidbramsay avatar Mar 09 '23 08:03 davidbramsay