RIOT icon indicating copy to clipboard operation
RIOT copied to clipboard

periph_common/init.c: unblock i2c bus

Open DanielLockau-MLPA opened this issue 1 year ago • 19 comments

EDITED on 2024-08-13 (changed description) EDITED on 2024-08-27 (patch file upload for testing)

Contribution description

If a transaction is interrupted an external i2c device can remain in an unwanted state where it drives the bus to low and will not release it without action. This can happen whenever a MCU has a software reset but no power cycle is done on any part of the board.

This PR suggests a bus recovery for the i2c peripheral of sam0_common, during i2c_init or `_i2c_start.

The basic mechanism for unblocking the bus is to "clock" it free until the state machine of the corresponding device goes to idle and releases the bus. According to this reference (Section "2-wire software reset"), a maximum of 9 clock cycles should be able to do this. The additional start and stop conditions might not be required as other sources like this one do not mention it. As they interfere with the error condition where a device has control over SDA by pulling it low, this PR chooses only to use SDA as an input and intends to make a blocking i2c device release the bus.

Testing procedure

Reproduction was possible using a SAME54-XPRO, using tests/drivers/at24cxxx with the following changes applied: i2c_unblock_test.zip (patch file in archive due to upload restrictions)

With the test patch applied, the program will reset during a write transaction on the EEPROM. For testing the recovery during initialization has been removed by the patch and console output activated to initially report on the bus state before initialization. Recovery will still be executed after error detection in the first call of _i2c_start. For this reason the first test in the output below will fail for the first test. Activating recovery during initialization will make all tests pass at all times. Enabling more debug output will make it improbable to actually reproduce the error. This test has been tailored to one board and the current software changes.

The output after a POR can be found below. In the first iteration the bus is not blocked, in following iterations it is blocked after reset. Executing the tests requires a console input of s<Enter> to avoid an endless loop.

2024-08-13 16:41:52,171 # i2c.c: i2c bus #0 with configuration at 0x00004b74 is ok
2024-08-13 16:41:52,183 # i2c.c: i2c bus #1 with configuration at 0x00004b88 is ok
2024-08-13 16:41:52,199 # main(): This is RIOT! (Version: 2024.10-devel-66-ga1c53d-dl/riot/20240715__periph_init__unblock_i2c)
s
2024-08-13 16:42:01,383 # STARTING
2024-08-13 16:42:01,386 # Starting tests for module at24cxxx
2024-08-13 16:42:01,388 # EEPROM size: 256 byte
2024-08-13 16:42:01,389 # Page size  : 16 byte
2024-08-13 16:42:01,391 # [SUCCESS] at24cxxx_init
2024-08-13 16:42:01,396 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:01,398 # [SUCCESS] at24cxxx_write_byte
2024-08-13 16:42:01,402 # [at24cxxx] i2c_read_regs(): 0; polls: 6
2024-08-13 16:42:01,405 # [SUCCESS] at24cxxx_read_byte
2024-08-13 16:42:01,407 # [SUCCESS] write_byte/read_byte
2024-08-13 16:42:01,411 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:01,416 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:01,418 # [SUCCESS] at24cxxx_write
2024-08-13 16:42:01,424 # [at24cxxx] i2c_read_regs(): 0; polls: 6
2024-08-13 16:42:01,425 # [SUCCESS] at24cxxx_read
2024-08-13 16:42:01,426 # [SUCCESS] write/read
2024-08-13 16:42:01,441 # i2c.c: i2c bus #0 with configuration at 0x00004b74 is ok
2024-08-13 16:42:01,455 # i2c.c: i2c bus hangup detected for bus #1 with configuration at 0x00004b88
2024-08-13 16:42:01,471 # main(): This is RIOT! (Version: 2024.10-devel-66-ga1c53d-dl/riot/20240715__periph_init__unblock_i2c)
s
2024-08-13 16:42:03,639 # STARTING
2024-08-13 16:42:03,642 # Starting tests for module at24cxxx
2024-08-13 16:42:03,644 # EEPROM size: 256 byte
2024-08-13 16:42:03,645 # Page size  : 16 byte
2024-08-13 16:42:03,648 # [SUCCESS] at24cxxx_init
2024-08-13 16:42:03,661 # [at24cxxx] i2c_write_regs(): -116; polls: 6
2024-08-13 16:42:03,664 # [FAILURE] at24cxxx_write_byte: -116
2024-08-13 16:42:03,668 # [at24cxxx] i2c_read_regs(): 0; polls: 6
2024-08-13 16:42:03,670 # [SUCCESS] at24cxxx_read_byte
2024-08-13 16:42:03,673 # [SUCCESS] write_byte/read_byte
2024-08-13 16:42:03,678 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:03,682 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:03,684 # [SUCCESS] at24cxxx_write
2024-08-13 16:42:03,688 # [at24cxxx] i2c_read_regs(): 0; polls: 6
2024-08-13 16:42:03,690 # [SUCCESS] at24cxxx_read
2024-08-13 16:42:03,692 # [SUCCESS] write/read
2024-08-13 16:42:03,705 # i2c.c: i2c bus #0 with configuration at 0x00004b74 is ok
2024-08-13 16:42:03,719 # i2c.c: i2c bus hangup detected for bus #1 with configuration at 0x00004b88
2024-08-13 16:42:03,735 # main(): This is RIOT! (Version: 2024.10-devel-66-ga1c53d-dl/riot/20240715__periph_init__unblock_i2c)
s
2024-08-13 16:42:20,735 # STARTING
2024-08-13 16:42:20,738 # Starting tests for module at24cxxx
2024-08-13 16:42:20,739 # EEPROM size: 256 byte
2024-08-13 16:42:20,741 # Page size  : 16 byte
2024-08-13 16:42:20,743 # [SUCCESS] at24cxxx_init
2024-08-13 16:42:20,757 # [at24cxxx] i2c_write_regs(): -116; polls: 6
2024-08-13 16:42:20,760 # [FAILURE] at24cxxx_write_byte: -116
2024-08-13 16:42:20,764 # [at24cxxx] i2c_read_regs(): 0; polls: 6
2024-08-13 16:42:20,766 # [SUCCESS] at24cxxx_read_byte
2024-08-13 16:42:20,769 # [SUCCESS] write_byte/read_byte
2024-08-13 16:42:20,774 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:20,777 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:20,780 # [SUCCESS] at24cxxx_write
2024-08-13 16:42:20,784 # [at24cxxx] i2c_read_regs(): 0; polls: 6
2024-08-13 16:42:20,786 # [SUCCESS] at24cxxx_read
2024-08-13 16:42:20,789 # [SUCCESS] write/read
2024-08-13 16:42:20,802 # i2c.c: i2c bus #0 with configuration at 0x00004b74 is ok
2024-08-13 16:42:20,816 # i2c.c: i2c bus hangup detected for bus #1 with configuration at 0x00004b88
2024-08-13 16:42:20,831 # main(): This is RIOT! (Version: 2024.10-devel-66-ga1c53d-dl/riot/20240715__periph_init__unblock_i2c)

Issues/PRs references

None.

DanielLockau-MLPA avatar Jul 15 '24 15:07 DanielLockau-MLPA

Both the already upstream ESP soft I2C bus as well as my PR for a soft I2C bus for based on GPIO LL have a recovery function to unblock the bus. They detect a locked bus at and recover it during normal operation, failing only the transfer that was affected by the issue.

IMO this is the better approach, as a bus can lock up during normal operation as well when the peripheral misses a clock cycle or the controller misses clock stretching.

IMO if a periph_i2c does not detect and recover from a locked bus, that is a bug in that driver and needs to be fixed there.

maribu avatar Jul 15 '24 16:07 maribu

Murdock results

:heavy_check_mark: PASSED

763dbccc78e670885e1b8c6c4f4da9e0d63d3a5c cpu/sam0_common/periph/i2c.c: attempt to unblock bus after error

Success Failures Total Runtime
10378 0 10379 09m:59s

Artifacts

riot-ci avatar Jul 15 '24 17:07 riot-ci

@benpicco Where should I properly document this added behavior?

DanielLockau-MLPA avatar Jul 16 '24 07:07 DanielLockau-MLPA

The whole idea can fit in a dedicated pseudomodule if we don't want to impact all MCUs.

dylad avatar Jul 16 '24 07:07 dylad

Other I2C peripherals can detect a locked bus or at least have a timeout, that allows to unblock a blocked bus when it happens.

@maribu Just to verify that we're talking about the same thing here: It is not the SAM I2C peripheral which is broken here. The state machine of the remote EEPROM is locked in a state where it pulls the bus low. But if there are some i2c peripherals for other MCUs which automatically recover from this behavior, I agree with your comment. Where would you put it then? board_init will only run after cpu_init, so periph_init is already done at that point in time.

We could add it as a pseudo-module and put it in the dependencies of selected CPUs and additionally make it available as a function if periph_i2c_reconfigure is used.

DanielLockau-MLPA avatar Jul 16 '24 07:07 DanielLockau-MLPA

The state machine of the remote EEPROM is locked in a state where it pulls the bus low

That can happen with every peripheral I2C device, if it missing a clock transition. That most likely happens when the MCU is resetting in the middle of a transfer, but also at any point in time.

As a result, the MCU controller must check for a locked bus and recover. That may look like an arbitration lost event at first, but if SDA stays low, the bus is locked.

So: Either the I2C peripheral needs to monitor SDA after an arbitration lost even, or the driver should. This cannot be done in common code.

maribu avatar Jul 16 '24 08:07 maribu

Looking at it, we never check the ERROR bit in INTFLAG - would that already tell us about the condition in _i2c_start()?

benpicco avatar Jul 16 '24 11:07 benpicco

I will rewrite this PR and include the unblocking into the driver in the sam0_common driver. As @benpicco indicates, the buserror interrupt/flag is present but currently not used.

DanielLockau-MLPA avatar Jul 16 '24 14:07 DanielLockau-MLPA

@maribu @dylad @benpicco

I brought the unlock mechanism to cpu/sam0_common/periph/i2c.c and attempt a recovery just before initializing the peripheral or on an error in _i2c_start.

Touching the i2c driver I noticed that part of the implementation was not in sync with the register description of the data sheet. In the current state of the peripheral, the implementation seemed to make sense after examining the registers on the device. I still chose to alter it in _i2c_start for more clarity. If you disagree with this, please drop me a line and I'll try to stay closer to the original implementation.

UPDATE: for testing, please note that I also altered the instructions in the description of this PR

DanielLockau-MLPA avatar Aug 13 '24 14:08 DanielLockau-MLPA

If this fixes the bug for you now, please squash!

benpicco avatar Aug 29 '24 12:08 benpicco

When back at home, I could check if I can integrate a test for a bus lock into the peripheral selftest using the Arduino shield. For periph_i2c implementations that provide periph_i2c_reconfigure I could just bit-bang some I2c to lock up the I2C GPIO extender, than return the pins to the I2C peripheral, and see if using the I2C GPIO extender will lock up or recover from the state.

maribu avatar Sep 23 '24 11:09 maribu

ping :)

benpicco avatar Oct 15 '24 12:10 benpicco

I'll try to squeeze in some time for the bus unblock test into the testing shield. The Adafruit Metro M5 Express would be a perfect target for the selftesting shield :)

maribu avatar Oct 15 '24 12:10 maribu

@benpicco I did squash.

@maribu Is your latest comment meant to block? I understood that you want to hook up a test scenario for independently verifying this PR.

DanielLockau-MLPA avatar Nov 07 '24 12:11 DanielLockau-MLPA

Exactly, that could be done just as well after merging this one.

maribu avatar Nov 07 '24 14:11 maribu

Huh did we forget about this one?

benpicco avatar May 23 '25 15:05 benpicco

Huh did we forget about this one?

I just fixed the documentation error blocking the pipeline and rebased. Hence the force push.

DanielLockau-MLPA avatar Jun 04 '25 06:06 DanielLockau-MLPA

This sadly needs a rebase to get in :/

maribu avatar Jun 04 '25 18:06 maribu

I was so free to do the rebase

maribu avatar Jun 05 '25 06:06 maribu