Intermittent i2c slave mode errors, interrupts fired in the wrong order

Open chrisckc opened this issue 3 years ago • 19 comments

Background:

I have been having trouble setting up reliable communications between 2 Pico boards using i2c.

I started testing with Earle Philhower's Arduino-Pico RP2040 core. I had previously abandoned ArduinoCore-MBED for another use case because of random delays when using interrupts on input pins.

I have been unable to achieve reliable i2c communications between the 2 Pico boards. The wires are short, and going from 4.7k pull-ups to 1k pull-ups made no difference. I am seeing missing data, mostly the last 1 or 2 bytes of the transmission; very occasionally a byte is also missing at the start of the data block. The error rate varies with various other factors: I have seen anything from 0.06% to 10%, most commonly around 1 to 2%.

I created a test harness project for this and replicated a minimal version of the circuit on a breadboard using 2 brand new Picos. I have tried many things and so far been unable to get a zero failure rate. My scope confirms that all the data is always being sent correctly; the errors occur when the receiving Pico reports fewer bytes than expected in the onReceive interrupt.

The errors occur randomly. I have tried different i2c clock speeds: 100 kHz, 400 kHz (default) and 1 MHz, and the speed affects the failure rate. Other factors, such as whether the serial port is transmitting data during i2c data reception, also make a difference. The gap between each transmission block also has an impact on the error rate: I have tried anything from zero delay to a 1000 µs delay, and the longer delay seems to reduce the error rate noticeably.

I substituted a Teensy4.0 in place of the receiving Pico using the exact same code and managed to achieve a zero failure rate, even at 1MHz.

Everything I have tried so far is described over on the Arduino-Pico repo: https://github.com/earlephilhower/arduino-pico/issues/979

I became convinced that this is an SDK or even a hardware issue, so I modified my test code on the receiving Pico to achieve the same using the i2c slave functions in the SDK.

Having now explained how I got to this point, here is the separate issue I am logging in relation to the SDK:

I am sending the data to the receiving Pico at 10 Hz. Each transmission consists of 2 blocks of data separated by a 1 ms delay: the first block is 84 bytes, the second is 48 bytes.

To implement the i2c slave mode using the SDK functions, I found a small library which was written against the SDK, to save time in setting it up: https://github.com/vmilea/pico_i2c_slave

After doing this, I am still getting errors. Since I am now using the SDK directly, I have effectively ruled out an issue specific to Arduino-Pico or the official ArduinoCore-MBED, both of which showed these errors (although ArduinoCore-MBED had other issues as well).

This is my understanding of how i2c slave mode works using the SDK: with the i2c functions in the SDK, the receive interrupt handler fires immediately after each byte is received on the wire, with the occasional small delay observed on my scope. The Wire API as implemented in Arduino-Pico aggregates these interrupts and fires a single callback (onReceive) at the end of the transmission, to match the intended Arduino Wire API behaviour.

The receive interrupt handler used with the SDK is passed an event variable that identifies what the interrupt is for. Inside the handler there needs to be a switch statement on this event to check whether data has been received, data has been requested, or the transmission has finished. To debug this I set up 2 debug pins, one for the "data received" event and one for the "end of transmission" event, so I can capture when each data byte arrives and when the transmission has finished. Another debug pin signals the error condition (missing data).
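The event handling described above can be sketched in plain C without the SDK. This is a minimal sketch, not the code from the test harness: `slave_event_t`, `rx_state_t` and `on_slave_event` are hypothetical stand-ins, with the three event names loosely mirroring the receive, request and finish events of the pico_i2c_slave library, and the received byte is passed in as an argument instead of being read from the peripheral.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the library's event type, so the sketch
   compiles on a host machine without the Pico SDK. */
typedef enum {
    EV_RECEIVE, /* a data byte from the master has arrived      */
    EV_REQUEST, /* the master is requesting a byte from us      */
    EV_FINISH   /* the transfer has ended (Stop or Restart)     */
} slave_event_t;

#define RX_MAX 128

typedef struct {
    uint8_t  buf[RX_MAX];
    size_t   count;     /* bytes buffered in the current transfer        */
    size_t   last_len;  /* length reported at the most recent EV_FINISH  */
    unsigned transfers; /* number of EV_FINISH events seen so far        */
} rx_state_t;

/* Called once per event; `byte` is only meaningful for EV_RECEIVE. */
void on_slave_event(rx_state_t *s, slave_event_t ev, uint8_t byte) {
    assert(s != NULL);
    switch (ev) {
    case EV_RECEIVE:
        if (s->count < RX_MAX) {
            s->buf[s->count++] = byte; /* buffer the byte */
        }
        break;
    case EV_REQUEST:
        /* A real handler would write a reply byte to the bus here. */
        break;
    case EV_FINISH:
        s->last_len = s->count; /* report what arrived in this transfer */
        s->transfers++;
        s->count = 0;           /* get ready for the next transfer */
        break;
    }
}
```

In this model the length reported at `EV_FINISH` is only correct if every `EV_RECEIVE` for the transfer has already been delivered, which is the ordering assumption the rest of this report is about.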

The problem:

What I have found so far is that when the error condition occurs, there is an extra firing of the i2c receive interrupt immediately after the interrupt which signalled the end of the transmission. The extra interrupt firing is a "data received" event, indicating that the "end of transmission" event interrupt occurred before the interrupt signalling the reception of the last byte. This looks like an intermittent out-of-order interrupt firing.

After the error condition occurs, there is then a spurious "end of transmission" interrupt at the start of the reception of the next transmission.
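To make the effect of this ordering concrete, here is a minimal host-side sketch that replays the two event sequences through a handler that counts bytes and reports the count on each "end of transmission" event. It is a simplified model under stated assumptions, not a reproduction of the hardware: `slave_event_t`, `build_sequence` and `replay` are hypothetical names, and the faulty sequence is assumed to be exactly "finish, late byte, spurious finish" as described above.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EV_RECEIVE, EV_FINISH } slave_event_t;

/* Builds the event sequence for a `first`-byte block followed by a
   `second`-byte block. If late_last_byte is non-zero, the last byte of the
   first block is delivered after its FINISH, and a spurious FINISH follows
   before the second block's data (the failure pattern described above).
   `out` must hold at least first + second + 3 entries. */
size_t build_sequence(slave_event_t *out, size_t first, size_t second,
                      int late_last_byte) {
    assert(out != NULL && first > 0);
    size_t n = 0;
    size_t before_finish = late_last_byte ? first - 1 : first;
    for (size_t i = 0; i < before_finish; i++) out[n++] = EV_RECEIVE;
    out[n++] = EV_FINISH;
    if (late_last_byte) {
        out[n++] = EV_RECEIVE; /* last byte arrives after FINISH       */
        out[n++] = EV_FINISH;  /* spurious FINISH at next block start  */
    }
    for (size_t i = 0; i < second; i++) out[n++] = EV_RECEIVE;
    out[n++] = EV_FINISH;
    return n;
}

/* Replays the events and stores the byte count reported at each FINISH.
   Returns the number of FINISH events seen. */
size_t replay(const slave_event_t *ev, size_t n, size_t *lens, size_t max_lens) {
    size_t count = 0, reports = 0;
    for (size_t i = 0; i < n; i++) {
        if (ev[i] == EV_RECEIVE) {
            count++;
        } else {
            if (reports < max_lens) lens[reports] = count;
            reports++;
            count = 0;
        }
    }
    return reports;
}
```

With `first = 84` and `second = 48`, the in-order sequence reports lengths 84 and 48. The late-byte sequence reports 83, 1 and 48 under this model: the first block appears one byte short, and the late byte surfaces as a separate 1-byte transfer.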

For successful data reception, the "end of transmission" interrupt occurs 3.4 µs after the interrupt for the last byte, just slightly longer than the i2c clock period of 2.7 µs when using a 400 kHz i2c clock (actually 365 kHz as measured).
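The timing figures can be cross-checked with a little arithmetic. The sketch below assumes 9 clock pulses per byte (8 data bits plus the acknowledge bit) and one address byte per block, and it ignores Start/Stop conditions and any clock stretching; the function names are hypothetical.

```c
#include <assert.h>

/* SCL clock period in microseconds for a clock frequency in Hz. */
double scl_period_us(double scl_hz) {
    assert(scl_hz > 0.0);
    return 1e6 / scl_hz;
}

/* Approximate time on the wire, in microseconds, for one block:
   (address byte + data bytes) * 9 clocks per byte. */
double transfer_time_us(double scl_hz, unsigned data_bytes) {
    return (data_bytes + 1u) * 9.0 * scl_period_us(scl_hz);
}
```

At the measured 365 kHz this gives a clock period of about 2.74 µs, roughly 2.1 ms for the 84-byte block and roughly 1.2 ms for the 48-byte block, so the 3.4 µs gap between the last "data received" interrupt and the "end of transmission" interrupt is only a little over one clock period.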

Scope traces for Successful data reception:

This is what a successful data reception looks like. The yellow trace is the error debug pin, the purple trace is the i2c clock, the blue trace is the interrupt firing for the event matching data reception, and the green trace is the interrupt firing for the event matching the end of the transmission:

ScreenImg-42

In the above scope image, the gap between each block of data is 1 ms.

Zoomed in further: ScreenImg-41

And further: here you can clearly see the green low pulse indicating the end of transmission occurring after the last blue pulse for the last data byte, which is correct and what should be expected: ScreenImg-40

The above looks sensible and allows data reception to be easily handled in user code.

Scope traces for failed data reception, out-of-order interrupt:

Here it can be seen that the green low pulse (the interrupt firing for the event matching the end of the transmission) occurs before the last blue low pulse (the interrupt firing for the last data reception event): ScreenImg-43

This is what happens for the next data transmission (after the 1 ms delay) following the out-of-order interrupt seen above. The cursors are measuring 9 clock pulses, which is the number required to transmit 1 byte of data, or in this case the address byte: ScreenImg-44

As can be seen above, at some point during the reception of the address byte, the interrupt event for the end of data transmission is firing (green low pulse).

Interestingly, at the end of the above transmission block, it then behaves normally: ScreenImg-45

Here is a zoomed-out view of 2 sets of transmission block pairs, separated by the 1 ms delay. It can be seen that when the yellow error marker occurs, there is an extra "end of transmission" interrupt firing at the start of the next transmission, after the 1 ms delay: ScreenImg-38

chrisckc, Nov 17 '22 13:11