Arduino: If I2C crashes, the Wire library cannot recover (does not try to reconnect)
If we stay with the Arduino, we will likely need to replace the Wire library or update the code so that it reconnects automatically.
Are you familiar with I2C, Wire, and writing Arduino firmware? From your description, I think you may have misconceptions about what I2C and Wire do and how they work.
Wire is the standard I2C library for Arduino, Photon, and maybe other similar platforms. It provides low-level primitives for reading and writing on the I2C bus. Wire function calls return error codes about things like whether there was a timeout (according to the I2C bus timing specs) or how many bytes were read from a slave device. Wire isn't supposed to reconnect to anything or do error recovery at the protocol layer--that's not part of its job.
I2C is a serial bus specification that uses open-drain signaling. The bus doesn't have state, so it can't crash. But it can have electrical problems like too much capacitance in the wiring, inadequately sized pull-up resistors, a device connected to the bus that isn't using I2C signaling (e.g. an Atlas sensor in TTL serial mode), or a mismatch between the bus master's clock speed and the min/max clock speeds supported by the cabling and the slave devices.
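To put rough numbers on how capacitance and pull-up sizing interact, here is a small calculator using the sizing formulas from NXP's I2C-bus specification (UM10204): the minimum resistor is set by how much current an output can sink while still producing a valid low level, and the maximum by how quickly the line must rise through the bus capacitance. `PullUpRange` and `pullUpRange()` are hypothetical names, and the defaults (0.4 V at 3 mA; rise-time limits of 1000 ns for standard mode and 300 ns for fast mode) are the spec values as I understand them, so double-check them against the spec and your parts' datasheets.

```cpp
#include <cassert>

// Allowed pull-up resistor range for one I2C line, in ohms (hypothetical helper).
struct PullUpRange {
  double minOhms;  // below this, outputs can't pull the line low enough
  double maxOhms;  // above this, the line rises too slowly for the bus mode
  bool feasible() const { return minOhms <= maxOhms; }
};

// vdd:            pull-up supply voltage (V)
// busCapacitance: total wiring + pin capacitance on the line (F)
// riseTimeMax:    max rise time for the bus mode (s), e.g. 1000e-9 standard, 300e-9 fast
// volMax, iol:    max low-level output voltage (V) at sink current iol (A)
PullUpRange pullUpRange(double vdd, double busCapacitance, double riseTimeMax,
                        double volMax = 0.4, double iol = 3e-3) {
  assert(vdd > volMax && busCapacitance > 0 && riseTimeMax > 0 && iol > 0);
  PullUpRange r;
  r.minOhms = (vdd - volMax) / iol;
  // 0.8473 = ln(0.7 / 0.3): an RC rise from 30% to 70% of VDD takes 0.8473 * R * C
  r.maxOhms = riseTimeMax / (0.8473 * busCapacitance);
  return r;
}
```

For example, at 5 V with 200 pF of bus capacitance in standard mode this gives roughly 1.5 kΩ to 5.9 kΩ. At 400 pF in fast mode the window closes entirely (the maximum drops to about 885 Ω), which is one way long cables make a bus unreliable.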
For talking to I2C slave devices with an Arduino, protocol state and error recovery are the responsibility of device specific drivers that--if you're lucky--follow the manufacturer's advice in the sensor datasheet for things like timing and how to configure registers. Typically, the sensor manufacturers provide datasheets, and, in the context of Arduino, the drivers usually come from third parties who sell the sensors on a breakout board for hobbyists. The libraries written for hobbyists tend to have minimal error handling, and they don't always follow the datasheets. The main reason I've been writing my own sensor drivers is that I want robust error handling.
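Here is a minimal sketch of what driver-level error handling can look like: the driver checks both the write status and the number of bytes actually received, retries a few times, and reports which failure happened. `I2cBus`, `DriverError`, and `readRegister()` are hypothetical names. On an Arduino the two callbacks would wrap `Wire.beginTransmission()`/`Wire.write()`/`Wire.endTransmission()` and `Wire.requestFrom()`/`Wire.read()`. The write-register-then-read pattern is common but not universal, so check each device's datasheet; the retry count of 3 is arbitrary.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for Wire so driver logic can run off-target.
// write(): endTransmission()-style status, 0 = success.
// read():  number of bytes actually received, like requestFrom().
struct I2cBus {
  std::function<uint8_t(uint8_t addr, const uint8_t* data, size_t len)> write;
  std::function<size_t(uint8_t addr, uint8_t* out, size_t len)> read;
};

enum class DriverError { Ok, Nack, ShortRead };

// Read `len` bytes starting at register `reg`, retrying up to `attempts`
// times and reporting why the last attempt failed.
DriverError readRegister(I2cBus& bus, uint8_t addr, uint8_t reg,
                         uint8_t* out, size_t len, int attempts = 3) {
  DriverError last = DriverError::Nack;
  for (int i = 0; i < attempts; ++i) {
    if (bus.write(addr, &reg, 1) != 0) {    // register pointer write failed
      last = DriverError::Nack;
      continue;
    }
    if (bus.read(addr, out, len) != len) {  // fewer bytes than requested
      last = DriverError::ShortRead;
      continue;
    }
    return DriverError::Ok;
  }
  return last;
}
```

Because the bus is passed in, the same driver logic can be exercised on a desktop with fake callbacks that NACK or return short replies.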
Sensor-specific libraries, and software from that point of the stack on up, are where you should start looking for error recovery problems. You might well have electrical problems on the I2C physical layer too. But I doubt anything is wrong with your copy of the Wire library.
@jakerye 👆
The reason the entire Arduino crashes when an I2C device is faulty seems to be that the library hangs when an I2C device returns a reply of the wrong length.
One approach would be to add an error case to the Wire library for when the message received in endTransmission() is too short, and to report an error after a fixed timeout. Wire already handles error cases like NACKs, so it doesn't seem unreasonable to handle this case in Wire as well.
Reconnecting isn't particularly necessary but it would be good to handle this error case.
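As a sketch of the proposed behaviour (report an error after a timeout instead of waiting forever), the polling loop below gives up at a deadline and tells the caller which outcome occurred. `waitForBytes()` and its `available` callback are hypothetical, and the code uses `std::chrono` so the logic can run on a desktop; on an Arduino you would compare `micros()` values instead. Also worth checking: recent releases of the Arduino AVR core added an opt-in timeout to Wire itself (`Wire.setWireTimeout()`), so whether something like this is still needed depends on the core version in use.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

enum class WaitResult { Ready, TimedOut };

// Poll `available` until it reports at least `expected` bytes, or give up
// once `timeout` has elapsed. Never blocks longer than the timeout (plus one poll).
WaitResult waitForBytes(const std::function<size_t()>& available, size_t expected,
                        std::chrono::microseconds timeout) {
  using Clock = std::chrono::steady_clock;
  const auto deadline = Clock::now() + timeout;
  while (available() < expected) {
    if (Clock::now() >= deadline) {
      return WaitResult::TimedOut;  // caller decides how to recover
    }
  }
  return WaitResult::Ready;
}
```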
@Spaghet Interesting... perhaps I'm wrong about Wire. How are you diagnosing that the library hangs or that an I2C device gives the wrong length of reply? Do you have error messages?
This was knowledge bestowed upon me by Jake, but it makes sense to me that a sensor talking over UART on the I2C line would produce replies of the wrong length. I haven't dug much further, but if this is a general I2C issue, I can see it being handled in Wire rather than duplicating the effort in each driver.
Hmm... does this description match your understanding of how I2C communication with Wire works?
For writing to I2C devices, the normal pattern is:

```cpp
Wire.beginTransmission(address);
Wire.write(/* byte, string, or buffer with length */);
// optionally more calls to write()
int result = Wire.endTransmission(/* true sends a stop condition, false keeps the bus for a repeated start */);
```
This only sends bytes. It doesn't receive them. The result from endTransmission() is 0 for success or >0 for timeouts or other errors.
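The Arduino Wire reference documents what the nonzero `endTransmission()` values mean, and mapping them to names makes logs and error handling easier to read. `WireStatus` and `classifyEndTransmission()` are hypothetical helpers; code 5 (timeout) only appears on cores whose Wire supports timeouts, while older cores return 0 to 4.

```cpp
#include <cstdint>

// Meanings of Wire.endTransmission() return values, per the Arduino Wire reference.
enum class WireStatus {
  Success, DataTooLong, AddressNack, DataNack, OtherError, Timeout, Unknown
};

WireStatus classifyEndTransmission(uint8_t code) {
  switch (code) {
    case 0: return WireStatus::Success;
    case 1: return WireStatus::DataTooLong;  // didn't fit in the transmit buffer
    case 2: return WireStatus::AddressNack;  // nothing acknowledged the address
    case 3: return WireStatus::DataNack;     // device NACKed a data byte
    case 4: return WireStatus::OtherError;
    case 5: return WireStatus::Timeout;      // only on cores with Wire timeouts
    default: return WireStatus::Unknown;
  }
}
```

In practice, `AddressNack` is what you would typically see when a device is unplugged or unpowered, since nothing acknowledges its address.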
To read bytes from an I2C slave device, the normal pattern is:

```cpp
int bytesRead = Wire.requestFrom(address, byteCount, sendStop);
if (bytesRead == byteCount) {
  // It worked, but we still need to get the bytes from Wire
  while (Wire.available() > 0) {
    byte b = Wire.read();
    // do something with this byte
  }
} else {
  // Short read: the device sent fewer bytes than requested (or none)
}
```
Wire provides a very low-level API--it only knows how to read and write bytes on the I2C bus. I2C devices each have their own protocol built on reading and writing bytes.
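To make the "own protocol" point concrete: Wire hands back raw bytes, and only the driver knows what they mean or whether they are valid. The sketch below assumes a hypothetical device that replies with a 16-bit big-endian value followed by a CRC-8 byte, modelled on the framing Sensirion uses for its SHT3x sensors (polynomial 0x31, initial value 0xFF). `crc8()` and `parseWord()` are illustrative names. A reply of the wrong length or with a bad checksum is rejected, a check only the driver can make because Wire doesn't know the framing.

```cpp
#include <cstddef>
#include <cstdint>

// CRC-8, polynomial 0x31 (x^8 + x^5 + x^4 + 1), initial value 0xFF, MSB first.
uint8_t crc8(const uint8_t* data, size_t len) {
  uint8_t crc = 0xFF;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc & 0x80) ? static_cast<uint8_t>((crc << 1) ^ 0x31)
                         : static_cast<uint8_t>(crc << 1);
    }
  }
  return crc;
}

// Hypothetical reply format: [MSB, LSB, CRC]. Returns false on a wrong
// length or a checksum mismatch; `value` is only written on success.
bool parseWord(const uint8_t* reply, size_t len, uint16_t& value) {
  if (len != 3) return false;
  if (crc8(reply, 2) != reply[2]) return false;
  value = static_cast<uint16_t>((reply[0] << 8) | reply[1]);
  return true;
}
```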
[edit: As an analogy, Wire is like the kernel's TCP/IP stack and a sensor driver library is like a web server or a mail client. You wouldn't expect the kernel to be responsible for HTTP status codes because those are part of an application protocol at a higher level of the network stack. Metaphorically speaking, sensors all use different application protocols on top of the I2C transport protocol. There are exceptions--like the Bosch Sensortec BME280 and BMP280 pressure sensors--that work similarly, but mostly it's like SMTP vs. HTTP vs. Telnet, etc.]
@wsnook Thanks for sharing your thoughts. I haven't worked on this problem myself, just documented it from what I've heard from @gordonbrander and @jakerye. From your description, it sounds like this could be updated to "Add error handling to each sensor's/actuator's firmware." That way, whatever the cause, it could be properly diagnosed and addressed.
First of all, thanks for all the input so far. It has given me some insight into this problem. I started working with Wire and I2C today and stubbed my toe on the same issue described above:
If a master is executing requestFrom() on a slave and the slave suddenly drops out during the request (due to power loss or some other failure), the master simply freezes. This is very undesirable behaviour, because you cannot recover from the problem without restarting your whole setup.
Therefore, I think Wire should return an error, along the lines my predecessors already proposed, whenever a request cannot be completed in an orderly way. That way the developer doing the implementation can handle the error and keep the system working "normally" instead of freezing.
Basically: no API implementation should ever freeze or leave your system unstable when unexpected conditions occur. At minimum it should handle the condition well enough that the problem can be dealt with in a stable and orderly manner (basic error reporting). Otherwise, I'm sorry to say, the API implementation is simply unusable.
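On the application side, once a failed transaction returns an error instead of freezing, the firmware still needs a policy for what to do next. One option is sketched below: after a few consecutive failures the device is marked offline and only retried occasionally, so the rest of the loop keeps running. `DeviceHealth` and its thresholds are hypothetical. The millisecond timestamps are meant to come from something like Arduino's `millis()`, and the unsigned subtraction keeps the retry check correct when that counter wraps around.

```cpp
#include <cstdint>

// Hypothetical per-device bookkeeping: one failing sensor degrades
// gracefully instead of stopping the main loop.
class DeviceHealth {
 public:
  DeviceHealth(uint8_t maxFailures, uint32_t retryIntervalMs)
      : maxFailures_(maxFailures), retryIntervalMs_(retryIntervalMs) {}

  // Online devices are polled every pass; offline ones only after the interval.
  bool shouldPoll(uint32_t nowMs) const {
    return online() || (nowMs - lastAttemptMs_) >= retryIntervalMs_;
  }

  // Record the outcome of a transaction attempted at nowMs.
  void record(bool ok, uint32_t nowMs) {
    lastAttemptMs_ = nowMs;
    if (ok) {
      failures_ = 0;
    } else if (failures_ < maxFailures_) {
      ++failures_;
    }
  }

  bool online() const { return failures_ < maxFailures_; }

 private:
  uint8_t maxFailures_;
  uint32_t retryIntervalMs_;
  uint8_t failures_ = 0;
  uint32_t lastAttemptMs_ = 0;
};
```

Inside `loop()`, this would mean calling `health.shouldPoll(millis())` before touching the device and `health.record(ok, millis())` afterwards, where `ok` comes from whatever error checking the driver does.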