twai: `TWAI_ALERT_TX_FAILED` long before bus off mode is reached (IDFGH-13273)
Answers checklist.
- [X] I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
- [X] I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
- [X] I have searched the issue tracker for a similar issue and not found a similar issue.
IDF version.
v5.2.1
Espressif SoC revision.
ESP32-S3 (QFN56) (revision v0.2)
Operating System used.
Linux
How did you build your project?
Command line with idf.py
If you are using Windows, please specify command line type.
None
Development Kit.
ESP32-S3-DevKitC-1
Power Supply used.
USB
What is the expected behavior?
I'm using twai_transmit with the TX queue disabled. To my understanding, when twai_transmit returns ESP_OK, the TWAI controller should keep retransmitting until the frame is sent successfully or the controller enters the bus-off state (transmit error counter > 255). Once transmission is successful, the TWAI_ALERT_TX_SUCCESS alert is received.
What is the actual behavior?
In most cases, I get the TWAI_ALERT_TX_SUCCESS alert, indicating the CAN frame was transmitted.
In some cases, however, I get TWAI_ALERT_TX_FAILED long before the bus-off state is reached.
This doesn't make sense to me, as I expect the TWAI controller to automatically retry sending the message until bus-off is reached. Is there another reason why a message would be dropped by the TWAI controller?
Note: The single-shot flag on my CAN frame is set to false. Retransmissions do seem to work most of the time, because if I disconnect the cable and reconnect it, the pending frame is correctly sent.
Steps to reproduce.
- Enable `CONFIG_TWAI_ISR_IN_IRAM=y` (the issue doesn't seem to occur when this is set to false, but I need it or else I sometimes miss RX packets).
- Disable the TX queue --> `twai_transmit` returning `ESP_OK` means the transmission was successfully initiated
- `twai_transmit`
- `twai_read_alerts` to wait for `TWAI_ALERT_TX_FAILED` or `TWAI_ALERT_TX_SUCCESS`
- repeat previous 2 steps
It only happens under quite high bidirectional throughput. In my case: 250 CAN frames per second in each direction, with the bus speed set to 125 kbit/s.
My twai_message_t:
msg->flags = 0;
msg->identifier = 0x523;
msg->data_length_code = 8;
My driver config:
const twai_timing_config_t t_config = TWAI_TIMING_CONFIG_125KBITS();
const twai_filter_config_t f_config = TWAI_FILTER_CONFIG_ACCEPT_ALL();
twai_general_config_t g_config = TWAI_GENERAL_CONFIG_DEFAULT(
CONFIG_CAN_TX_GPIO_NUM, CONFIG_CAN_RX_GPIO_NUM, TWAI_MODE_NORMAL);
g_config.rx_queue_len = 100;
g_config.tx_queue_len = 0;
g_config.intr_flags = ESP_INTR_FLAG_LEVEL1 | ESP_INTR_FLAG_IRAM; // even without this IRAM flag, the issue occurs
I found out that this error also occurs when the IRAM interrupt flag above is not set. Just enabling the IRAM optimisations (without allowing the ISR to actually make full use of them) is enough to cause the issue. Weird.
Debug Logs.
No response
More Information.
No response
@robin96c Can you describe in more detail how much time passes between receiving TX_FAIL and BUS_OFF?
First of all, receiving BUS_OFF at all suggests there may be noise or connection issues in your hardware. The TWAI controller doesn't drop any TX message except on BUS_OFF; it really does keep retransmitting the same message. If it simply doesn't receive an ACK, the controller doesn't enter BUS_OFF, so at least one other kind of error must be present to lead to BUS_OFF.
Can you check and improve your connection? Here, simply transmitting in both directions between 2 nodes, I can't reproduce this issue.
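The fault-confinement behaviour referred to above can be sketched as follows. This is a simplified model of the CAN transmit error counter (TEC) rules from the CAN specification, not ESP-IDF driver code. The type and function names are hypothetical, the receive error counter is ignored, and the model says nothing about what triggers TWAI_ALERT_TX_FAILED in the driver.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of CAN fault confinement (TEC only). */
typedef enum { ERR_ACTIVE, ERR_PASSIVE, BUS_OFF } can_state_t;

typedef struct { unsigned tec; } can_node_t;

/* Error passive above 127, bus-off above 255. */
static can_state_t can_state(const can_node_t *n)
{
    if (n->tec > 255) return BUS_OFF;
    if (n->tec > 127) return ERR_PASSIVE;
    return ERR_ACTIVE;
}

/* A transmit error normally adds 8 to the TEC. Exception: an error-passive
 * transmitter that only misses the ACK (and sees no dominant bit while
 * sending its passive error flag) does not increment the TEC. */
static void on_tx_error(can_node_t *n, bool ack_error_only)
{
    if (can_state(n) == BUS_OFF) return;
    if (ack_error_only && can_state(n) == ERR_PASSIVE) return;
    n->tec += 8;
}

/* A successful transmission decrements the TEC (not below zero). */
static void on_tx_success(can_node_t *n)
{
    if (n->tec > 0) n->tec--;
}

/* TEC after a run of consecutive failed transmit attempts, starting at 0. */
static unsigned tec_after_failures(unsigned failures, bool ack_error_only)
{
    can_node_t n = { 0 };
    for (unsigned i = 0; i < failures; i++) {
        on_tx_error(&n, ack_error_only);
    }
    return n.tec;
}
```

In this model, a node that only ever misses ACKs stops at a TEC of 128 (error passive) no matter how many retries follow, while 32 consecutive errors of another kind take it from 0 to 256, which is bus-off.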
@wanckl
That's the thing, the controller doesn't reach BUS_OFF state before or after the TX failure. I get randomly dropped frames (alerted by TWAI_ALERT_TX_FAILED), but frames sent immediately after the failure get sent correctly (TWAI_ALERT_TX_SUCCESS).
I reproduced the error by editing the twai_alert_and_recovery example. I flashed it to two ESP32-S3 devices and let them send data in both directions (though you can also use just one ESP and a CAN2USB device with a Python script, for example). It can take some time before the error is triggered. In the attached logs it took 17 minutes.
The bus load during my tests was around 45%, with 2 nodes connected and data being sent in both directions.
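As a rough cross-check of that figure, the bus load can be estimated from the frame rate and the nominal frame length. The sketch below assumes standard 11-bit identifiers and 8 data bytes (as in the twai_message_t above), counts the 3-bit interframe space, and ignores stuff bits, so the real load is somewhat higher. The function names are made up for illustration.

```c
/* Nominal bit count of a classic CAN data frame with an 11-bit identifier,
 * ignoring stuff bits:
 * SOF(1) + ID(11) + RTR(1) + IDE(1) + r0(1) + DLC(4) + data(8*n)
 * + CRC(15) + CRC delimiter(1) + ACK slot(1) + ACK delimiter(1)
 * + EOF(7) + interframe space(3). */
static unsigned frame_bits_no_stuffing(unsigned data_bytes)
{
    return 1 + 11 + 1 + 1 + 1 + 4 + 8 * data_bytes + 15 + 1 + 1 + 1 + 7 + 3;
}

/* Estimated bus load in percent for a total frame rate on the bus. */
static double bus_load_percent(unsigned frames_per_sec,
                               unsigned data_bytes,
                               unsigned bitrate)
{
    return 100.0 * frames_per_sec * frame_bits_no_stuffing(data_bytes) / bitrate;
}
```

With 250 frames per second in each direction (500 in total), 8 data bytes and 125000 bit/s, `frame_bits_no_stuffing(8)` is 111 and `bus_load_percent(500, 8, 125000)` comes out at about 44.4, in line with the roughly 45% observed.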
Sidenote: in the logs you will see a bus off failure of device 1 shortly after bootup. This was due to the fact that device 2 was booting at that time, and is unrelated to the TX failure that occurred 17 minutes later.
Project + full device logs: twai_tx_fail_error.zip
I (988401) TWAI_TEST: sent 246000 messages
I (988412) TWAI_TEST: received 246000 messages
I (992401) TWAI_TEST: sent 247000 messages
I (992412) TWAI_TEST: received 247000 messages
E (992764) TWAI_TEST: TWAI_ALERT_TX_FAILED
E (992764) TWAI_TEST: tx failed [ESP_FAIL]
I (996404) TWAI_TEST: sent 248000 messages
I (996413) TWAI_TEST: received 248000 messages
I (1000404) TWAI_TEST: sent 249000 messages
I (1000413) TWAI_TEST: received 249000 messages
I (1004404) TWAI_TEST: sent 250000 messages
I (1004413) TWAI_TEST: received 250000 messages
I (1008404) TWAI_TEST: sent 251000 messages
I (1008413) TWAI_TEST: received 251000 messages
I (1012404) TWAI_TEST: sent 252000 messages