openmrn
priority inversion in CAN packet receive for ESP32
Currently the CAN packet reception path for ESP32 is the following:
- SJA1000-compatible CAN driver in IDF -- interrupt priority
- esp32 driver RX and TX message queue
- RX and TX tasks created by the Esp32HardwareCanAdapter -- TCPIP_PRIO -1 and -2 (==17 and 16)
- DeviceBuffer in the arduino/Can.hxx for RX and TX
- loopTask calling CanBridge::loop_for_read and loop_for_write -- priority 1
- a single TX buffer in the OpenMRN.h/CanBridge, and an openmrn BufferQueue for RX
- openmrn executor consumes the packets via StateFlows -- priority 1 (if using loop) or priority TCPIP_PRIO (18) if using BG task.
The result of this priority inversion is that when the CAN bus delivers many back-to-back packets, they pile up in the DeviceBuffer for RX: a high priority task is stuffing them in while a low priority task is supposed to consume them. There is no pushback mechanism there, so we get a lot of packets dropped.
This may or may not be a symptom of the CPU not having enough juice to process the packets at full wire speed, but we need to look at the priorities of the tasks on either side of these buffers.
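To make the failure mode concrete, here is a minimal toy model, not the actual DeviceBuffer code. It assumes a fixed-capacity buffer that discards packets when full, a producer that delivers one packet per tick, and a consumer that only gets scheduled every few ticks. The names `BoundedRxBuffer` and `simulate_burst` are hypothetical, and the capacity, burst length and scheduling period are made-up numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a fixed-size RX buffer without pushback:
// when it is full, a new packet is simply discarded and counted.
class BoundedRxBuffer {
public:
    explicit BoundedRxBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Returns false (and counts a drop) when the buffer is full.
    bool push(int packet) {
        if (queue_.size() >= capacity_) {
            ++dropped_;
            return false;
        }
        queue_.push_back(packet);
        return true;
    }

    // Removes up to max_count packets; returns how many were removed.
    std::size_t drain(std::size_t max_count) {
        std::size_t removed = 0;
        while (removed < max_count && !queue_.empty()) {
            queue_.pop_front();
            ++removed;
        }
        return removed;
    }

    std::size_t dropped() const { return dropped_; }

private:
    std::size_t capacity_;
    std::size_t dropped_ = 0;
    std::deque<int> queue_;
};

// One packet arrives per tick (the high priority producer always runs).
// The low priority consumer only runs every consumer_period ticks and
// handles at most per_run packets each time. Returns the drop count.
std::size_t simulate_burst(std::size_t capacity, std::size_t burst,
                           std::size_t consumer_period, std::size_t per_run) {
    assert(consumer_period > 0);
    BoundedRxBuffer buffer(capacity);
    for (std::size_t tick = 1; tick <= burst; ++tick) {
        buffer.push(static_cast<int>(tick));
        if (tick % consumer_period == 0) {
            buffer.drain(per_run);
        }
    }
    return buffer.dropped();
}
```

With a consumer that keeps up (`simulate_burst(8, 100, 1, 1)`) nothing is dropped. Once the consumer is scheduled less often than packets arrive, the buffer fills and every further packet in the burst is lost until the consumer runs again. A bigger buffer only delays this; it does not remove it.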
Steps to reproduce:
- on a computer, run the command station emulation and create three dozen trains.
- start a hub on the same computer, connect JMRI to the hub
- connect this hub via a USBCAN (e.g. lcc-buffer) to a wired CAN-bus
- add the ESP32 to that wired CAN-bus.
- in JMRI open the network tree and press refresh. This generates enough traffic for the ESP32 to lose about a hundred packets.
@atanisoft WDYT
This is possibly why during CDI load I see occasional timeouts being reported. Though, I thought the BG task for executor was also for the CanBridge code? If not, we do need to increase priority of the CanBridge code somehow to bring it closer to the CanAdapter code.
> This is possibly why during CDI load I see occasional timeouts being reported.
I don't think so. The CDI load does not exercise the RX buffer heavily, because the traffic is point-to-point and throttled by the target node. At any point in time there is only one packet in flight to the node. Also, you would be seeing the serial output messages loudly complaining about buffer overflow.
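A rough way to see why throttled traffic cannot overflow the buffer: in the toy model below (a simplification, not the real datagram protocol), the sender only transmits the next request after the previous one has been consumed and answered. `simulate_throttled` and `ThrottledResult` are hypothetical names, and the consumer period is a made-up parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

struct ThrottledResult {
    std::size_t max_depth;  // deepest the RX queue ever got
    std::size_t ticks;      // ticks needed to answer all requests
};

// The sender keeps at most one request in flight. The consumer is slow:
// it only runs every consumer_period ticks and handles one packet.
ThrottledResult simulate_throttled(std::size_t requests,
                                   std::size_t consumer_period) {
    assert(consumer_period > 0);
    std::size_t depth = 0, max_depth = 0;
    std::size_t sent = 0, answered = 0, tick = 0;
    bool awaiting_reply = false;
    while (answered < requests) {
        ++tick;
        // Send the next request only when nothing is outstanding.
        if (!awaiting_reply && sent < requests) {
            ++depth;
            ++sent;
            awaiting_reply = true;
            max_depth = std::max(max_depth, depth);
        }
        // The slow consumer finally runs and answers the request.
        if (tick % consumer_period == 0 && depth > 0) {
            --depth;
            ++answered;
            awaiting_reply = false;
        }
    }
    return {max_depth, tick};
}
```

However slow the consumer is, the queue depth never exceeds one. A slow consumer makes the transfer take longer, but it does not cause drops in this traffic pattern.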
> Though, I thought the BG task for executor was also for the CanBridge code?
No, the CanBridge uses a loop() model. The BG executor uses a stateflow model (sleeping with select() when there is nothing to do).
> If not, we do need to increase priority of the CanBridge code somehow to bring it closer to the CanAdapter code.
unfortunately that's not an option, because we cannot increase the loop task's priority. It eats all CPU if we do that.
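The difference between the two models can be illustrated with a small tick-based toy model. This is only a sketch under simplifying assumptions, not OpenMRN or FreeRTOS code: `run_polling` stands in for a loop()-style task that is always runnable, and `run_blocking` for a task that sleeps until a packet is ready. Both function names and the `CpuUsage` struct are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct CpuUsage {
    std::size_t busy_ticks;  // ticks in which the task occupied the CPU
    std::size_t handled;     // packets actually processed
};

// loop() model: the task is always runnable, so it uses a tick
// whether or not a packet is waiting.
CpuUsage run_polling(const std::vector<bool>& arrivals) {
    CpuUsage usage{0, 0};
    for (bool packet_ready : arrivals) {
        ++usage.busy_ticks;
        if (packet_ready) {
            ++usage.handled;
        }
    }
    return usage;
}

// Blocking model: the task sleeps until a packet is ready, so it only
// uses ticks in which there is work to do.
CpuUsage run_blocking(const std::vector<bool>& arrivals) {
    CpuUsage usage{0, 0};
    for (bool packet_ready : arrivals) {
        if (!packet_ready) {
            continue;  // asleep; lower priority tasks can run
        }
        ++usage.busy_ticks;
        ++usage.handled;
    }
    return usage;
}
```

Both variants handle the same packets, but the polling one is busy on every tick. At low priority that is harmless because it only uses otherwise idle time. At high priority, everything below it would never get to run.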