RF24
RF24 library does not use interrupt and starts polling when waiting for data to be sent
Thanks for your great library and endless effort you put into maintaining this!
~~Please read about common issues first. It addresses the most common problems that people have (whether they know it or not).~~ We have a very short and concise ISR callback which sets a flag only.
Describe the bug
The library is used to send and receive datagrams to and from our solar PV inverters (Hoymiles, TSUN and MBOG brands) via nRF24L01+ at the 250 kbps data rate. We use the IRQ output of the NRF module to trigger our receive code. At night there is little chance that the solar inverter can answer our requests, so only packet sending occurs. This is when we see a lot of SPI traffic which, according to our analysis, looks like constant polling of the status register 0x07 to check whether the send buffer is clear for the next datagram.
Please include:
- Code to reproduce: For the code, see our issue at https://github.com/lumapu/ahoy/issues/83 (sorry, mostly German). The code is under the tools/esp8266 path in a PlatformIO project: https://github.com/lumapu/ahoy/tree/main/tools/esp8266 If you need to see any specific code, we can answer you in our issue there or link the relevant sections from here.
- Expected behaviour: We would assume that the IRQ is used for both sending and receiving. Apparently the interrupt is only triggered when new messages are received by the NRF24 module, but not when we wait for the send buffer to be emptied / processed. Here, constant polling by querying the status register (command 0x07) is used. See the following screenshot by our project lead @lumapu, who traced this behaviour using his oscilloscope:
- What device(s) are you using? Please specify make, model, and Operating System if applicable. We use nRF24L01+ modules at the low 250 kbps data rate, which gives a better chance of reaching the manufacturer's inverters over distance. The higher data rates of 1 Mbps and 2 Mbps are not supported by the inverters' firmware. We recommend that our users use LNA+PA modules with external antennas, which usually work fine in PA_MIN / PA_LOW mode, whereas modules with circuit board antennas may require PA_MAX / PA_HIGH to send / receive at the same distance. We also recommend that our users stabilize the voltage during sending on the NRF24 module's VCC / GND pins 1 & 2 using an electrolytic capacitor of ~47..100 uF. On the MCU side we use ESP8266 modules (NodeMCU v3 and Wemos D1 mini / Pro) as well as ESP32 modules.
Additional context
The problem occurs when there is a lot of sending and the library starts to poll whether the send buffer of the NRF24L01+ module has been emptied. There are different interrupts which could be enabled according to the Nordic Semiconductor data sheets which would allow the Interrupt to be used for both Sending & Receiving as far as we investigated.
If you want to use interrupts for sending, you can use the startWrite() function. The normal write() function will poll until data is sent, but startWrite() will just write the packet to the FIFO buffer and return you to your code. You can then use interrupts to determine whether the packet was sent successfully or not.
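A minimal sketch of that pattern, under a few assumptions: `g_radioIrq`, `onRadioIrq()`, `beginSend()`, `pollSend()` and `TxResult` are hypothetical names, and the functions are templated on the radio type only so the logic can be compiled off-target. On a real board `Radio` would be `RF24`, `onRadioIrq()` would be attached to the IRQ pin with `attachInterrupt()`, and the TX events would have to be unmasked first (the `maskIRQ(true, true, false)` call mentioned later in this thread masks them).

```cpp
#include <cstdint>

// Flag set by the ISR attached to the radio's IRQ pin; the ISR does nothing else.
// On ESP8266/ESP32 the ISR would additionally need to be placed in IRAM.
volatile bool g_radioIrq = false;
void onRadioIrq() { g_radioIrq = true; }

enum class TxResult { Pending, Sent, Failed };

// Queue one payload and return immediately instead of blocking like write().
template <typename Radio>
void beginSend(Radio& radio, const void* buf, uint8_t len) {
    radio.stopListening();              // switch to TX mode
    radio.startWrite(buf, len, false);  // multicast = false
}

// Call from loop(): touches SPI only after the IRQ pin has fired.
template <typename Radio>
TxResult pollSend(Radio& radio) {
    if (!g_radioIrq) {
        return TxResult::Pending;       // no event yet, no SPI traffic
    }
    g_radioIrq = false;
    bool txOk = false, txFail = false, rxReady = false;
    radio.whatHappened(txOk, txFail, rxReady);  // reads and clears the IRQ flags
    if (txOk) {
        return TxResult::Sent;
    }
    if (txFail) {
        radio.flush_tx();               // drop the payload that was not delivered
        return TxResult::Failed;
    }
    return TxResult::Pending;           // the event was not a TX event
}
```

With auto-ack disabled there are no retries, so the `Failed` branch should not be reached in that configuration; it is kept here for the auto-ack case.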
This will sound like an info dump, but I really don't know the exact cause of the problem here. I figure if I just put everything on the table, something might lead to a solution 🤷🏼♂️ .
Constant "polling" of the status register (0x07) indicates (to me) that whatHappened() is getting called constantly or the app is stuck constantly transmitting. There are few places where we actually write to the 0x07 offset. Writing to that register would actually look like 0x27 over MOSI (the write command 0x20 combined with the register offset); a bare 0x07 on MOSI is a read of the register. In fact, we usually get the STATUS byte using the radio's non-op command (the 0xFF on MOSI) rather than reading the 0x07 offset, because we get the STATUS byte quicker that way (full duplex SPI transactions). The only time we need to write to the 0x07 offset is to reset the IRQ flags, which is done during most write methods and in whatHappened(). Since your app disabled auto-ack, it's hard to tell if it is stuck transmitting or constantly calling whatHappened(). With auto-ack enabled, write() would spam the radio with non-op commands until the auto-ack was returned from the receiver or the max auto-retries count was reached.
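To help read an SPI trace like the one above, here is a small sketch of how the first MOSI byte and the returned STATUS byte can be decoded. The command words and bit positions are taken from the nRF24L01+ datasheet; the identifiers (`kReadRegister`, `decodeStatus`, and so on) are made up for this example and are not RF24 library names.

```cpp
#include <cstdint>

// SPI command words (nRF24L01+ datasheet).
constexpr uint8_t kReadRegister  = 0x00;  // R_REGISTER: 000A AAAA
constexpr uint8_t kWriteRegister = 0x20;  // W_REGISTER: 001A AAAA
constexpr uint8_t kNop           = 0xFF;  // NOP: only clocks out STATUS
constexpr uint8_t kRegisterMask  = 0x1F;  // 5-bit register address
constexpr uint8_t kStatusReg     = 0x07;  // STATUS register offset

// First MOSI byte for reading or writing a register.
constexpr uint8_t readCommand(uint8_t reg) {
    return static_cast<uint8_t>(kReadRegister | (reg & kRegisterMask));
}
constexpr uint8_t writeCommand(uint8_t reg) {
    return static_cast<uint8_t>(kWriteRegister | (reg & kRegisterMask));
}

// IRQ sources reported in the STATUS byte (returned on MISO for every command).
struct IrqFlags {
    bool rxReady;  // RX_DR,  bit 6
    bool txOk;     // TX_DS,  bit 5
    bool txFail;   // MAX_RT, bit 4
};

constexpr IrqFlags decodeStatus(uint8_t status) {
    return IrqFlags{(status & (1u << 6)) != 0,
                    (status & (1u << 5)) != 0,
                    (status & (1u << 4)) != 0};
}

static_assert(readCommand(kStatusReg) == 0x07, "reading STATUS puts 0x07 on MOSI");
static_assert(writeCommand(kStatusReg) == 0x27, "writing STATUS puts 0x27 on MOSI");
static_assert(decodeStatus(0x2E).txOk, "0x2E has TX_DS set");
```

In a trace, repeated `0x07` or `0xFF` bytes on MOSI would therefore be status reads, while `0x27` followed by a data byte would be a write that clears IRQ flags.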
After calling whatHappened(), the IRQ pin should reset until triggered by another event. If there is another event that triggers the IRQ immediately, then this could lead to constant polling of the 0x07 offset. However, I see your project's hmRadio.h file calls maskIRQ(true, true, false), so it is unlikely that another event is getting triggered immediately. I don't know much about disabling ISRs on the ESP8266, but I would double check the macros your project is using to manipulate the MCU (DISABLE_IRQ and RESTORE_IRQ).
problems I noticed with the code
I see your project is using the *etPayloadSize() functions (for statically sized payloads), but you have dynamic payloads enabled. This is erroneous if the received payload is not exactly 32 bytes. It would be better to use getDynamicPayloadSize() because that will tell you the actual size of the payload you're about to read from the RX FIFO. So, the following snippet seems erroneous to me:
mNrf24.setPayloadSize(MAX_RF_PAYLOAD_SIZE); // not used for dynamic payloads
mNrf24.enableDynamicPayloads(); // payload size will be the amount of data passed to write*()
len = mNrf24.getPayloadSize(); // Does nothing over SPI; returns the int from setPayloadSize()
if(len > MAX_RF_PAYLOAD_SIZE) // ??
    len = MAX_RF_PAYLOAD_SIZE; // should never get executed
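For comparison, a hedged sketch of a read path built on getDynamicPayloadSize(). The helper name `readDynamicPayload` is hypothetical, and it is templated on the radio type only so it compiles without the library; with RF24, `Radio` would be the `RF24` class. It assumes a length of 0 means there is nothing valid to read.

```cpp
#include <cstdint>

// Reads one dynamically sized payload into buf and returns the number of
// bytes copied, or 0 if the radio reported no valid payload.
template <typename Radio>
uint8_t readDynamicPayload(Radio& radio, uint8_t* buf, uint8_t capacity) {
    uint8_t len = radio.getDynamicPayloadSize();  // actual size of the next RX payload
    if (len == 0) {
        return 0;            // nothing valid to read
    }
    if (len > capacity) {
        len = capacity;      // never write past the caller's buffer
    }
    radio.read(buf, len);
    return len;
}
```

Unlike the snippet above, the clamp here can actually trigger, because `len` comes from the radio rather than from the value passed to setPayloadSize().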
I also found this comment, which makes me think there are wiring/connection problems as well. It is possible that long wires can cause data to get corrupted in transit from radio to MCU (or vice versa). Furthermore,
if(!mNrf24.isChipConnected()) {
    DPRINTLN(DBG_WARN, F("WARNING! your NRF24 module can't be reached, check the wiring"));
}
this warning could be detected earlier using RF24::begin() instead of RF24::isChipConnected(), just FYI.
Thanks for the responses. We will follow the suggested ideas and come back to you.
Closing, all related issues appear to be closed. Please update if further info etc needed.
As far as I followed and understood, the solution now is to use startFastWrite() instead of startWrite(). The reason is that ACKs were off for a considerable time when switching between write and read mode using startWrite(). With startFastWrite(), the transition from write to read mode seems to be much quicker. So switching between the modes left the PA+LNA switched off, which resulted in lost ACK packets and therefore led to retransmissions.
Maybe @lumapu or @tictrick can comment on the solution which was merged downstream in https://github.com/lumapu/ahoy/pull/1414
@TMRh20 & @2bndy5 thanks for your valuable insights and suggestions!
I do not know if you want to add a warning about this switching behaviour in normal mode when ACK is activated, add some caveat notes to the documentation, or simply make startFastWrite() the recommended default upstream?
Re-opening issue as a reminder to put more info in the docs regarding the difference between write() functions. We have a pinned issue too, so obviously this is an issue to note better.