SX126x SetRx timing out early even though the timeout is set to the maximum value (0xFFFFFF)
Issue
Unable to join with an ESP32 running the SWL2001 stack and an SX1262 radio.
Equipment available
- Saleae logic analyzer connected to hardware 2
- 100 MHz oscilloscope
- Digital bench power supply
Stack info
- SWL2001 stack v4.8.0
- ESP-IDF v5.4.1
- SPI at 8 MHz
- TCXO is NOT used on hardware 1 or hardware 2.
System configuration
- US915 LoRaWAN device (the device being debugged)
- Oxit Carbon 8 LoRaWAN gateway
- AWS LNS
Hardware configurations
1. MBed SX1262 dev board (SX1262MB2CAS Shield) wired to an ESP32 module
   - esp32-s3-devkitc-1 module
   - Modem stack tests pass
   - SPI and GPIO connections match hardware configuration #2
2. Custom hardware with SX1262 radio and PCB antenna
   - esp32-s3-wroom2
   - Modem stack tests pass
   - 32 MHz crystal used with the SX1262
3. Seeed Studio Wio-E5 mini Dev Board (SX1262 module) (113990939) (https://www.seeedstudio.com/LoRa-E5-mini-STM32WLE5JC-p-4869.html)
   - Controlled via AT commands
   - Python program written to configure and interact with the gateway.
Stack configuration
- RP2_103 defined
- REGION_US_915 defined
- JoinEUI and AppKey defined the same across hardware 1, 2, and 3.
Behavior with Semtech stack (hardware 1 and hardware 2)
Hardware 1 and hardware 2 behave identically:
1. Hardware sends a join request (confirmed by seeing the traffic via SDR).
2. Join request is received by the AWS LNS (confirmed from the AWS LNS logs).
3. Join response is sent by the AWS LNS (confirmed from the AWS LNS logs and by seeing the response via SDR).
4. In the RX1 window the SX1262 raises an interrupt with an IRQ status of PreambleDetected + Timeout, far earlier than the timeout passed to SetRx.
5. Stack reports 'Join failed'.
6. The join retry delay expires and step 1 occurs again.
7. Even after dozens of join attempts, no join succeeds.
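To read the GetIrqStatus bytes quoted in this issue, it helps to decode them against the raw SX126x IRQ bit map. The sketch below assumes the bit positions from the SX1261/2 datasheet IRQ table and the two IRQ bytes that GetIrqStatus (opcode 0x12) returns after the status byte. The SX_IRQ_* names and both helpers are hypothetical, not the Semtech driver's identifiers. It decodes raw register values only; the values printed by the RP trace (e.g. "IRQ 0x0018") may use the stack's own encoding and are not decoded here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Raw SX1261/2 IRQ flags, bit positions as listed in the SX126x datasheet.
 * Hypothetical local names for this sketch, not the driver's own macros. */
enum {
    SX_IRQ_TX_DONE           = 1 << 0,
    SX_IRQ_RX_DONE           = 1 << 1,
    SX_IRQ_PREAMBLE_DETECTED = 1 << 2,
    SX_IRQ_SYNC_WORD_VALID   = 1 << 3,
    SX_IRQ_HEADER_VALID      = 1 << 4,
    SX_IRQ_HEADER_ERR        = 1 << 5,
    SX_IRQ_CRC_ERR           = 1 << 6,
    SX_IRQ_CAD_DONE          = 1 << 7,
    SX_IRQ_CAD_DETECTED      = 1 << 8,
    SX_IRQ_TIMEOUT           = 1 << 9
};

static const struct {
    uint16_t mask;
    const char *name;
} SX_IRQ_NAMES[] = {
    { SX_IRQ_TX_DONE, "TxDone" },
    { SX_IRQ_RX_DONE, "RxDone" },
    { SX_IRQ_PREAMBLE_DETECTED, "PreambleDetected" },
    { SX_IRQ_SYNC_WORD_VALID, "SyncWordValid" },
    { SX_IRQ_HEADER_VALID, "HeaderValid" },
    { SX_IRQ_HEADER_ERR, "HeaderErr" },
    { SX_IRQ_CRC_ERR, "CrcErr" },
    { SX_IRQ_CAD_DONE, "CadDone" },
    { SX_IRQ_CAD_DETECTED, "CadDetected" },
    { SX_IRQ_TIMEOUT, "Timeout" },
};

/* Combine the two IRQ bytes (MSB first) that follow the status byte. */
static uint16_t sx_irq_from_bytes(uint8_t msb, uint8_t lsb)
{
    return (uint16_t)((msb << 8) | lsb);
}

/* Print every flag set in irq, in bit order, and return how many were set. */
static int sx_irq_print(const char *label, uint16_t irq)
{
    assert(label != NULL);
    int count = 0;
    printf("%s IRQ 0x%04X:", label, (unsigned)irq);
    for (size_t i = 0; i < sizeof SX_IRQ_NAMES / sizeof SX_IRQ_NAMES[0]; i++) {
        if (irq & SX_IRQ_NAMES[i].mask) {
            printf(" %s", SX_IRQ_NAMES[i].name);
            count++;
        }
    }
    printf("\n");
    return count;
}
```

For example, sx_irq_print("RX1", sx_irq_from_bytes(0x02, 0x04)) prints "RX1 IRQ 0x0204: PreambleDetected Timeout" and returns 2, which is the combination reported in this issue.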
Behavior with Wio-E5 module (hardware 3)
- Device sends join attempts (confirmed by seeing traffic via SDR)
- Device joins after 1 to 3 attempts. (Confirmed from AWS LNS logs).
- Device sends data (confirmed via SDR).
- Data is received (confirmed on the AWS side and via SDR).
Questions
Is DIO2 working as expected?
Probes were wired to the reference board to evaluate this.
Yellow is DIO2, teal is ANT_SW.
The capture above was taken during a radio TX; the rest of the time DIO2 is low.
The RF switch is a PE4259. Its datasheet indicates that if pin 6 is pulled high, table 5 (single-pin control logic truth table) applies.
With pin 4 high, RFC is connected to RF1. RF1 is tied to the SX1262 RFO pin, which the SX1262 datasheet defines as the RF output, so the polarity of the output looks correct.
I believe this rules out the concern that the TX/RX switch isn't being switched correctly.
Is the timing of the RX1 window correct? Is the system timing accurate?
- Channel 1 - GPIO output from esp32, set high and then low when sx126x_hal_write() is called with a command of 0x83 (SetTx).
- Channel 2 - GPIO output from esp32, set high and then low when sx126x_hal_write() is called with a command of 0x82 (SetRx).
- Channel 3 - DIO2 signal
The RX1 window should be opened at or before 5 seconds after the end of the join request (the falling edge of DIO2), marked by the left-hand white cursor.
The time delta between the cursors is 4.91 s, indicating that the receiver is opened ahead of the RX1 window.
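The expected RX1 time can also be sanity-checked from the stack's own numbers: for a join accept, RX1 opens JOIN_ACCEPT_DELAY1 (5 s) after the end of the uplink, and the end of the uplink is the TX start plus the time on air. The sketch below uses the LoRa time-on-air formula from Semtech's LoRa modem design guide. Its assumptions are labelled in the code: SF7 to SF12, coding rate 4/(4+cr), and low data rate optimisation enabled when a symbol lasts 16 ms or more. lora_toa_us() and rx1_nominal_ms() are hypothetical helpers, not stack functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LoRa time on air in microseconds (Semtech modem design guide formula).
 * Assumptions: SF7..SF12 (SF5/SF6 use a different preamble overhead on SX126x),
 * cr = 1..4 for coding rate 4/5..4/8, LDRO on when Tsym >= 16 ms. */
static uint32_t lora_toa_us(uint8_t sf, uint32_t bw_hz, uint16_t payload_len,
                            uint8_t cr, uint16_t preamble_len,
                            bool implicit_header, bool crc_on)
{
    assert(sf >= 7 && sf <= 12 && bw_hz > 0 && cr >= 1 && cr <= 4);
    uint32_t tsym_us = (uint32_t)((1000000ull << sf) / bw_hz);
    int32_t de = (tsym_us >= 16000u) ? 1 : 0;
    int32_t num = 8 * (int32_t)payload_len - 4 * (int32_t)sf + 28
                + (crc_on ? 16 : 0) - (implicit_header ? 20 : 0);
    int32_t den = 4 * ((int32_t)sf - 2 * de);
    int32_t blocks = (num > 0) ? (num + den - 1) / den : 0; /* ceil, clamped at 0 */
    uint32_t n_payload = 8u + (uint32_t)blocks * (cr + 4u);
    /* (preamble_len + 4.25) symbols, computed in quarter symbols */
    uint32_t t_preamble_us =
        (uint32_t)(((uint64_t)(4u * preamble_len + 17u) * tsym_us) / 4u);
    return t_preamble_us + n_payload * tsym_us;
}

/* Nominal RX1 open time: JOIN_ACCEPT_DELAY1 (5 s) after the end of the uplink. */
static uint32_t rx1_nominal_ms(uint32_t tx_start_ms, uint32_t toa_ms)
{
    return tx_start_ms + toa_ms + 5000u;
}
```

For a 23-byte SF10/BW125 join request with an 8-symbol preamble, CR 4/5, explicit header and CRC on, lora_toa_us(10, 125000, 23, 1, 8, false, true) returns 370688 µs, consistent with the "toa = 371" printed by the stack in the logs in this thread. rx1_nominal_ms(57874, 371) gives 63245 ms for the first logged join attempt.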
Is the radio sending and receiving with the correct timing?
TODO: Measurement of radio current via scope is pending arrival of an INA169 module.
Remaining info from original post
- smtc_modem_set_region() is called for US915, and the keys are set, when the application's modem event handler receives the SMTC_MODEM_EVENT_RESET event.
- Confirmed the LoRa sync word register (0x0740) is written with 0x34 0x44.
- A HackRF is being used to confirm that the device sends the join request and the gateway sends the join response.
- Tested with another Seeed Studio dev device with the same DevEUI, JoinEUI, AppKey, and regional parameters 1.0.3 to confirm the gateway can be joined in OTAA mode.
- External Saleae analyzer being used.
- Confirmed that SetRx is sent as 0x82 0xFF 0xFF 0xFF (I modified sx126x_set_rx() in sx126x.c to use the maximum value).
- Confirmed frequencies are set correctly and status is being read. SetTx is called.
- The external analyzer shows that the radio signals an interrupt ~30 ms after SetRx.
- The stack then issues GetIrqStatus(), which returns 0x02 0x04 (indicating timeout + preamble detected).
- Signal strength looks good (the gateway is very strong, the device is OK).
- Enabled MODEM_HAL_DBG_TRACE_RP=1 so I can see the join manager re-attempting joins. I've checked the various timings: RX1 is 5 s after the end of the transmission and RX2 is 6 s after it.
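The sync word register bytes can be cross-checked against the one-byte LoRaWAN public sync word 0x34 (the "sync word = 0x34" shown in the RX logs). A minimal sketch, assuming the common SX126x mapping in which each nibble of the one-byte sync word goes into the upper nibble of one register byte and the lower nibbles are set to 0x4; compare it with sx126x_set_lora_sync_word() in your copy of the driver rather than treating it as authoritative. sx126x_sync_word_reg() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Map a one-byte LoRa sync word to the 16-bit value written to the
 * SX126x sync word register pair 0x0740/0x0741 (returned MSB first).
 * Assumed mapping: each nibble of the sync word lands in the upper
 * nibble of one register byte, and both lower nibbles are 0x4. */
static uint16_t sx126x_sync_word_reg(uint8_t sync_word)
{
    uint8_t msb = (uint8_t)((sync_word & 0xF0u) | 0x04u);
    uint8_t lsb = (uint8_t)(((sync_word & 0x0Fu) << 4) | 0x04u);
    /* The original sync word is recoverable from the two upper nibbles. */
    assert((uint8_t)((msb & 0xF0u) | (lsb >> 4)) == sync_word);
    return (uint16_t)((msb << 8) | lsb);
}
```

sx126x_sync_word_reg(0x34) returns 0x3444, i.e. the 0x34 0x44 seen on the analyzer, and the private-network value 0x12 gives 0x1424.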
I've also tried both the production board and a bench board using the Semtech SX1262 mbed reference board; both behave identically, so I'm pretty confident in the hardware setup.
I've tried the 15.3.1 workaround without any success.
I'm pretty happy with the BSP port and the simplified application code. The analyzer isn't showing any other pins being toggled, i.e. modem reset isn't being asserted by the MCU during the SetRx -> GetIrqStatus() timeframe, so it seems certain that the modem is the one generating the interrupt.
Why could SetRx be timing out right after the preamble is detected, even with a very long timeout? What could I be missing here?
Regards, Chris
@lbm-team hello. Is there some way I can get support here for this issue, paid or otherwise? This feels close to working; I've got pretty good visibility into the behavior of the system but need help.
If the gateway and device are too close, the join accept from the gateway can be so "loud" that it overloads the input stages of the device's radio. This is a classic situation on the TTN forum, which advises at least 10 m plus a solid wall of separation. Quick and easy to try.
@descartes this is very helpful info. I've relocated the gateway about 10 m away (maybe 11 m, by counting steps), behind a solid wall.
My setup is the computer, device, and HackRF in one room, and the gateway in another room, 10 m away behind a solid wall.
On the HackRF side I see the join request and, at a lower signal level, the gateway response. I'm still seeing preamble + timeout at a very unusual time.
Here is the trace of the join attempt TX and the IRQ for the preamble + timeout, which appears to occur immediately after the gateway responds: the join is sent, there is the 5 s delay, and the moment the gateway responds the radio wakes with a preamble + timeout.
*************************************
* Send Payload for stack_id = 0
*************************************
Tx LoRa at 57874 ms: freq:904700000, SF10, BW125, len 23 bytes 22 dBm, fcnt_up 1, toa = 371
RP: Task #1 enqueue with #7 priority
RP: Arbiter has been called by rp_task_enqueue and priority-task #1, timer hook #1, delay 2528, now 55346, start_time_ms 57874
RP: High priority task is in the future
E (55355) smtc_modem_hal: start_timer ms 2426, has 1, expires at 57781
E smtc_modem_hal: irq_timer_callback_wrapper()
I smtc_modem_hal: callback
I (57795) LoRaWAN: external wakeup
RP: Arbiter has been called by rp_timer_irq and priority-task #1, timer hook #1, delay 81, now 57793, start_time_ms 57874
RP: Launch task #1 and start radio state 2, type 2
RP- INFO - Radio task #1 running - Timer task #1 running - Hook ID #1 - TASK_TX_LORA - start time @57874 - priority #7
I (57885) sx: tx
E LORAWAN: radio irq
I (58275) LoRaWAN: external wakeup
RP: INFO - Radio IRQ received for hook #1
RP: IRQ 0x0002, 0x2
RP: No more active tasks
*************************************
* TX DONE
*************************************
E (58285) i2cdev: Could not read from device [0x51 at 0]: 263 (ESP_ERR_TIMEOUT)
RP: Task #1 enqueue with #1 priority
RP: Arbiter has been called by rp_task_enqueue and priority-task #1, timer hook #1, delay 4941, now 58311, start_time_ms 63252
RP: High priority task is in the future
E (58325) smtc_modem_hal: start_timer ms 4829, has 1, expires at 63154
Open RX1 for Hook Id = 1 RX1 LoRa at 63252 ms: freq:925700000, SF10, BW500, sync word = 0x34
Timer will expire in 4941 ms
E smtc_modem_hal: irq_timer_callback_wrapper()
I smtc_modem_hal: callback
I (63175) LoRaWAN: external wakeup
RP: Arbiter has been called by rp_timer_irq and priority-task #1, timer hook #1, delay 79, now 63173, start_time_ms 63252
RP: Launch task #1 and start radio state 2, type 0
RP- INFO - Radio task #1 running - Timer task #1 running - Hook ID #1 - TASK_RX_LORA - start time @63252 - priority #1
I (63255) sx: rx 3000 <-------------------------- 63255ms start receiving
E LORAWAN: radio irq
I (63285) LoRaWAN: external wakeup. <--------------------------- 63285ms - 63255ms = 30ms later the radio IRQ occurs
RP: INFO - Radio IRQ received for hook #1
RP: IRQ 0x0018, 0x0 <------------------------------------------------------ preamble + timeout
RP: No more active tasks
*************************************
* RX1 Timeout for stack_id = 0
*************************************
RP: Task #1 enqueue with #1 priority
RP: Arbiter has been called by rp_task_enqueue and priority-task #1, timer hook #1, delay 974, now 63286, start_time_ms 64260
RP: High priority task is in the future
E (63295) smtc_modem_hal: start_timer ms 873, has 1, expires at 64168
Open RX2 for Hook Id = 1 RX2 LoRa at 64260 ms: freq:923300000, SF12, BW500, sync word = 0x34
Timer will expire in 975 ms
E smtc_modem_hal: irq_timer_callback_wrapper()
I smtc_modem_hal: callback
I (64175) LoRaWAN: external wakeup
RP: Arbiter has been called by rp_timer_irq and priority-task #1, timer hook #1, delay 87, now 64173, start_time_ms 64260
RP: Launch task #1 and start radio state 2, type 0
RP- INFO - Radio task #1 running - Timer task #1 running - Hook ID #1 - TASK_RX_LORA - start time @64260 - priority #1
I (64265) sx: rx 3000
E LORAWAN: radio irq
I (64325) LoRaWAN: external wakeup
RP: INFO - Radio IRQ received for hook #1
RP: IRQ 0x0008, 0x0
RP: No more active tasks
*************************************
* RX2 Timeout for stack_id = 0
*************************************
I (64325) lr1: delay_time 37
Start a new join sequence in 42 seconds on stack 0
I (64325) LoRaWAN: Modem event callback
W (64325) LoRaWAN: Event received: JOINFAIL
My understanding is that if the DevEUI or JoinEUI were incorrect, the gateway wouldn't respond to the join request.
I haven't been able to decode the LoRaWAN traffic over the air to see the content of the data, but I do have another SX1262 test device (https://www.seeedstudio.com/LoRa-E5-mini-STM32WLE5JC-p-4869.html) that I'm using to confirm the gateway works correctly. It uses the same DevEUI / JoinEUI as the device I'm trying to get working with the SWL2001 stack port we wrote for the ESP32.
What else could be causing this issue, and what else could I try or test?
Hi, could you please try with MODEM_HAL_DBG_TRACE_RP=0? Logs add delays that could impact the overall functionality. Also, could you please try using the main porting test to validate your HAL? You are probably using an 8-channel GW, while the device has to join over 64 channels (125 kHz) + 8 channels (500 kHz). Considering this, could you please check and confirm that execution ran long enough to cover all the channels? Thanks for your feedback,
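With an 8-channel gateway, only one sub-band of the 64 US915 125 kHz uplink channels reaches it. The sketch below maps an uplink frequency to its channel index and 1-based sub-band, assuming the US915 plan from the LoRaWAN Regional Parameters (125 kHz uplink channels at 902.3 MHz + 200 kHz × n, n = 0..63, grouped eight per sub-band). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed US915 125 kHz uplink plan: 902.3 MHz + 200 kHz * n, n = 0..63. */
#define US915_UP125_BASE_HZ 902300000u
#define US915_UP125_STEP_HZ 200000u

/* 125 kHz channel index (0..63) for freq_hz, or -1 if it is not a channel center. */
static int us915_up125_channel(uint32_t freq_hz)
{
    if (freq_hz < US915_UP125_BASE_HZ) {
        return -1;
    }
    uint32_t off = freq_hz - US915_UP125_BASE_HZ;
    if (off % US915_UP125_STEP_HZ != 0) {
        return -1;
    }
    uint32_t n = off / US915_UP125_STEP_HZ;
    return n <= 63u ? (int)n : -1;
}

/* 1-based sub-band for a 125 kHz channel index: eight channels per sub-band. */
static int us915_sub_band(int channel)
{
    assert(channel >= 0 && channel <= 63);
    return channel / 8 + 1;
}
```

The join requests in the logs in this thread, at 904.7 MHz and 904.5 MHz, map to channels 12 and 11, both in sub-band 2.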
@opeyrard I can confirm the HAL porting tests pass (although I'm not running them with MODEM_HAL_DBG_TRACE_RP=1).
I will test here with MODEM_HAL_DBG_TRACE_RP=0 shortly.
Does the fact that I can see the gateway response mean that things are working as expected, or do you mean that the device may be listening on one channel while the gateway responds on another?
@opeyrard and how long is long enough to test?
Here is the join with the gateway response and the corresponding terminal output. On the TX side, the frequencies in the stack prints don't line up with what I see over the air.
TX here is at ~924.4 MHz; the response from the gateway is at 925 MHz.
And here is the log output. The stack reports TX at 904.5 MHz, but over the air this appears at ~924.4 MHz. Maybe this is due to the stack printing an "over the line" frequency value rather than the real one?
*************************************
* Send Payload for stack_id = 0
*************************************
Tx LoRa at 60615 ms: freq:904500000, SF10, BW125, len 23 bytes 22 dBm, fcnt_up 1, toa = 371 <--------- 904.5MHz
RP: Task #1 enqueue with #7 priority
RP: Arbiter has been called by rp_task_enqueue and priority-task #1, timer hook #1, delay 5297, now 55318, start_time_ms 60615
RP: High priority task is in the future
E (55326) smtc_modem_hal: start_timer ms 5196, has 1, expires at 60522
E smtc_modem_hal: irq_timer_callback_wrapper()
I smtc_modem_hal: callback
I (60536) LoRaWAN: external wakeup
RP: Arbiter has been called by rp_timer_irq and priority-task #1, timer hook #1, delay 81, now 60534, start_time_ms 60615
RP: Launch task #1 and start radio state 2, type 2
RP- INFO - Radio task #1 running - Timer task #1 running - Hook ID #1 - TASK_TX_LORA - start time @60615 - priority #7
I (60626) sx: tx
E LORAWAN: radio irq
I (61016) LoRaWAN: external wakeup
RP: INFO - Radio IRQ received for hook #1
RP: IRQ 0x0002, 0x2
RP: No more active tasks
*************************************
* TX DONE
*************************************
RP: Task #1 enqueue with #1 priority
RP: Arbiter has been called by rp_task_enqueue and priority-task #1, timer hook #1, delay 4928, now 61065, start_time_ms 65993
RP: High priority task is in the future
E (61066) smtc_modem_hal: start_timer ms 4827, has 1, expires at 65893
Open RX1 for Hook Id = 1 RX1 LoRa at 65993 ms: freq:925100000, SF10, BW500, sync word = 0x34
Timer will expire in 4929 ms
E smtc_modem_hal: irq_timer_callback_wrapper()
I smtc_modem_hal: callback
I (65916) LoRaWAN: external wakeup
RP: Arbiter has been called by rp_timer_irq and priority-task #1, timer hook #1, delay 79, now 65914, start_time_ms 65993
RP: Launch task #1 and start radio state 2, type 0
RP- INFO - Radio task #1 running - Timer task #1 running - Hook ID #1 - TASK_RX_LORA - start time @65993 - priority #1
I (65996) sx: rx 3000
E LORAWAN: radio irq
I (66026) LoRaWAN: external wakeup
RP: INFO - Radio IRQ received for hook #1
RP: IRQ 0x0018, 0x0 <-------------------------------- preamble + timeout (but in the hackrf trace you can see what appears to be a perfectly good response from the gateway)
RP: No more active tasks
*************************************
* RX1 Timeout for stack_id = 0
*************************************
RP: Task #1 enqueue with #1 priority
RP: Arbiter has been called by rp_task_enqueue and priority-task #1, timer hook #1, delay 974, now 66027, start_time_ms 67001
RP: High priority task is in the future
E (66026) smtc_modem_hal: start_timer ms 873, has 1, expires at 66899
Open RX2 for Hook Id = 1 RX2 LoRa at 67001 ms: freq:923300000, SF12, BW500, sync word = 0x34
Timer will expire in 975 ms
E smtc_modem_hal: irq_timer_callback_wrapper()
I smtc_modem_hal: callback
I (66916) LoRaWAN: external wakeup
RP: Arbiter has been called by rp_timer_irq and priority-task #1, timer hook #1, delay 87, now 66914, start_time_ms 67001
RP: Launch task #1 and start radio state 2, type 0
RP- INFO - Radio task #1 running - Timer task #1 running - Hook ID #1 - TASK_RX_LORA - start time @67001 - priority #1
I (67006) sx: rx 3000
E LORAWAN: radio irq
I (67066) LoRaWAN: external wakeup
RP: INFO - Radio IRQ received for hook #1
RP: IRQ 0x0008, 0x0
RP: No more active tasks
*************************************
* RX2 Timeout for stack_id = 0
*************************************
I (67066) lr1: delay_time 37
Start a new join sequence in 37 seconds on stack 0
I (67066) LoRaWAN: Modem event callback
W (67066) LoRaWAN: Event received: JOINFAIL
@opeyrard @descartes above is the HackRF capture along with the log, showing the odd frequency discrepancy between the two.
@opeyrard how long should I test joining with the debug prints disabled before I can conclude it isn't working correctly?
I can look at the detail later on; in the meantime you may want to compare and contrast with the Wio-E5 module.
I spend more time than I care to consider dumping the excruciating detail of what's going on under the hood of LW devices to a serial port, and I rarely jeopardise the timings. But to be sure about this, on ESP32 native USB you can run the serial port at 921600 baud.
The Regional Parameters will tell you which channel to expect a reply on (offset modulo 8 of the TX channel), but fundamentally, if the Wio-E5 is working OK and you've got a corresponding channel mask set up, you shouldn't need to try all 64 channels. Porting is about creating a HAL; if that allows the code to talk to the radio well enough to manage a JR (join request), you'd like to think it would also allow it to perform an Rx.
So in that respect, checking that the radio module has its antenna switches set correctly would be a good move.
I read that you've got the Semtech hardware. Are you using your own SX1262 implementation or a radio module? If a module, which one?
And as another way of verifying the hardware, try RadioLib: its LW dev is an ESP32 freak, so that's a very well-exercised platform, and it should be a matter of setting the right pin map, as long as you can stand the Arduino environment. We use PlatformIO + VSCode, which is pretty productive to get going, but you can just as easily use the Arduino IDE.
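The "offset modulo 8" rule for the reply channel can be written out directly. A minimal sketch, assuming the US915 plan from the LoRaWAN Regional Parameters: a 125 kHz uplink on channel n (0..63) gets its RX1 downlink on channel n mod 8, at 923.3 MHz + 600 kHz × (n mod 8) with 500 kHz bandwidth. us915_rx1_freq_hz() is a hypothetical helper and only covers 125 kHz uplinks.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed US915 RX1 mapping for a 125 kHz uplink on channel n (0..63):
 * downlink channel n mod 8 at 923.3 MHz + 600 kHz * (n mod 8), BW 500 kHz. */
static uint32_t us915_rx1_freq_hz(uint32_t uplink_hz)
{
    /* 125 kHz uplink channels span 902.3 MHz .. 914.9 MHz in 200 kHz steps. */
    assert(uplink_hz >= 902300000u && uplink_hz <= 914900000u);
    assert((uplink_hz - 902300000u) % 200000u == 0);
    uint32_t n = (uplink_hz - 902300000u) / 200000u;
    return 923300000u + 600000u * (n % 8u);
}
```

us915_rx1_freq_hz(904700000) returns 925700000 and us915_rx1_freq_hz(904500000) returns 925100000, matching the "Open RX1 ... freq:" lines in both logs, so the RX1 frequency the stack selects is consistent with the uplink frequency it reports.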
@descartes I have two hardware setups here, they behave the same:
- MBed SX1262 dev board (SX1262MB2CAS Shield) wired to esp32 module (modem stack tests pass here)
- Custom hardware with SX1262 radio and pcb antenna (copy of existing working design) (modem stack tests also pass here)
I think the Mbed test setup rules out physical layout and wiring issues, especially as the modem stack tests pass. Doesn't the SX1262 manage the RX/TX switch internally, or could the stack somehow not be configuring the RX/TX switch appropriately? I'd expect that not enabling the RX path would certainly cause an issue.
Should I dig further into the wiring given that the mbed setup isn't working either, or does this rule that out in your view?
Agreed on the timings and debug output. I've been watching those and the timestamps carefully, and I still plan to test without debugging enabled in the stack.
Doesn't the sx1262 manage the rx/tx switch internally or could the stack somehow not be configuring the rx/tx switch appropriately?
I haven't got as far as spinning my own radio, but in planning to do so I see lots of ways of switching the RF path, and I'm aware of a number of off-the-shelf offerings, including the STM32WL series, that need explicit code support to set the appropriate pins; some of this, I believe, is done via register setup in the SX126x.
@descartes I can confirm that a SetDIO2AsRfSwitchCtrl ON command is sent to the radio at power-on. I can also confirm that DIO2 is wired to the RF switch on both the mbed board and our board.
This did make me wonder: what if RX isn't really enabling the hardware to receive? I could put a probe on DIO2 to check this on the mbed board; on our board it's under a metal cover that I'd have to remove.
on our board its under a metal cover that I'd have to remove.
Sometimes small sacrifices have to be made. It's mostly the US that has metal covers; the rest of us run without, so removing it shouldn't compromise anything. But it sounds easier to tap into the Semtech board.
Hi,
Could you please check the power profile of the radio to confirm there are 5 s between the end of the TX and the beginning of the RX? Could you please also try extending the crystal error, to 16000 for instance, using the command "smtc_modem_set_crystal_error_ppm( uint32_t crystal_error_ppm )"? This increases the window size so the RX opens a little earlier before the gateway's downlink. You may have to fine-tune the value to optimize the radio power consumption. Also, the SetRx parameter change must be reverted. Could you please try and let us know? Thank you very much, Best regards,
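A rough way to see what a crystal error setting buys: a clock that may be off by e ppm accumulates up to delay × e × 10⁻⁶ of timing error over the 5 s join-accept delay, so the window has to open roughly that much earlier to be safe. This is a generic drift calculation under that assumption, not the SWL2001 window formula, which may add other margins; clock_drift_us() is a hypothetical helper. Note also that the lr1mac_config.h comment quoted later in this thread describes BSP_CRYSTAL_ERROR in units of 0.01% rather than ppm, so it is worth checking how your stack version converts that value.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case timing error (us) accumulated over delay_ms when the local
 * clock may be off by crystal_error_ppm parts per million.
 * delay_ms * 1000 us/ms * ppm / 1e6 simplifies to delay_ms * ppm / 1000. */
static uint32_t clock_drift_us(uint32_t delay_ms, uint32_t crystal_error_ppm)
{
    assert(crystal_error_ppm <= 1000000u); /* more than 100 % is meaningless */
    return (uint32_t)(((uint64_t)delay_ms * crystal_error_ppm) / 1000u);
}
```

clock_drift_us(5000, 16000) is 80000 µs (80 ms) of margin over the RX1 delay, whereas clock_drift_us(5000, 40) is only 200 µs, which shows why a large value mainly costs RX on-time and power.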
@opeyrard I've reverted the SetRx change and adjusted the compile-time definition of the crystal error instead of using the run-time call.
In lr1mac_config.h
#endif
+#define BSP_CRYSTAL_ERROR 16000
// Crystal error of the MCU to fine adjust the rx window for lorawan ( ex: set 30 for a crystal error = 0.3%)
No change in join behavior; still seeing the 0x18 radio status indicating the RX timeout.
I'm working on measuring the power consumption at run time so we can confirm the radio is properly transmitting and receiving.
Hello @cmorganBE, do you have any updates on the issue? I'm also using the SX1262 and facing the same problem: the radio sends a timeout IRQ 16 ms after it is put in RX mode. I've attached a capture from the logic analyzer.
@leonardomunoz90 what architecture are you seeing this with? What stack version, and what radio (is it a reference board or a custom design)?
The only updates I have are the changes at the top, which I've rewritten to clarify the problem and include the most current information. Otherwise I'm still seeing the issue, which looks similar to yours: the radio itself times out after that time and signals an interrupt.
Have you enabled the radio traces? I think MODEM_HAL_DBG_TRACE_RP=1 is the one that shows the IRQ values read back from the radio.
@chmorgan, sorry for the late reply. After several hours of debugging I managed to get it working. Details below.
Hardware and setup
I'm using a custom board with an E22-900MM22S module (internally it uses an SX1262).
The device is Class A and the join request is received by my local ChirpStack server.
Observed problem
During the join procedure the gateway/server sends a Join Accept, but the radio does not receive it because of an early RX timeout: I measured ~16 ms for the RX1 window and ~49 ms for the RX2 window (see RX1 / RX2 analyzer captures below).
I had configured a 3 s RX timeout via SPI commands. The value I set was 0x02EE00, which equals 192000 steps of 15.625 µs (≈ 3 s), but it doesn't seem to take effect. (SPI command / config capture shown below)
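The arithmetic in the post checks out: 3 s / 15.625 µs = 192000 = 0x02EE00 (1 ms is 64 steps). The sketch below encodes a millisecond timeout the same way and packs the SetRx frame, assuming the SX126x datasheet semantics for the 24-bit field: 0x000000 is single mode without timeout and 0xFFFFFF is continuous mode, so the 0x82 0xFF 0xFF 0xFF frame mentioned earlier in this issue selects continuous RX rather than a long single-mode timeout. The macro and function names are hypothetical helpers, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

#define SETRX_OPCODE      0x82u
#define SETRX_NO_TIMEOUT  0x000000u /* RX single mode, no timeout */
#define SETRX_CONTINUOUS  0xFFFFFFu /* RX continuous mode */

/* Convert a millisecond timeout to SetRx steps of 15.625 us (1 ms = 64 steps).
 * Capped at 0xFFFFFE so a long timeout never turns into continuous mode. */
static uint32_t sx126x_rx_timeout_steps(uint32_t timeout_ms)
{
    uint64_t steps = (uint64_t)timeout_ms * 64u;
    return steps >= SETRX_CONTINUOUS ? SETRX_CONTINUOUS - 1u : (uint32_t)steps;
}

/* SetRx SPI frame packed MSB first: opcode, then the 24-bit timeout.
 * 0x8202EE00 corresponds to the bytes 0x82 0x02 0xEE 0x00 on the bus. */
static uint32_t sx126x_set_rx_frame(uint32_t steps)
{
    assert(steps <= SETRX_CONTINUOUS);
    return (SETRX_OPCODE << 24) | steps;
}
```

As this thread shows, a correct SetRx timeout on its own did not prevent the early timeout, so the frame is only part of the picture.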
Workaround and result
The issue goes away when I send the SetLoRaSymbNumTimeout command with a value of 0. After that change the early timeout disappears and the 3 s RX windows are visible on the logic analyzer. (see images showing command and analyzer trace)
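SetLoRaSymbNumTimeout (opcode 0xA0) counts LoRa symbols rather than time, so its wall-clock length depends on SF and bandwidth (Tsym = 2^SF / BW). The sketch below converts between the two, which can help relate an early timeout to a symbol count: the ~30 ms RX1 (SF10/BW500) and ~60 ms RX2 (SF12/BW500) timeouts in the earlier logs correspond to roughly 14 and 7 symbols. Whether the stack actually programs a symbol timeout of that size on a given setup is an assumption to verify in the SPI trace; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* LoRa symbol duration in microseconds: Tsym = 2^SF / BW. */
static uint32_t lora_symbol_us(uint8_t sf, uint32_t bw_hz)
{
    assert(sf >= 5 && sf <= 12 && bw_hz > 0);
    return (uint32_t)((1000000ull << sf) / bw_hz);
}

/* Wall-clock length of a timeout expressed as a number of symbols. */
static uint32_t symbol_timeout_us(uint32_t n_symbols, uint8_t sf, uint32_t bw_hz)
{
    return n_symbols * lora_symbol_us(sf, bw_hz);
}

/* Whole symbols that fit in an observed window (rounded down). */
static uint32_t symbols_in_window(uint32_t window_us, uint8_t sf, uint32_t bw_hz)
{
    return window_us / lora_symbol_us(sf, bw_hz);
}
```

For example, a symbol at SF10/BW500 lasts 2048 µs, so an 8-symbol timeout ends after about 16 ms.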
Remaining issue
Even though the RX windows are now correct, the device does not join the LoRaWAN network immediately. It retries every minute and only joins successfully after about 30 minutes of retries. (retry pattern trace shown below)
Hope this helps!
@leonardomunoz90 thanks for the workaround.
I had the exact same problem. Sometimes the device was able to join the network depending on the timing of the SPI commands that configure RX (adding a delay or a debug serial print sometimes made it work). Even when the timing was just right for the join to happen, subsequent confirmed uplinks failed to receive an ACK because of the same problem (premature timeout). In my case, the response to SX126X_GET_IRQ_STATUS was Read CMD 0x12: data_len=2 data=0x02 0x00, which indicates that the IRQ was triggered by RAL_IRQ_CAD_OK, but that doesn't make sense in an RX context.