SX126x HAL (restored)

Open ansocket opened this issue 3 years ago • 24 comments

Contribution description

This is a Radio HAL implementation for the Semtech SX126x driver. It was created primarily for the RAK3172 module (STM32WLE5CC).

Testing procedure

gnrc_networking: All basic functions work correctly.

tests/ieee802154_submac: Everything should work correctly as long as SF<=7 and BW>=125 kHz, because the ACK timer can't wait longer than ~65 ms.
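As a rough cross-check of that limit, the sketch below estimates the on-air time of a 3-byte ACK with the time-on-air formula commonly cited from Semtech's LoRa datasheets (valid for SF7..SF12). The struct, the function name and the PHY settings used in the note afterwards (8-symbol preamble, coding rate 4/5, CRC on, explicit header) are assumptions for illustration and are not taken from this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameter set, not a RIOT type. */
typedef struct {
    uint8_t  sf;              /* spreading factor, 7..12 */
    uint32_t bw_hz;           /* bandwidth in Hz */
    uint8_t  cr;              /* coding rate: 1..4 means 4/5..4/8 */
    uint16_t preamble_len;    /* programmed preamble length in symbols */
    uint8_t  crc_on;          /* 1 if the payload CRC is enabled */
    uint8_t  implicit_header; /* 1 if the explicit header is disabled */
    uint8_t  low_dr_opt;      /* 1 if low data rate optimization is on */
} lora_phy_cfg_t;

/* Estimated LoRa time on air in microseconds (integer arithmetic only). */
uint32_t lora_time_on_air_us(const lora_phy_cfg_t *cfg, uint8_t payload_len)
{
    assert(cfg->sf >= 7 && cfg->sf <= 12 && cfg->bw_hz > 0);
    assert(cfg->cr >= 1 && cfg->cr <= 4);

    /* Numerator and denominator of the ceil() term in the formula. */
    int32_t num = 8 * (int32_t)payload_len - 4 * (int32_t)cfg->sf + 28
                  + (cfg->crc_on ? 16 : 0)
                  - (cfg->implicit_header ? 20 : 0);
    int32_t den = 4 * ((int32_t)cfg->sf - (cfg->low_dr_opt ? 2 : 0));

    /* Payload symbols: 8 + max(ceil(num / den) * (CR + 4), 0). */
    int32_t payload_syms = 8;
    if (num > 0) {
        payload_syms += ((num + den - 1) / den) * (cfg->cr + 4);
    }

    /* The preamble lasts (preamble_len + 4.25) symbols, so count in
     * quarter symbols to stay in integers: 4 * 4.25 = 17. */
    uint64_t quarter_syms =
        4ULL * (cfg->preamble_len + (uint32_t)payload_syms) + 17;

    /* One symbol lasts 2^SF / BW seconds. */
    uint64_t us = (quarter_syms * ((uint64_t)1 << cfg->sf) * 1000000ULL)
                  / (4ULL * cfg->bw_hz);
    return (uint32_t)us;
}
```

Under those assumptions a 3-byte ACK takes about 31 ms at SF7/125 kHz, about 62 ms at SF8/125 kHz and about 103 ms at SF9/125 kHz, so only SF7 leaves real headroom inside a ~65 ms window once turnaround time is added.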

tests/ieee802154_hal: Run and you should see:

  • Correct ACK replies after transmission
  • Correct address filter
  • CCA
ieee802154_hal transmission
main(): This is RIOT! (Version: 1fd33c-lora_rak3172)
Trying to register sx126x.
Success
Initialization successful - starting the shell now
> print_addr

A6:DE:12:64:DA:C5:B4:2B

> Size = 31
Frame received:
00000000  61  DC  00  23  00  2B  B4  C5  DA  64  12  DE  A6  05  84  4E  a..#.+...d.....N
00000010  08  50  68  EE  DE  4C  6F  72  65  6D  20  69  70  73  75      .Ph..Lorem ipsu
LQI: 255, RSSI: 200

Size = 31
Frame received:
00000000  61  DC  01  23  00  2B  B4  C5  DA  64  12  DE  A6  05  84  4E  a..#.+...d.....N
00000010  08  50  68  EE  DE  4C  6F  72  65  6D  20  69  70  73  75      .Ph..Lorem ipsu
LQI: 255, RSSI: 197
main(): This is RIOT! (Version: 1fd33c-lora_rak3172)
Trying to register sx126x.
Success
Initialization successful - starting the shell now
> txtsnd A6:DE:12:64:DA:C5:B4:2B 10 ackreq

Transmission succeeded

> Size = 3
Received valid ACK with sqn 0
00000000  02  00  00                                                      
LQI: 255, RSSI: 194

txtsnd A6:DE:12:64:DA:C5:B4:2B 10 ackreq

Transmission succeeded


> Size = 3
Received valid ACK with sqn 1
00000000  02  00  01                                                      
LQI: 255, RSSI: 194

Problems

Issues/PRs references

#19198

UPD: I accidentally wiped the repo, so I created a new branch. The code has not changed.

ansocket avatar Jan 19 '23 04:01 ansocket

BTW we will need to find a way to allow both interfaces (netdev, Radio HAL) simultaneously. Otherwise it's not possible to use LoRaWAN with this radio.

jia200x avatar Jan 19 '23 09:01 jia200x

@jia200x Wow, I didn't expect your review to be so detailed. Thank you very much! I'll check all your suggestions and make the changes as soon as possible :)

ansocket avatar Jan 19 '23 09:01 ansocket

Hello!

I made some changes to this implementation:

  • ACK now works correctly on my RAK3172 board with the ieee802154_submac and gnrc_networking examples. This mechanism reduced losses to 0%!
  • Replaced the hardware timer with ztimer, so @jdavid, I hope you can now run the examples on your board successfully.
  • Some global variables were deleted or moved into the sx126x struct.
  • I changed ACK_TIMEOUT in submac.c to 50000 us. This makes it possible to receive every ACK. The value can probably be lower.

P.S. Thank you @jia200x for your help!

ansocket avatar Jan 24 '23 09:01 ansocket

Great to hear!! Thanks to you for the contribution.

I changed ACK_TIMEOUT in submac.c to 50000 us. This makes it possible to receive every ACK. The value can probably be lower.

So far this has been hardcoded because the SubMAC only worked with O-QPSK radios. But now that there are more alternatives, we should probably calculate it on demand. Otherwise, just changing this value will break existing O-QPSK radios.

We can probably get it from CAPS for now.
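A minimal sketch of that idea, assuming a hypothetical per-radio descriptor (radio_phy_info_t and submac_ack_timeout_us are made-up names, not existing RIOT API): the radio reports its own ACK timeout, and the SubMAC falls back to the O-QPSK value when none is given. The 864 us default is an assumption based on the 2.4 GHz O-QPSK ACK wait duration of 54 symbols at 16 us each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed default: 54 symbols * 16 us for 2.4 GHz O-QPSK. */
#define ACK_TIMEOUT_OQPSK_US (864U)

/* Hypothetical descriptor a radio driver could expose next to its caps. */
typedef struct {
    uint32_t ack_timeout_us; /* 0 means "use the O-QPSK default" */
} radio_phy_info_t;

/* Pick the ACK timeout for the attached radio instead of hardcoding it. */
static inline uint32_t submac_ack_timeout_us(const radio_phy_info_t *info)
{
    assert(info != NULL);
    return info->ack_timeout_us ? info->ack_timeout_us
                                : ACK_TIMEOUT_OQPSK_US;
}
```

With something like this, existing O-QPSK drivers would leave the field at 0 and keep their current behaviour, while a LoRa driver could report a value derived from its SF and bandwidth.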

jia200x avatar Jan 24 '23 10:01 jia200x

btw could you rebase this branch on top of lora_24012023 to bring the new commits here?

git checkout lora_rak3172
git rebase lora_24012023
git push -f

jia200x avatar Jan 24 '23 10:01 jia200x

Hello @jia200x

Thanks for your review! All changes accepted, I will work on it :)

ansocket avatar Jan 25 '23 11:01 ansocket

hi @3JIouHoCoK

any news on this one? It would be awesome to have support for this one!

jia200x avatar Feb 28 '23 09:02 jia200x

btw this works quite well! I isolated the sx126x files into a separate commit and it ran out of the box! we could do the same to get it in quick. Then we can focus on getting the board.

jia200x avatar Feb 28 '23 10:02 jia200x

Hello @jia200x

My superiors decided to move to the SX1278 radio, so I'm now busy porting this implementation to the new custom board... But I haven't forgotten about the SX126x, because I need the same features on the new board:

  • HAL and netdev radios should work together;
  • FSK modem should be provided;

So I think it's not an impossible task, but I need some extra time :)

ansocket avatar Feb 28 '23 10:02 ansocket

Alright! I think FSK could be provided in a follow-up, so I wouldn't consider it urgent.

My superiors decided to move to the SX1278 radio, so I'm now busy porting this implementation to the new custom board... But I haven't forgotten about the SX126x, because I need the same features on the new board:

In case you cannot spend more time on this PR (and if you are fine with it), I can offer to take it over, as I would also benefit from this feature. Your authorship would of course be kept in both the commits and the files. Let me know what you think.

jia200x avatar Feb 28 '23 11:02 jia200x

That would be great for our community :) I'll push my current changes to this branch. Do I need to do something to give you write access to this PR, @jia200x? Or will you just copy the branch?

ansocket avatar Feb 28 '23 15:02 ansocket

Hi @jia200x

I uploaded some new changes:

  • netdev is back! If the ieee802154 module is used, then the HAL is used too. Otherwise a LoRaWAN/raw netif device is created, as before.

  • The ACK timeout is now calculated as in #19198, but it only works when SF<=7 and BW>=125 kHz. 65 ms is probably too low beyond these parameters.

  • Removed all global variables except the necessary ones (I hope :)

ansocket avatar Mar 02 '23 09:03 ansocket

hi @3JIouHoCoK ,

thank you so much for the last commits. I was off for some weeks due to paperwork, but in the meantime I could test the radio and it works quite well. I only found small issues (which also seem to be present in the netdev version, so they are not related to this PR). I also found some things that we should probably change before merging (e.g. sending the transceiver to COLD_SLEEP erases its RAM, and therefore the transceiver loses its state).

I will try to come back during the week.

jia200x avatar Apr 04 '23 08:04 jia200x

Hello @jia200x Thanks for your review! I'll make changes in the next few days.

ansocket avatar Apr 20 '23 08:04 ansocket

Hello @jia200x, I made the changes you asked for in your review. But one old problem still bothers me a little. I think my implementation of ACK replies needs more testing. Sometimes I run into a problem when I send "ping" packets. The console prints:

gnrc_netif: can't queue packet for sending.

I think multiple threads try to use the driver while it is sending a reply. Maybe I have an issue in the interrupt context when an ACK is received (or not). Could you (or someone else who is using this driver ;) ) check it, please?

ansocket avatar Apr 25 '23 09:04 ansocket

Hello @jia200x! I accidentally closed this PR again! I didn't know that renaming a branch can close a PR :( If I understand correctly, I should make a PR with only the sx126x driver, without any board files. Should I open a new PR or change something in this one?

ansocket avatar Apr 26 '23 03:04 ansocket

Hello,

I think my implementation of ACK replies needs more testing. Sometimes I run into a problem when I send "ping" packets. The console prints:

gnrc_netif: can't queue packet for sending.

Could you please send around the ping parameters? This usually occurs when the stack transmits faster than the transceiver can, which is likely in the case of LoRa. Note that LoRa transceivers are orders of magnitude slower than standard IEEE 802.15.4 transceivers. Sending 8 bytes takes around 60 ms with SF7BW125. The same packet on an IEEE 802.15.4 O-QPSK transceiver would probably take less than 1 ms. When adding ACK logic on top, these times on air add up considerably.

What is important to check is whether there are pktbuf leaks or not. You can use the pktbuf command for that (add USEMODULE += shell_cmd_gnrc_pktbuf to your application Makefile).

I think multiple threads try to use the driver while it is sending a reply. Maybe I have an issue in the interrupt context when an ACK is received (or not).

I will give it a look.

jia200x avatar Apr 26 '23 10:04 jia200x

If I understand correctly, I should make a PR with only the sx126x driver, without any board files. Should I open a new PR or change something in this one?

Hmmm it might be easier to open a new PR and use this one as a reference. You should just copy the SX126X related files + the tests (hal and submac)

jia200x avatar Apr 26 '23 11:04 jia200x

Could you please send around the ping parameters?

Default ping with a 1 s interval. The issue shows up randomly (maybe): I could hit it after 5-7 packets, or not at all after 10000. SF7BW125, ACK on.

ansocket avatar Apr 26 '23 15:04 ansocket

Default ping with a 1 s interval. The issue shows up randomly (maybe): I could hit it after 5-7 packets, or not at all after 10000. SF7BW125, ACK on.

What does the output of pktbuf show? Does the node recover after some time?

jia200x avatar Apr 27 '23 09:04 jia200x

What does the output of pktbuf show? Does the node recover after some time?

I'll try to check it tomorrow.

ansocket avatar Apr 27 '23 10:04 ansocket

Hello, @jia200x

What does the output of pktbuf show

pktbuf after the problem: PING ACK pktbuf.txt

Does the node recover after some time?

I waited about 15 minutes and the board didn't recover. The RF switch was still stuck in the TX state.

ansocket avatar Apr 28 '23 08:04 ansocket

hmmmm ok, there's definitely something wrong... I suspect there could be an issue with the IRQ processing, as this usually happens when the SubMAC is not informed about a TX Done event. I will give it a try next week.

jia200x avatar Apr 28 '23 10:04 jia200x

I'm not sure, but the current ztimer-based ACK implementation doesn't look safe :) I think the ACK reply should run in the same thread as the rest of the sx126x code. If we are just starting to send an ACK reply and the ACK timeout fires at that moment, what should happen? I need to think about similar situations.
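One way to picture the "same thread" idea is the sketch below. It is not the code from this PR: all names are hypothetical, and the queue is deliberately simplified (a real driver would have to protect it against concurrent access, for example by briefly disabling IRQs). The ISR and the timer callback only post events, and a single driver thread handles them one at a time, so an ACK reply and an ACK timeout can never interleave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { EV_FRAME_NEEDS_ACK, EV_ACK_RECEIVED, EV_ACK_TIMEOUT } mac_event_t;
typedef enum { ST_IDLE, ST_WAIT_ACK } mac_state_t;

#define EVQ_SIZE 8U /* must be a power of two */

typedef struct {
    mac_event_t buf[EVQ_SIZE];
    uint8_t head, tail;   /* free-running indices, wrapped with % EVQ_SIZE */
    mac_state_t state;
    unsigned ack_replies; /* ACK replies we sent for received frames */
    unsigned tx_ok;       /* our transmissions that were acknowledged */
    unsigned tx_noack;    /* our transmissions that timed out */
} mac_ctx_t;

/* ISR / timer context: only record the event, do no radio work here. */
bool mac_post(mac_ctx_t *ctx, mac_event_t ev)
{
    if ((uint8_t)(ctx->head - ctx->tail) == EVQ_SIZE) {
        return false; /* queue full, event dropped */
    }
    ctx->buf[ctx->head++ % EVQ_SIZE] = ev;
    return true;
}

/* Driver thread: events are handled strictly one after another. */
void mac_process(mac_ctx_t *ctx)
{
    while (ctx->tail != ctx->head) {
        mac_event_t ev = ctx->buf[ctx->tail++ % EVQ_SIZE];
        if (ev == EV_FRAME_NEEDS_ACK) {
            ctx->ack_replies++; /* the ACK frame would be sent here */
        }
        else if (ctx->state == ST_WAIT_ACK) {
            /* The first of ACK / timeout decides the TX result. */
            if (ev == EV_ACK_RECEIVED) {
                ctx->tx_ok++;
            }
            else {
                ctx->tx_noack++;
            }
            ctx->state = ST_IDLE;
        }
        /* Otherwise: stale ACK or timeout event, ignored. */
    }
}
```

In this sketch, if the timeout is queued just before a late ACK, the transmission is reported once as unacknowledged and the late ACK is dropped as stale, instead of both handlers touching the driver state at the same time.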

Today I tested a TCP connection via the Sock API without ACK replies, sending a simple string with a 2 s delay (gnrc_border_router connected to the host machine). After a couple of hours the TCP connection broke and the global address (assigned via DHCPv6) was deprecated. I'm not sure what happened. Upd.: the border router just stopped sending any messages to ethos after some time. This is probably the issue described in #16398, so it doesn't depend on this driver :)

ansocket avatar Apr 28 '23 11:04 ansocket