
[Bug]: NRF52840 - Random crash on packet processing

Open FFAMax opened this issue 3 months ago • 6 comments

Category

Hardware Compatibility

Hardware

Other

Is this bug report about any UI component firmware like InkHUD or Meshtastic UI (MUI)?

  • [ ] Meshtastic UI aka MUI colorTFT
  • [ ] InkHUD ePaper
  • [ ] OLED slide UI on any display

Firmware Version

2.7.14.50f9be9a2

Description

Hello, team. Caught the crash.

DEBUG | ??:??:?? 13536 [RadioIf] Raw incoming packet: 68 84 5d 33 28 63 65 ba d0 f9 86 dd ef 08 00 28  h.]3(ce........(
DEBUG | ??:??:?? 13536 [RadioIf] Raw incoming packet: 4e 67 6a 3f 91 c2 50 bb 21 a6 ed 4d 5e b7 50 72  Ngj?..P.!..M^.Pr
DEBUG | ??:??:?? 13536 [RadioIf] Raw incoming packet: c1 5e 2c 56 ab a5 d2 e3 e4 ed 26 89 1e 55 63 4f  .^,V......&..UcO
DEBUG | ??:??:?? 13536 [RadioIf] Raw incoming packet: f7 01 82 3c 31 22                                ...<1"
DEBUG | ??:??:?? 13536 [RadioIf] Corrected frequency offset: -4.843750
DEBUG | ??:??:?? 13536 [RadioIf] Lora RX 
0xdd86f9d0:0xba656328->0x335d8468 tr: 0 WantAck=1 HL=7 HS=7 Ch=0x8 ENC len=54 SNR:6.5 RSSI:-31 Rly:0x28
DEBUG | ??:??:?? 13536 [RadioIf] Packet RX: 641ms
INFO  | ??:??:?? 13536 [Router] Packet History - insert: Using new slot @uptime 13536.484s TRACE NEW
DEBUG | ??:??:?? 13536 [Router] Use channel 0 (hash 0x8)
DEBUG | ??:??:?? 13536 [Router] Expand short PSK #1
DEBUG��@INFO  | ??:??:?? 2 

//\ E S H T /\ S T / C

DEBUG | ??:??:?? 2 Filesystem files:

If I replay the same packet, it does not crash:

DEBUG | ??:??:?? 1106 [RadioIf] Raw incoming packet: 68 84 5d 33 28 63 65 ba d0 f9 86 dd ef 08 00 28  h.]3(ce........(
DEBUG | ??:??:?? 1106 [RadioIf] Raw incoming packet: 4e 67 6a 3f 91 c2 50 bb 21 a6 ed 4d 5e b7 50 72  Ngj?..P.!..M^.Pr
DEBUG | ??:??:?? 1106 [RadioIf] Raw incoming packet: c1 5e 2c 56 ab a5 d2 e3 e4 ed 26 89 1e 55 63 4f  .^,V......&..UcO
DEBUG | ??:??:?? 1106 [RadioIf] Raw incoming packet: f7 01 82 3c 31 22                                ...<1"
DEBUG | ??:??:?? 1106 [RadioIf] Corrected frequency offset: -148.218750
DEBUG | ??:??:?? 1106 [RadioIf] Lora RX 
0xdd86f9d0:0xba656328->0x335d8468 tr: 0 WantAck=1 HL=7 HS=7 Ch=0x8 ENC len=54 SNR:6.25 RSSI:-21 Rly:0x28
DEBUG | ??:??:?? 1106 [RadioIf] Packet RX: 641ms
INFO  | ??:??:?? 1106 [Router] Packet History - insert: Using new slot @uptime 1106.044s TRACE NEW
DEBUG | ??:??:?? 1106 [Router] Use channel 0 (hash 0x8)
DEBUG | ??:??:?? 1106 [Router] Expand short PSK #1
DEBUG | ??:??:?? 1106 [Router] Use AES128 key!
DEBUG | ??:??:?? 1106 [Router] decoded message 
0xdd86f9d0:0xba656328->0x335d8468 tr: 1 WantAck=1 HL=7 HS=7 Ch=0x0 Portnum:4 SNR:6.25 RSSI:-21 Rly:0x28
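
For anyone reading the raw dump: the first 16 bytes appear to be the packet header, and decoding them by hand reproduces the values in the "Lora RX" line. The sketch below assumes a little-endian layout of to, from, id, flags, channel hash, next hop, relay node, and assumes the log line prints id:from->to. The struct and function names are illustrative and not taken from the firmware.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed 16-byte on-air header, inferred from the log lines above.
// Field and type names are hypothetical, not copied from the firmware.
struct DecodedHeader {
    uint32_t to;
    uint32_t from;
    uint32_t id;
    uint8_t hopLimit;    // flags bits 0-2 (HL in the log)
    bool wantAck;        // flags bit 3
    uint8_t hopStart;    // flags bits 5-7 (HS in the log)
    uint8_t channelHash; // Ch in the log
    uint8_t relayNode;   // Rly in the log
};

// Read a little-endian 32-bit value from a byte buffer.
static uint32_t readLe32(const uint8_t *p)
{
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Returns false if the buffer is missing or too short to hold a header.
bool decodeHeader(const uint8_t *buf, size_t len, DecodedHeader &out)
{
    if (buf == nullptr || len < 16)
        return false;
    out.to = readLe32(buf);
    out.from = readLe32(buf + 4);
    out.id = readLe32(buf + 8);
    const uint8_t flags = buf[12];
    out.hopLimit = flags & 0x07;
    out.wantAck = (flags & 0x08) != 0;
    out.hopStart = (flags >> 5) & 0x07;
    out.channelHash = buf[13];
    out.relayNode = buf[15];
    return true;
}
```

Fed the first dump line (68 84 5d 33 ...), this gives to=0x335d8468, from=0xba656328, id=0xdd86f9d0, HL=7, HS=7, WantAck=1, Ch=0x8, Rly=0x28, which matches the log, so the header itself looks well-formed in both the crashing and the non-crashing run.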

Any ideas?


FFAMax avatar Dec 07 '25 15:12 FFAMax

Caught one more

DEBUG | ??:??:?? 369 [RadioIf] Raw incoming packet: 55 f6 57 21 28 63 65 ba 1a 3a 24 48 ef 08 00 28  U.W!(ce..:$H...(
DEBUG | ??:??:?? 369 [RadioIf] Raw incoming packet: b6 17 ae 87 84 ba f7 ea ab 11 35 cf 96 08 6e 04  ..........5...n.
DEBUG | ??:??:?? 369 [RadioIf] Raw incoming packet: 83 4a a0 9c a9 15 24 99 50 a4 7b 95 bd 97 43 40  .J....$.P.{...C@
DEBUG | ??:??:?? 369 [RadioIf] Raw incoming packet: f6 55 15 7f                                      .U..
DEBUG | ??:??:?? 369 [RadioIf] Corrected frequency offset: -31.968748
DEBUG | ??:??:?? 369 [RadioIf] Lora RX 
0x48243a1a:0xba656328->0x2157f655 tr: 0 WantAck=1 HL=7 HS=7 Ch=0x8 ENC len=52 SNR:5.75 RSSI:-33 Rly:0x28
DEBUG | ??:??:?? 369 [RadioIf] Packet RX: 641ms
��@INFO  | ??:??:?? 2 

//\ E S H T /\ S T / C

The next lines could be

LOG_DEBUG("Packet RX: %ums", airtime_ms);
// or
LOG_DEBUG("Packet RX (noise?) : %ums", airtime_ms);

FFAMax avatar Dec 07 '25 19:12 FFAMax

Does the crash also still occur in firmware 2.7.15? That is the current beta version, i.e. out of alpha stage.

shalberd avatar Dec 07 '25 21:12 shalberd

> Does the crash also still occur in firmware 2.7.15? That is the current beta version, i.e. out of alpha stage.

Flashed with 2.7.15.567b8ea. Will see.

FFAMax avatar Dec 07 '25 21:12 FFAMax

https://github.com/meshtastic/firmware/blob/8fe98db5dd6738546db0d27c6823e3380df322d4/src/mesh/Router.cpp#L430-L432

This uses nodeDB->getMeshNode(p->to)->user.public_key.size without first ensuring getMeshNode(p->to) is non-NULL; if the "to" node isn't present in the DB we dereference nullptr and crash.

@compumike Do you agree? I'm confused because if that's the case I wonder why it wouldn't happen all the time

Edit: Oh wait, the "to" node is us 😄 Could it happen that the NodeDB does not contain the local node (yet)?
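
To make the suspected pattern concrete, here is a minimal, self-contained sketch of the guard being discussed: look the node up once, check the pointer, and only then read the key size. `FakeNodeDB`, `NodeInfo` and `hasUsablePublicKey` are hypothetical stand-ins, not the firmware's types, and treating any non-empty key as usable is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins for the firmware's node record and node database.
struct PublicKey {
    size_t size = 0;
    uint8_t bytes[32] = {};
};
struct User {
    PublicKey public_key;
};
struct NodeInfo {
    User user;
};

class FakeNodeDB
{
  public:
    void add(uint32_t num, size_t keySize) { nodes_[num].user.public_key.size = keySize; }

    // Returns nullptr when the node is unknown, like the lookup discussed above.
    NodeInfo *getMeshNode(uint32_t num)
    {
        auto it = nodes_.find(num);
        return it == nodes_.end() ? nullptr : &it->second;
    }

  private:
    std::unordered_map<uint32_t, NodeInfo> nodes_;
};

// True only if the node exists AND has a non-empty public key.
// The pointer is checked before any member is read.
bool hasUsablePublicKey(FakeNodeDB &db, uint32_t nodeNum)
{
    const NodeInfo *node = db.getMeshNode(nodeNum);
    if (node == nullptr)
        return false;
    return node->user.public_key.size > 0;
}
```

With a guard like this, a missing entry falls through as "no key" instead of dereferencing a null pointer. Whether that path is actually reachable for the local node is still the open question here.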

korbinianbauer avatar Dec 10 '25 11:12 korbinianbauer

Hi @FFAMax how is it coming along with firmware 2.7.15? Does the error on decode still occur?

shalberd avatar Dec 10 '25 20:12 shalberd

> Hi @FFAMax how is it coming along with firmware 2.7.15? Does the error on decode still occur?

Testing interrupted. Restarted. Monitoring in progress.

FFAMax avatar Dec 11 '25 01:12 FFAMax

@shalberd uptime 48h

FFAMax avatar Dec 13 '25 03:12 FFAMax

Uptime 98h.

FFAMax avatar Dec 15 '25 05:12 FFAMax

Uptime 6 days. Test finished.

The original issue seems to be related to the LOG_DEBUG calls that dump incoming packets to the terminal. I'm not 100% sure, but with those lines commented out the device is more stable.
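
The thread does not establish why the dump lines matter. For reference, the sketch below shows one way to produce the same "16 hex bytes plus ASCII" line format seen in the logs while keeping every write bounded. It is not the firmware's implementation; `formatHexLine` is a hypothetical helper, and it uses `std::string` for brevity, which may not suit a memory-constrained target.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Format up to 16 bytes as hex columns, a separator space, then printable ASCII.
// Extra input beyond 16 bytes is ignored; short input is padded with spaces.
std::string formatHexLine(const uint8_t *data, size_t len)
{
    constexpr size_t kBytesPerLine = 16;
    if (data == nullptr)
        len = 0;
    if (len > kBytesPerLine)
        len = kBytesPerLine;

    std::string line;
    line.reserve(kBytesPerLine * 4 + 2);

    char hex[4]; // two hex digits, one space, terminating NUL
    for (size_t i = 0; i < kBytesPerLine; ++i) {
        if (i < len) {
            std::snprintf(hex, sizeof(hex), "%02x ", static_cast<unsigned>(data[i]));
            line += hex;
        } else {
            line += "   "; // keep the ASCII column aligned on short lines
        }
    }

    line += ' ';
    for (size_t i = 0; i < len; ++i) {
        const uint8_t c = data[i];
        line += (c >= 0x20 && c < 0x7f) ? static_cast<char>(c) : '.';
    }
    return line;
}
```

Calling it on consecutive 16-byte slices of a received buffer, with the final slice using the remaining length, reproduces lines in the same shape as the "Raw incoming packet" output above.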

FFAMax avatar Dec 16 '25 22:12 FFAMax