RIOT icon indicating copy to clipboard operation
RIOT copied to clipboard

ieee802154/submac: change submac fsm logic

Open Stopkaa opened this issue 5 months ago • 7 comments

Convert submac.c state machine to prevent race conditions

Overview

This PR converts the state machine in the submac.c module to follow the same design pattern as implemented in DSMEFSM.h.

Problem

The current state machine implementation is susceptible to race conditions and error states, particularly when events are scheduled unnecessarily by other threads during the bh_request/process procedure.

Solution

The new state machine design:

  • Prevents race conditions by implementing a more robust event handling mechanism
  • Enables multi-state processing within a single cycle, improving efficiency
  • Eliminates unnecessary event scheduling that could lead to inconsistent states
  • Provides better thread safety for the bh_request/process workflow

Benefits

  • More stable and reliable FSM operation
  • Reduced potential for error states
  • Better performance through multi-state cycle processing
  • Improved thread safety and concurrent access handling

Technical Changes

  • Removed bh_request/bh_process mechanism
  • Implemented direct FSM state transitions
  • Added multi-state cycle processing capability
  • Adjust other parts according to the new logic
  • Follows proven design patterns from DSMEFSM.h for consistency

Testing procedure

  • GNRC Ping Test with 2 nRF52840 Devices
cd RIOT/examples/networking/gnrc/gnrc_networking
ping fe80::b8f0:7f3f:bb50:3c2a -s 1024 -i 300 -c 10000
2025-07-02 18:40:52,528 # --- fe80::b8f0:7f3f:bb50:3c2a PING statistics ---
2025-07-02 18:40:52,534 # 10000 packets transmitted, 9867 packets received, 1% packet loss
2025-07-02 18:40:52,538 # round-trip min/avg/max = 126.999/168.469/301.789 ms
  • Submac Test with 2 nRF52840 Devices
2025-07-02 19:05:23,984 # txtsnd FE:B2:91:30:A6:FA:7C:B3 64
> 2025-07-02 19:05:23,991 # Tx complete
2025-07-02 19:05:23,990 # DATA
2025-07-02 19:05:23,995 # Dest. PAN: 0x0023, Dest. addr.: fe:b2:91:30:a6:fa:7c:b3
2025-07-02 19:05:24,000 # Src. PAN: 0x0023, Src. addr.: ba:f0:7f:3f:bb:50:34:2a
2025-07-02 19:05:24,005 # Security: 0, Frame pend.: 0, ACK req.: 1, PAN comp.: 1
2025-07-02 19:05:24,007 # Version: 1, Seq.: 1
2025-07-02 19:05:24,014 # 00000000  4C  6F  72  65  6D  20  69  70  73  75  6D  20  64  6F  6C  6F
2025-07-02 19:05:24,022 # 00000010  72  20  73  69  74  20  61  6D  65  74  2C  20  63  6F  6E  73
2025-07-02 19:05:24,029 # 00000020  65  63  74  65  74  75  72  20  61  64  69  70  69  73  63  69
2025-07-02 19:05:24,036 # 00000030  6E  67  20  65  6C  69  74  2E  20  45  74  69  61  6D  20  6F
2025-07-02 19:05:24,045 # txt (64 chars): Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam o
2025-07-02 19:05:24,045 # RSSI: 93, LQI: 48

Issues/PRs references

Stopkaa avatar Jul 02 '25 17:07 Stopkaa

Murdock results

:heavy_check_mark: PASSED

8b09692a2a70b961ed6a4bd10d6d7e686dd1e8cf ieee802154/submac: change submac fsm logic,

Success Failures Total Runtime
10930 0 10931 11m:59s

Artifacts

riot-ci avatar Aug 14 '25 16:08 riot-ci

This needs a rebase

benpicco avatar Oct 26 '25 13:10 benpicco

@jia200x you can restart failed Murdock build jobs in the CI as well:

Either click on the small arrow on the right and select "Restart"... image

...or click on the circular arrow on the upper right side in the job page: image

You have to be logged in there though, but you can log in with Github :)

crasbe avatar Nov 07 '25 13:11 crasbe

I'm not sure why the test build keeps timing out in CI, but when I tried building it locally with make -C tests/net/socket_zep, I encountered the following error:

RIOT/cpu/native/socket_zep/socket_zep.c:244:29: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (3 chars into 2 available) [-Werror=unterminated-string-initialization]
  244 |             .hdr.preamble = "EX",
      |                             ^~~~

Replacing "EX" with {'E', 'X'} fixes the issue for me and the build completes successfully

Stopkaa avatar Nov 12 '25 10:11 Stopkaa

On my machine the tests/socker_zep fails with a SegFault with and without llvm alike

Build Logs
buechse@skyleaf:~/RIOTstuff/riot-vanillaice/RIOT$ make BOARD=native64 TOOLCHAIN=llvm -C tests/net/socket_zep all test
make: Entering directory '/home/buechse/RIOTstuff/riot-vanillaice/RIOT/tests/net/socket_zep'
Building application "tests_socket_zep" for "native64" with CPU "native".

"make" -C /home/buechse/RIOTstuff/riot-vanillaice/RIOT/boards
...
"make" -C /home/buechse/RIOTstuff/riot-vanillaice/RIOT/sys/ztimer
/usr/bin/ld: warning: /home/buechse/RIOTstuff/riot-vanillaice/RIOT/tests/net/socket_zep/bin/native64/tests_socket_zep.elf has a LOAD segment with RWX permissions
   text    data     bss     dec     hex filename
  59420    1640   59904  120964   1d884 /home/buechse/RIOTstuff/riot-vanillaice/RIOT/tests/net/socket_zep/bin/native64/tests_socket_zep.elf
/home/buechse/RIOTstuff/riot-vanillaice/RIOT/tests/net/socket_zep/bin/native64/tests_socket_zep.elf --eui64=00:5a:45:50:0a:00:30:38 -z [127.0.0.1]:44221,[127.0.0.1]:59437  tap0
RIOT native interrupts/signals initialized.
TZ not set, setting UTC
RIOT native64 board initialized.
RIOT native hardware initialization complete.

main(): This is RIOT! (Version: 2026.01-devel-62-g0c8717-pr/link_layer/ieee802154/submac)
Socket ZEP device driver test
Initializing socket ZEP with (local: [127.0.0.1]:44221, remote: [127.0.0.1]:59437)
(Hwaddrs: 3038, 005a45500a003038)
Send 'Hello\0World\0'
Segmentation fault (core dumped)
make[1]: *** [/home/buechse/RIOTstuff/riot-vanillaice/RIOT/Makefile.include:877: cleanterm] Error 139
buechse@skyleaf:~/RIOTstuff/riot-vanillaice/RIOT$ make BOARD=native64 -C tests/net/socket_zep all test
make: Entering directory '/home/buechse/RIOTstuff/riot-vanillaice/RIOT/tests/net/socket_zep'
Building application "tests_socket_zep" for "native64" with CPU "native".

"make" -C /home/buechse/RIOTstuff/riot-vanillaice/RIOT/boards
...
"make" -C /home/buechse/RIOTstuff/riot-vanillaice/RIOT/sys/ztimer
/usr/bin/ld: warning: /home/buechse/RIOTstuff/riot-vanillaice/RIOT/tests/net/socket_zep/bin/native64/tests_socket_zep.elf has a LOAD segment with RWX permissions
   text    data     bss     dec     hex filename
  64203    1640   60168  126011   1ec3b /home/buechse/RIOTstuff/riot-vanillaice/RIOT/tests/net/socket_zep/bin/native64/tests_socket_zep.elf
/home/buechse/RIOTstuff/riot-vanillaice/RIOT/tests/net/socket_zep/bin/native64/tests_socket_zep.elf --eui64=00:5a:45:50:0a:00:30:38 -z [127.0.0.1]:39843,[127.0.0.1]:46441  tap0
RIOT native interrupts/signals initialized.
TZ not set, setting UTC
RIOT native64 board initialized.
RIOT native hardware initialization complete.

main(): This is RIOT! (Version: 2026.01-devel-62-g0c8717-pr/link_layer/ieee802154/submac)
Socket ZEP device driver test
Initializing socket ZEP with (local: [127.0.0.1]:39843, remote: [127.0.0.1]:46441)
(Hwaddrs: 3038, 005a45500a003038)
Send 'Hello\0World\0'
Segmentation fault (core dumped)
make[1]: *** [/home/buechse/RIOTstuff/riot-vanillaice/RIOT/Makefile.include:877: cleanterm] Error 139

That would explain the timeout.

crasbe avatar Nov 12 '25 13:11 crasbe

should be fixed now

Stopkaa avatar Nov 28 '25 15:11 Stopkaa

Actually I discovered an issue while testing with the nrf52: The submac end up in an invalid state. This issue is introduced by PR #21533. In that PR, some assert statements were removed, which is why the problem didn't occur there. I assume this happens because the submac is polling for confirm_transmit and the radio calls the callback for tx_done and then also calls this function, so some race condition happens. I didn't trace the chain that much at that time. Of course, I could also remove the assert statements and ignore the invalid state, but this feels wrong. You can reproduce the issue by setting the interval of the ping commands slightly above the average. You can also try it in master by adding prints at submac.c:460. Then I got some prints but the invalid state is just ignored there

Stopkaa avatar Nov 29 '25 15:11 Stopkaa