RIOT
[gcoap-dtls] Posting a message yields a stack overflow on the `samr21-xpro` with ECC
Description
The `gcoap-dtls` example leads to a stack overflow after executing a post command.
Steps to reproduce the issue
- Flash the `gcoap-dtls` test on a board and open the terminal:
$ BOARD=samr21-xpro SERIAL=... make clean all flash term
- Execute:
# coap post fc00:: 5684 / Hi!
Expected results
No matter whether a server is listening on `fc00::` or not, the non-confirmable POST message should just be sent out and appear in Wireshark.
Actual results
This output:
# coap post fc00:: 5684 / Hi!
# gcoap_cli: sending msg ID 3222, 11 bytes
# scheduler(): stack overflow detected, pid=1
# scheduler(): stack overflow detected, pid=1
# scheduler(): stack overflow detected, pid=1
Versions
The current RIOT master branch.
Additional Information
I am currently writing a small DTLS proxy for a Node.js backend, since DTLS support there is quite terrible. Using `tinydtls-rs`, I have developed a small Rust application that listens for DTLS packets from the board and decrypts them for the backend. However, with my application, I also get a stack overflow in pid=6, which is the `coap` thread, and then a hard fault. I decided to test this example, where I got the above problem. The Rust application itself is working and sends the handshake, but the client is not responding.
Just to add a few more details here: the firmware I am using (about 170 lines, I do not want to post all of it) roughly performs this in one thread:
void *data_thread(void *arg) {
    (void)arg;
    uint8_t buf[CONFIG_GCOAP_PDU_BUF_SIZE];
    memset(buf, 0, CONFIG_GCOAP_PDU_BUF_SIZE);
    // Put packet metadata
    coap_pkt_t pdu = {};
    gcoap_req_init(&pdu, buf, CONFIG_GCOAP_PDU_BUF_SIZE, COAP_POST, "/data");
    coap_opt_add_format(&pdu, COAP_FORMAT_CBOR);
    coap_hdr_set_type(pdu.hdr, COAP_TYPE_NON);
    ssize_t meta_len = coap_opt_finish(&pdu, COAP_OPT_FINISH_PAYLOAD);
    while (true) {
        // Write some data to `buf`
        size_t payload_len = ...;
        // Post data
        gcoap_req_send(buf, meta_len + payload_len, &host_ep, NULL, NULL);
        // Some cleanup
    }
    return NULL;
}
And the DTLS proxy I am working on roughly does this in its `write` callback (non-RIOT code):
unsafe extern "C" fn server_write_callback(
    ctx: *mut dtls_context_t,
    session: *mut session_t,
    buf: *mut u8,
    len: c_size_t,
) -> c_int {
    debug_println!("WRITE");
    let socket = (*ctx).app as *mut UdpSocket;
    let addr = session.as_ref().unwrap().addr.sin6.as_ref();
    assert!(addr.sin6_family == AF_INET6 as u16);
    (*socket)
        .send_to(
            std::slice::from_raw_parts(buf, len as usize),
            SocketAddrV6::new(
                Ipv6Addr::from(addr.sin6_addr.s6_addr),
                u16::from_be(addr.sin6_port),
                addr.sin6_flowinfo,
                addr.sin6_scope_id,
            ),
        )
        .expect(debug_fmt!("Failed to send message"));
    0
}
Can you increase the stack size a lot, and then get `ps` output?
I already tried that. I do not have the output at hand, but I increased the stack size of the `coap` thread to > 4096 (the RIOT source adds some extra amounts there) and by the time of the hard fault it had used up all of it. Before that, I think it used about 700 bytes. I can provide a `ps` output tomorrow, but this should be easily reproducible.
Can't do much testing here, but could you try moving `buf` outside the function scope? E.g., as a global variable.
Excuse the late reply. It is still not working, but we are now switching to PSKs, and that does not seem to crash. Anyway, @kaspar030, here is an idle `ps` of `gcoap_dtls` running:
2022-07-05 17:58:50,061 # main(): This is RIOT! (Version: 2022.07-devel-949-g1e17aa)
2022-07-05 17:58:50,061 # gcoap example app
2022-07-05 17:58:50,062 # All up, running the shell now
> ps
2022-07-05 18:02:42,887 # ps
2022-07-05 18:02:42,896 # pid | name | state Q | pri | stack ( used) ( free) | base addr | current |
2022-07-05 18:02:42,904 # - | isr_stack | - - | - | 512 ( 296) ( 216) | 0x20000000 | 0x200001c0 |
2022-07-05 18:02:42,913 # 1 | main | running Q | 7 | 1536 ( 680) ( 856) | 0x20000730 | 0x20000b4c |
2022-07-05 18:02:42,922 # 2 | event | bl anyfl _ | 6 | 512 ( 196) ( 316) | 0x20000e98 | 0x20000fd4 |
2022-07-05 18:02:42,932 # 3 | 6lo | bl rx _ | 3 | 1024 ( 528) ( 496) | 0x20004348 | 0x2000462c |
2022-07-05 18:02:42,941 # 4 | ipv6 | bl rx _ | 4 | 1024 ( 448) ( 576) | 0x20001c10 | 0x20001ed4 |
2022-07-05 18:02:42,950 # 5 | udp | bl rx _ | 5 | 1024 ( 280) ( 744) | 0x2000474c | 0x20004a34 |
2022-07-05 18:02:42,959 # 6 | coap | bl anyfl _ | 6 | 2144 ( 332) ( 1812) | 0x200013ac | 0x20001b1c |
2022-07-05 18:02:42,968 # 7 | at86rf2xx | bl anyfl _ | 2 | 1024 ( 580) ( 444) | 0x20002234 | 0x200024f4 |
2022-07-05 18:02:42,975 # | SUM | | | 8800 ( 3340) ( 5460)
This stack usage seems reasonable. I cannot see more, since after crashing I can only restart the board.
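For readers unfamiliar with the `( used)` and `( free)` columns: as far as I understand it, such numbers come from "painting" the stack when the thread is created and later counting how much of the paint survived. The following is only a minimal, host-runnable sketch of that idea, not RIOT's actual code; all function names are hypothetical, and the stack is assumed to grow downwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: paint every word of a simulated stack with its own
 * address, so that untouched words can be recognized later. */
static void stack_paint(uintptr_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = (uintptr_t)&stack[i];
    }
}

/* Hypothetical helper: count the bytes that are still painted, scanning up
 * from the lowest address. On a descending stack these were never used. */
static size_t stack_free_bytes(const uintptr_t *stack, size_t words)
{
    size_t i = 0;
    while (i < words && stack[i] == (uintptr_t)&stack[i]) {
        i++;
    }
    return i * sizeof(uintptr_t);
}

/* Simulate a thread that used `used_words` words, starting at the top
 * (highest address) of the stack and growing downwards. */
static void stack_simulate_use(uintptr_t *stack, size_t words, size_t used_words)
{
    assert(used_words <= words);
    for (size_t i = words - used_words; i < words; i++) {
        stack[i] = 0; /* any value other than the word's own address */
    }
}
```

One consequence of this scheme is that the "used" value is a high-water mark: it shows the deepest point the stack ever reached, so an idle `ps` taken before the failing command cannot show how much the command itself needs.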
I now multiplied the `GCOAP_STACK_SIZE` by 4, yielding a `ps` of:
> ps
2022-07-05 18:14:19,525 # ps
2022-07-05 18:14:19,534 # pid | name | state Q | pri | stack ( used) ( free) | base addr | current |
2022-07-05 18:14:19,543 # - | isr_stack | - - | - | 512 ( 280) ( 232) | 0x20000000 | 0x200001c0 |
2022-07-05 18:14:19,552 # 1 | main | running Q | 7 | 1536 ( 712) ( 824) | 0x20000730 | 0x20000b4c |
2022-07-05 18:14:19,561 # 2 | event | bl anyfl _ | 6 | 512 ( 196) ( 316) | 0x20000e98 | 0x20000fd4 |
2022-07-05 18:14:19,570 # 3 | 6lo | bl rx _ | 3 | 1024 ( 420) ( 604) | 0x20004f48 | 0x2000522c |
2022-07-05 18:14:19,579 # 4 | ipv6 | bl rx _ | 4 | 1024 ( 448) ( 576) | 0x20002810 | 0x20002ad4 |
2022-07-05 18:14:19,588 # 5 | udp | bl rx _ | 5 | 1024 ( 280) ( 744) | 0x2000534c | 0x20005634 |
2022-07-05 18:14:19,598 # 6 | coap | bl anyfl _ | 6 | 5216 ( 332) ( 4884) | 0x200013ac | 0x2000271c |
2022-07-05 18:14:19,607 # 7 | at86rf2xx | bl anyfl _ | 2 | 1024 ( 580) ( 444) | 0x20002e34 | 0x200030f4 |
2022-07-05 18:14:19,613 # | SUM | | | 11872 ( 3248) ( 8624)
And the above command still crashes.
@cgundogan I tried your idea, it still crashes. :(
> Excuse the late reply, Still not working, but we are now switching to PSKs and that does not seem to crash.
ECC is known to be flaky with TinyDTLS, so I think it is good to keep this open as a known issue.
There is more info on ECC on microcontrollers on the forum btw. Could also be of interest to you.
@valentinpi sorry for the late reply. Could you try it again with an increased stack size? But this time, please increase the size of the `main` stack rather than the `coap` stack, as
# scheduler(): stack overflow detected, pid=1
indicates that the `main` stack rather than the `coap` stack was overflowing. Thx :)
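To illustrate why the pid in that message identifies the overflowing stack: my understanding is that the check compares a marker word at the lowest address of a thread's stack against its expected value, and reports the pid of the thread whose marker was overwritten. The sketch below models this on a host machine under that assumption; the type and function names are hypothetical and not taken from the RIOT source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down thread descriptor for this sketch only. */
typedef struct {
    int pid;
    uintptr_t *stack_start; /* lowest address of the thread's stack */
} sketch_thread_t;

/* Place the marker: the lowest stack word holds its own address. */
static void sketch_mark_stack(sketch_thread_t *t)
{
    assert(t != NULL && t->stack_start != NULL);
    *t->stack_start = (uintptr_t)t->stack_start;
}

/* Return true if the marker was overwritten, i.e. a descending stack grew
 * down to (or past) its lowest address at some point. */
static bool sketch_stack_overflowed(const sketch_thread_t *t)
{
    assert(t != NULL && t->stack_start != NULL);
    return *t->stack_start != (uintptr_t)t->stack_start;
}
```

In the `ps` listings above, pid 1 is `main` and pid 6 is `coap`, so enlarging only the `coap` stack would not be expected to change a `pid=1` report.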
Thank you so much for the reply, but I sadly cannot access my board right now :(. May we close the issue for now, and could I reopen it in case I get back to this?
Sure. If the issue arises again, I'm happy to help solve it :)