[gcoap-dtls] Posting a message yields a stack overflow on the `samr21-xpro` with ECC

Open valentinpi opened this issue 2 years ago • 7 comments

Description

The gcoap-dtls example leads to a stack overflow after executing a post command.

Steps to reproduce the issue

The crash can be reproduced as follows:

  1. Flash the gcoap-dtls example on a board and open the terminal:
$ BOARD=samr21-xpro SERIAL=... make clean all flash term
  2. Execute:
# coap post fc00:: 5684 / Hi!

Expected results

No matter whether a server is listening on fc00:: or not, the non-confirmable POST message should just be sent out and appear in Wireshark.

Actual results

This output:

# coap post fc00:: 5684 / Hi!
# gcoap_cli: sending msg ID 3222, 11 bytes
# scheduler(): stack overflow detected, pid=1
# scheduler(): stack overflow detected, pid=1
# scheduler(): stack overflow detected, pid=1
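For context, the `scheduler(): stack overflow detected` message comes from a canary-style check. Roughly, a known value is stored at the lowest address of each thread's stack when the thread is created, and the scheduler reports that thread's pid once the value has been overwritten. The host-side sketch below illustrates the general technique with a simulated, downward-growing stack. `fake_stack_t`, `STACK_WORDS` and the helper functions are hypothetical; this is not RIOT's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated thread stack: STACK_WORDS words, growing downward
 * from the highest index towards index 0, like a Cortex-M stack. */
#define STACK_WORDS 64

typedef struct {
    uintptr_t words[STACK_WORDS];
} fake_stack_t;

/* At "thread creation", store the address of the lowest stack word in that
 * word itself as the canary (a value unlikely to be written by accident). */
static void stack_init(fake_stack_t *s)
{
    for (size_t i = 0; i < STACK_WORDS; i++) {
        s->words[i] = 0;
    }
    s->words[0] = (uintptr_t)&s->words[0];
}

/* Simulate the thread using `depth` words of stack, starting at the top.
 * Using all STACK_WORDS words overwrites the lowest word (the canary). */
static void stack_use(fake_stack_t *s, size_t depth)
{
    assert(depth <= STACK_WORDS); /* stay inside the simulated array */
    for (size_t i = 0; i < depth; i++) {
        s->words[STACK_WORDS - 1 - i] = 0xdeadbeef;
    }
}

/* The check a scheduler could run on a context switch: the overflow is
 * only noticed after the fact, once the canary is already gone. */
static bool stack_overflowed(const fake_stack_t *s)
{
    return s->words[0] != (uintptr_t)&s->words[0];
}
```

One consequence of this design is that an overflow is only detected after it has happened, so adjacent memory may already be corrupted by then, which may explain why a hard fault can follow. The pid in the message identifies the thread whose stack was overrun.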

Versions

The current RIOT master branch.

Additional Information

I am currently writing a small DTLS proxy for a Node.js backend, since DTLS support in Node.js is quite poor. Using tinydtls-rs, I have developed a small Rust application that listens for DTLS packets from the board and decrypts them for the backend. However, with my own firmware I also get a stack overflow in pid=6, which is the coap thread, followed by a hard fault. I then tested this example and ran into the problem above. The Rust application itself seems to work and sends its handshake messages, but the client on the board does not respond.

valentinpi, Jul 03 '22 15:07

To add a few more details: the firmware I am using (about 170 lines, too long to post in full) roughly does the following in one thread:

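#include <stdbool.h>
#include <string.h>

#include "net/gcoap.h"

// host_ep (a sock_udp_ep_t) is defined elsewhere in the firmware (not shown)
void *data_thread(void *arg) {
    (void)arg;

    uint8_t buf[CONFIG_GCOAP_PDU_BUF_SIZE];
    memset(buf, 0, CONFIG_GCOAP_PDU_BUF_SIZE);

    // Put packet metadata (header and options) at the start of `buf`
    coap_pkt_t pdu = { 0 };
    gcoap_req_init(&pdu, buf, CONFIG_GCOAP_PDU_BUF_SIZE, COAP_POST, "/data");
    coap_opt_add_format(&pdu, COAP_FORMAT_CBOR);
    coap_hdr_set_type(pdu.hdr, COAP_TYPE_NON);
    // meta_len covers header, options and payload marker, so the payload
    // starts at pdu.payload == buf + meta_len
    ssize_t meta_len = coap_opt_finish(&pdu, COAP_OPT_FINISH_PAYLOAD);
    while (true) {
        // Write some data to the payload area (pdu.payload), not to the start of `buf`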
        size_t payload_len = ...;

        // Post data
        gcoap_req_send(buf, meta_len + payload_len, &host_ep, NULL, NULL);

        // Some cleanup
    }

    return NULL;
}

And the DTLS proxy I am working on roughly does this in its write callback (non-RIOT code):

unsafe extern "C" fn server_write_callback(
    ctx: *mut dtls_context_t,
    session: *mut session_t,
    buf: *mut u8,
    len: c_size_t,
) -> c_int {
    debug_println!("WRITE");

    let socket = (*ctx).app as *mut UdpSocket;
    let addr = session.as_ref().unwrap().addr.sin6.as_ref();

    assert!(addr.sin6_family == AF_INET6 as u16);

    let sent = (*socket)
        .send_to(
            std::slice::from_raw_parts(buf, len as usize),
            SocketAddrV6::new(
                Ipv6Addr::from(addr.sin6_addr.s6_addr),
                u16::from_be(addr.sin6_port),
                addr.sin6_flowinfo,
                addr.sin6_scope_id,
            ),
        )
        .expect(debug_fmt!("Failed to send message"));

    // tinydtls expects the number of bytes sent (or a negative value on error)
    sent as c_int
}

valentinpi, Jul 03 '22 19:07

Can you increase the stack size a lot and then get the `ps` output?

kaspar030, Jul 03 '22 20:07

I already tried that. I do not have the output at hand, but I increased the stack size of the CoAP thread to more than 4096 bytes (by editing the RIOT source, where some extra terms are already added to it), and by the time of the hard fault it had used up all of it. Before that, I think it used about 700 bytes. I can provide a `ps` output tomorrow, but this should be easy to reproduce.

valentinpi, Jul 03 '22 20:07

I can't do much testing here, but could you try moving `buf` out of the function scope, e.g., making it a global variable?

cgundogan, Jul 03 '22 21:07
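A minimal sketch of this suggestion in plain C (no gcoap calls): a buffer declared at file scope, or as a `static` local, is placed in static memory instead of on the calling thread's stack, so its size no longer counts against that thread's stack budget. `PDU_BUF_SIZE` is a hypothetical stand-in for `CONFIG_GCOAP_PDU_BUF_SIZE`, and `prepare_request()` is a hypothetical placeholder for the request-building code in `data_thread()` above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for CONFIG_GCOAP_PDU_BUF_SIZE. */
#define PDU_BUF_SIZE 128

/* File-scope buffer: allocated once in static memory (.bss), not on the
 * stack of whichever thread calls prepare_request(). */
static uint8_t pdu_buf[PDU_BUF_SIZE];

/* Hypothetical request builder: reserves `header_len` bytes for the header,
 * copies `payload` right after it and returns the total length, or 0 if
 * the request would not fit into pdu_buf. */
static size_t prepare_request(size_t header_len,
                              const uint8_t *payload, size_t payload_len)
{
    assert(payload != NULL || payload_len == 0);
    if (header_len > PDU_BUF_SIZE || payload_len > PDU_BUF_SIZE - header_len) {
        return 0;
    }
    memset(pdu_buf, 0, header_len); /* placeholder for the CoAP header */
    if (payload_len > 0) {
        memcpy(pdu_buf + header_len, payload, payload_len);
    }
    return header_len + payload_len;
}
```

The trade-off is that the buffer is now shared: only one thread may use it at a time, which is fine as long as only `data_thread()` builds requests in it.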

Excuse the late reply. It is still not working, but we are now switching to PSKs, and those do not seem to crash. Anyway, @kaspar030, here is the `ps` output of the idle gcoap_dtls example:

2022-07-05 17:58:50,061 # main(): This is RIOT! (Version: 2022.07-devel-949-g1e17aa)                                            
2022-07-05 17:58:50,061 # gcoap example app                                                                                     
2022-07-05 17:58:50,062 # All up, running the shell now                                                                         
> ps                                                                                                                            
2022-07-05 18:02:42,887 # ps                                                                                                    
2022-07-05 18:02:42,896 #       pid | name                 | state    Q | pri | stack  ( used) ( free) | base addr  | current    | 
2022-07-05 18:02:42,904 #         - | isr_stack            | -        - |   - |    512 (  296) (  216) | 0x20000000 | 0x200001c0 | 
2022-07-05 18:02:42,913 #         1 | main                 | running  Q |   7 |   1536 (  680) (  856) | 0x20000730 | 0x20000b4c | 
2022-07-05 18:02:42,922 #         2 | event                | bl anyfl _ |   6 |    512 (  196) (  316) | 0x20000e98 | 0x20000fd4 | 
2022-07-05 18:02:42,932 #         3 | 6lo                  | bl rx    _ |   3 |   1024 (  528) (  496) | 0x20004348 | 0x2000462c | 
2022-07-05 18:02:42,941 #         4 | ipv6                 | bl rx    _ |   4 |   1024 (  448) (  576) | 0x20001c10 | 0x20001ed4 | 
2022-07-05 18:02:42,950 #         5 | udp                  | bl rx    _ |   5 |   1024 (  280) (  744) | 0x2000474c | 0x20004a34 | 
2022-07-05 18:02:42,959 #         6 | coap                 | bl anyfl _ |   6 |   2144 (  332) ( 1812) | 0x200013ac | 0x20001b1c | 
2022-07-05 18:02:42,968 #         7 | at86rf2xx            | bl anyfl _ |   2 |   1024 (  580) (  444) | 0x20002234 | 0x200024f4 | 
2022-07-05 18:02:42,975 #           | SUM                  |            |     |   8800 ( 3340) ( 5460)                          

This stack usage seems reasonable. I cannot get any output after the crash, since the board can then only be restarted.

I then multiplied GCOAP_STACK_SIZE by 4, which gives this `ps` output:

> ps                                                                                                                             
2022-07-05 18:14:19,525 # ps                                                                                                     
2022-07-05 18:14:19,534 #       pid | name                 | state    Q | pri | stack  ( used) ( free) | base addr  | current    |
2022-07-05 18:14:19,543 #         - | isr_stack            | -        - |   - |    512 (  280) (  232) | 0x20000000 | 0x200001c0 |
2022-07-05 18:14:19,552 #         1 | main                 | running  Q |   7 |   1536 (  712) (  824) | 0x20000730 | 0x20000b4c |
2022-07-05 18:14:19,561 #         2 | event                | bl anyfl _ |   6 |    512 (  196) (  316) | 0x20000e98 | 0x20000fd4 |
2022-07-05 18:14:19,570 #         3 | 6lo                  | bl rx    _ |   3 |   1024 (  420) (  604) | 0x20004f48 | 0x2000522c |
2022-07-05 18:14:19,579 #         4 | ipv6                 | bl rx    _ |   4 |   1024 (  448) (  576) | 0x20002810 | 0x20002ad4 |
2022-07-05 18:14:19,588 #         5 | udp                  | bl rx    _ |   5 |   1024 (  280) (  744) | 0x2000534c | 0x20005634 |
2022-07-05 18:14:19,598 #         6 | coap                 | bl anyfl _ |   6 |   5216 (  332) ( 4884) | 0x200013ac | 0x2000271c |
2022-07-05 18:14:19,607 #         7 | at86rf2xx            | bl anyfl _ |   2 |   1024 (  580) (  444) | 0x20002e34 | 0x200030f4 |
2022-07-05 18:14:19,613 #           | SUM                  |            |     |  11872 ( 3248) ( 8624)

The command above still crashes.

@cgundogan I tried your idea, but it still crashes. :(

valentinpi, Jul 05 '22 16:07
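The `ps` columns are related by free = stack - used, and the SUM row adds up each column. The sketch below recomputes the first (idle) listing; `ps_row_t`, `row_free()` and `ps_sum()` are hypothetical helpers for this illustration. Applied to the coap row of both listings, the same arithmetic gives 2144 - 332 = 1812 and 5216 - 332 = 4884 bytes free: enlarging the coap stack only added free space, while its usage at idle stayed at 332 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the `ps` output (hypothetical struct for this sketch). */
typedef struct {
    const char *name;
    int stack; /* total stack size in bytes */
    int used;  /* bytes used so far (high-water mark) */
} ps_row_t;

/* Rows copied from the idle `ps` listing with the default coap stack. */
static const ps_row_t idle_default[] = {
    { "isr_stack", 512, 296 }, { "main", 1536, 680 },
    { "event", 512, 196 },     { "6lo", 1024, 528 },
    { "ipv6", 1024, 448 },     { "udp", 1024, 280 },
    { "coap", 2144, 332 },     { "at86rf2xx", 1024, 580 },
};

/* The "free" column: whatever part of the stack has not been used. */
static int row_free(const ps_row_t *r)
{
    assert(r->used <= r->stack);
    return r->stack - r->used;
}

/* Sum of the stack, used and free columns, as in the SUM row. */
static void ps_sum(const ps_row_t *rows, size_t n,
                   int *stack, int *used, int *free_bytes)
{
    *stack = *used = *free_bytes = 0;
    for (size_t i = 0; i < n; i++) {
        *stack += rows[i].stack;
        *used += rows[i].used;
        *free_bytes += row_free(&rows[i]);
    }
}
```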

> Excuse the late reply, Still not working, but we are now switching to PSKs and that does not seem to crash.

ECC is known to be flaky with TinyDTLS, so I think it is good to keep this open as a known issue.

miri64, Jul 11 '22 08:07

By the way, there is more information on ECC on microcontrollers in the forum, which could also be of interest to you.

miri64, Jul 11 '22 08:07

@valentinpi sorry for the late reply. Could you try again with an increased stack size? This time, however, please increase the stack size of the main thread rather than that of the coap thread, as

# scheduler(): stack overflow detected, pid=1

indicates that the stack of the main thread (pid 1), not that of the coap thread, overflowed. Thanks :)

maribu, May 18 '23 18:05
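In RIOT, the main thread's stack size is set by the `THREAD_STACKSIZE_MAIN` macro, which is only defined if the build has not already done so. An application can therefore override it from its Makefile, e.g. with `CFLAGS += -DTHREAD_STACKSIZE_MAIN=4096` (4096 is only an example value). The plain-C sketch below reproduces this `#ifndef` override pattern on the host; the fallback of 1536 bytes is taken from the main row of the `ps` output above and is not RIOT's actual default definition.

```c
#include <assert.h>
#include <stdio.h>

/* Override pattern: only define the macro if the build (e.g.
 * -DTHREAD_STACKSIZE_MAIN=4096 in CFLAGS) has not already done so.
 * 1536 is the main stack size from the ps output above, used here as a
 * hypothetical fallback; it is not RIOT's actual definition. */
#ifndef THREAD_STACKSIZE_MAIN
#define THREAD_STACKSIZE_MAIN 1536
#endif

/* Stands in for the main thread's stack, which is reserved statically
 * with the configured size. */
static char main_stack[THREAD_STACKSIZE_MAIN];

static void print_main_stack_size(void)
{
    printf("main stack: %zu bytes\n", sizeof(main_stack));
}
```

Compiling the sketch with `-DTHREAD_STACKSIZE_MAIN=4096` changes the reported size without touching the source.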

Thank you so much for the reply, but sadly I cannot access my board right now :( Could we close the issue, and could I reopen it if I get back to this?

valentinpi, Jun 04 '23 00:06

Sure. If the issue arises again, I'm happy to help solve it :)

maribu, Jun 04 '23 05:06