RIOT icon indicating copy to clipboard operation
RIOT copied to clipboard

sys/uri_parser: Adding the port as uint16_t

Open Teufelchen1 opened this issue 3 years ago • 6 comments

Contribution description

Hi! 👋

This PR aims to add a uint16_t port to the struct uri_parser_result_t of the URI parser module. The already existing const char *port is renamed to const char *port_str. I'm open for other naming suggestions.

I touched:

  • sys/include/uri_parser.h: Adding said member port
  • sys/uri_parser/uri_parser.c: Making use of the new member & filling it with data
  • sys/net/application_layer/gcoap/forward_proxy.c: This file used the old string-based port and got updated accordingly, fixing a potential integer overflow
  • pkg/wakaama/contrib/lwm2m_client_connection.c: This file used the old string-based port and got updated accordingly, fixing a potential integer overflow
  • tests/unittests/tests-uri_parser/tests-uri_parser.c: Adjusted the test accordingly

Reason for change

I was working on PR #18053 where @leandrolanzieri suggested using the URI parser instead of my proposed fix.

After looking into it, I stumbled accross the problem of getting an uint16_t from a buffer(uri_parser_result_t.port) of char - in comparison to a C string which is \0 terminated, this buffer is not. I digged through some documentation of posix/libc/etc. and did not find a suiting solution. E.g. atoi can't be used, as it just wheeps through the memory until either a \0 or the first invalid character is encountered. This is a potential out-of-bounds read. One solution could be to memcpy the buffer into a bigger buffer and manually terminate it. I hesitated to do so as I did not want to increase the stack usage of the URI parser just for a null termination.

Another issue is with atoi is that it returns an int which could exceed the limits of my uint16_t, hence additional sanity check is needed.

Current state of the RIOT codebase in regard to my issue

Next, I looked through the instances where the URI parser is used in the RIOT codebase, in order to inspect the solution somebody else has come up with: sys/net/application_layer/gcoap/forward_proxy.c:

if (urip->port_len) {
    /* copy port string into scratch for atoi */
    memcpy(scratch, urip->port, urip->port_len);
    scratch[urip->port_len] = '\0';

    remote->port = atoi(scratch);

    if (remote->port == 0) {
      return false;
    }
}

The solution in forward_proxy.c suffers from the issues above and should be fixed. I can't copy that.

pkg/wakaama/contrib/lwm2m_client_connection.c:

   ...
   res = uri_parser_process_string(&parsed_uri, uri);
   ...
    if (!parsed_uri.port) {
       ...
    }
    else {
        port = parsed_uri.port;
    }
    ...
    /* configure to any IPv6 */
    conn->remote.family = AF_INET6;
    conn->remote.netif = SOCK_ADDR_ANY_NETIF;
    conn->remote.port = atoi(port);

This is again not ideal, e.g. this could set the conn->remote.port = atoi("123456789") to an overflow, resulting in an undefined behavoir (Not to bad, as it is "just" a port but still).

We could compare this to the sock_util.c. While it does have the luxury of working with a null terminated string, it still has to convert it safely.

sys/net/sock/sock_util.c:

int _parse_port(sock_udp_ep_t *ep_out, const char *portstart)
{
    int port_len = strlen(portstart);

    /* Checks here verify that the supplied port number is up to 5 (random)
     * chars in size and result is smaller or equal to UINT16_MAX. */
    if (port_len > PORT_STR_LEN) {
        return -EINVAL;
    }
    uint32_t port = atol(portstart);
    if (port > UINT16_MAX) {
        return -EINVAL;
    }
    ep_out->port = (uint16_t)port;
    return port_len;
}

As shown, this does convert a given string safely to an uint16_t. It checks for the max string length and covers overflows. My solution directly copies this sanitising.

My proposed solution

After spending some time with it, I decided it would be best to enhance the URI parser to directly yielding a port number in a convenient way (uint16_t). This way the work of converting char arrays to ports would only be done once (compared to the two times at the moment). The safty checks are at least done once in the parser and they are done (hopefully) correctly & completly. I also believe it is much more fun to use a parser wich yields a port in an easy to use format :) but thats a matter of taste.

Whats left to say

  • I feel uncomfortable working with the test-uri_parser.c file, please double check my work there
  • The naming of port_str and port_str_len feels weird, suggestions?
  • We still have slightly doublicated code, as both parser & sock_util parse 'ascii->port'
  • Still using atoi (or one of its friends)
  • Sock_util:_parse_port still resolves 34a3 to the port 34 instead of erroring
  • Sock_util:_parse_port still resolves 0x10 to the port 16 instead of erroring
  • The uri parser errors on 0x10, this might be invalid behavoir if the RFC for URIs allows hexadecimal ports
  • The uri parser ignores leading zeroes when converting, e.g. 0045 yields port 45
  • The uri parser does not ignore leading zeroes when checking max string length, e.g. 000123 yields an error
  • The uri parser now memset's the result struct back to zero on error
  • I have no clue nor feeling for the riot coding style yet. Expect whitespace nits.

Testing procedure

Beside some static local and manual testing, I did not perform any integration checks. Mainly because I haven't come around to understand the testing infrastructure of RIOT.

Teufelchen1 avatar May 12 '22 20:05 Teufelchen1

Hm, it would be API breaking edit to remove, but, is there a use case for keeping the port as string?

kaspar030 avatar May 13 '22 09:05 kaspar030

Hm, it would be API breaking edit to remove, but, is there a use case for keeping the port as string?

Keep in mind that it also points at the position of the port in the URI. While I don't have a specific use case to have the position of the URI components, I have some gut feeling, that it might be useful.

miri64 avatar May 13 '22 10:05 miri64

(one use case could be to remove the port e.g., if it is redundant as in http://example.org:80.)

miri64 avatar May 13 '22 10:05 miri64

Hi 👋 What would be my next action to get this merged? :)

I addressed the struct packaging as suggested. I did not adopt the usage of scn_u32_dec().

Teufelchen1 avatar Jun 23 '22 09:06 Teufelchen1

Hi :wave: What would be my next action to get this merged? :)

Pinging a maintainer, which you just did. Will give this a final look, but I think this is ready to get merged.

miri64 avatar Jun 23 '22 12:06 miri64

Better run the unittests locally using, too

make -C tests/unittests/ tests-uri_parser
make -C tests/unittests/ test

(need to be run in separate Make commands, as test does not depend on tests-uri_parser.

miri64 avatar Jun 24 '22 09:06 miri64

@miri64 Thanks for waiting! I got distracted by my exams and vacation 😅🇳🇴. I updated the tests as you suggested. Note that I also adjusted the recently added sys/net/application_layer/gcoap/dns.c for the new parser.

Had to force push once again - the code wasn't happy on a 16 bit MCU.

Teufelchen1 avatar Aug 23 '22 12:08 Teufelchen1

Skipping compile test as this completed twice and only failed due to running hardware tests with a bug in the esp on master.

In fact, hardware tests probably shouldn't be run as this is a sys module.

MrKevinWeiss avatar Aug 30 '22 15:08 MrKevinWeiss

If any changes are made after, please unset the skip test flag. If @kfessel is satisfied we can enable auto-merge.

MrKevinWeiss avatar Aug 30 '22 15:08 MrKevinWeiss

@miri64 was about to ack this.

Don't wait for me next time ;-)

miri64 avatar Sep 19 '22 09:09 miri64