
parse_address spins at 100% CPU on a 1-byte input

Open alex opened this issue 5 years ago • 2 comments

Hi!

I was checking out libpostal, and saw something that could be improved.


My country is

United States


Here's how I'm using libpostal

I'm attempting to fuzz libpostal to ensure that malformed inputs cannot lead to security vulnerabilities -- particularly memory corruption ones, as is common in C.


Here's what I did

#include <stdint.h>
#include <string>

#include <libpostal/libpostal.h>


struct PostalState {
    PostalState() {
        if (!libpostal_setup() || !libpostal_setup_parser()) {
            exit(EXIT_FAILURE);
        }
        options = libpostal_get_address_parser_default_options();
    }

    libpostal_address_parser_options_t options;
};

PostalState kPostalState;

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    std::string storage(reinterpret_cast<const char *>(data), size);
    libpostal_address_parser_response_t *parsed = libpostal_parse_address(const_cast<char *>(storage.c_str()), kPostalState.options);
    if (parsed) {
        // Touch all the components to ensure they point to valid memory.
        std::string value;
        for (size_t i = 0; i < parsed->num_components; i++) {
            value += parsed->labels[i];
            value += parsed->components[i];
        }
        libpostal_address_parser_response_destroy(parsed);
    }

    return 0;
}

root@5d4a573e7ed1:/src/libpostal# /out/libpostal_parse_address_fuzzer input.txt -rss_limit_mb=-1 -timeout=10
INFO: Seed: 3602331429
INFO: Loaded 1 modules   (87 inline 8-bit counters): 87 [0x10f81b0, 0x10f8207), 
INFO: Loaded 1 PC tables (87 PCs): 87 [0xa43bb8,0xa44128), 
/out/libpostal_parse_address_fuzzer: Running 1 inputs 1 time(s) each.
Running: artifacts/timeout-7a9dcb9f3a26b4619ea24b63f60f977d32b2b0f5
WARN  invalid UTF-8
   at transliterate (transliterate.c:791) errno: No such file or directory
WARN  invalid UTF-8
   at transliterate (transliterate.c:791) errno: No such file or directory
ALARM: working on the last Unit for 11 seconds
       and the timeout value is 10 (use -timeout=N to change)
==160== ERROR: libFuzzer: timeout after 11 seconds
    #0 0x49f3c1 in __sanitizer_print_stack_trace /src/llvm/projects/compiler-rt/lib/asan/asan_stack.cpp:86:3
    #1 0x56882d in fuzzer::PrintStackTrace() /src/libfuzzer/FuzzerUtil.cpp:205:5
    #2 0x5136bd in fuzzer::Fuzzer::AlarmCallback() /src/libfuzzer/FuzzerLoop.cpp:300:5
    #3 0x7a887d69538f  (/lib/x86_64-linux-gnu/libpthread.so.0+0x1138f)
    #4 0x5b7763 in scan_token /src/libpostal/src/scanner.c:125:6
    #5 0x8c4491 in tokenize_add_tokens /src/libpostal/src/scanner.re:238:44
    #6 0x8c4718 in tokenize /src/libpostal/src/scanner.re:268:5
    #7 0x59a049 in address_parser_parse /src/libpostal/src/address_parser.c:1678:27
    #8 0x578feb in libpostal_parse_address /src/libpostal/src/libpostal.c:240:51
    #9 0x4c8d39 in LLVMFuzzerTestOneInput /src/libpostal/../libpostal_parse_address_fuzzer.cc:22:51
    #10 0x519bb6 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:556:15
    #11 0x4ca78f in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/libfuzzer/FuzzerDriver.cpp:292:6
    #12 0x4d83f2 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:774:9
    #13 0x4c9dd7 in main /src/libfuzzer/FuzzerMain.cpp:19:10
    #14 0x7a887ccb882f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #15 0x41e3a8 in _start (/out/libpostal_parse_address_fuzzer+0x41e3a8)

SUMMARY: libFuzzer: timeout
root@5d4a573e7ed1:/src/libpostal# xxd input.txt 
00000000: c6                                       .
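
For context on the input: the dump above shows a single byte, 0xC6. Its bit pattern is 110xxxxx, which in UTF-8 marks the start of a two-byte sequence, so the input ends before the required continuation byte. A minimal sketch of that classification is below. `utf8SequenceLength` and `isTruncatedAt` are hypothetical helper names used only for illustration; they are not part of libpostal.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a libpostal function): total number of bytes a
// UTF-8 sequence should occupy given its lead byte, or 0 if the byte can
// never start a well-formed sequence.
std::size_t utf8SequenceLength(unsigned char lead) {
  if (lead <= 0x7F) return 1;                  // 0xxxxxxx: ASCII
  if (lead >= 0xC2 && lead <= 0xDF) return 2;  // 110xxxxx (0xC0/0xC1 are overlong)
  if (lead >= 0xE0 && lead <= 0xEF) return 3;  // 1110xxxx
  if (lead >= 0xF0 && lead <= 0xF4) return 4;  // 11110xxx, up to U+10FFFF
  return 0;                                    // continuation byte or invalid lead
}

// True if the sequence that starts at `pos` needs more bytes than the input
// still has. Precondition: pos < s.size().
bool isTruncatedAt(const std::string &s, std::size_t pos) {
  const std::size_t need = utf8SequenceLength(static_cast<unsigned char>(s[pos]));
  return need != 0 && s.size() - pos < need;
}
```

For the failing input, `utf8SequenceLength(0xC6)` is 2 while only one byte is available, so `isTruncatedAt(input, 0)` is true.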

Here's what I got

It spun at 100% CPU until the timeout kicked in.


Here's what I was expecting

An immediate error on parse failure.

Nov 20 '19 00:11 alex

Both libpostal_parse_address() and libpostal_expand_address() choke on invalid UTF-8 input. A warning is printed twice, followed by an endless loop:

WARN  invalid UTF-8
   at transliterate (transliterate.c:791)
WARN  invalid UTF-8
   at transliterate (transliterate.c:791)

Repro C source attached: postal-bug.zip

Nov 21 '19 10:11 ComputerDoktor

Had the same issue; I solved it by validating that the input string is valid UTF-8 before passing it to libpostal. I guess this check should be part of the library itself, but here it is anyway in case someone finds it useful:

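#include <string>

// Returns true if `string` is structurally valid UTF-8: every lead byte is
// followed by the right number of continuation bytes (10xxxxxx).
// Note: overlong encodings and code points above U+10FFFF are not rejected.
bool isValidUTF8(const std::string &string) {
  int c, i, ix, n, j;
  for (i = 0, ix = string.length(); i < ix; i++) {
    c = (unsigned char)string[i];
    if (c <= 0x7f)
      n = 0;  // ASCII: no continuation bytes
    else if ((c & 0xE0) == 0xC0)
      n = 1;  // 110xxxxx: two-byte sequence
    else if (c == 0xed && i < (ix - 1) && ((unsigned char)string[i + 1] & 0xa0) == 0xa0)
      return false;  // U+D800..U+DFFF (UTF-16 surrogates)
    else if ((c & 0xF0) == 0xE0)
      n = 2;  // 1110xxxx: three-byte sequence
    else if ((c & 0xF8) == 0xF0)
      n = 3;  // 11110xxx: four-byte sequence
    else
      return false;  // stray continuation byte or invalid lead byte
    for (j = 0; j < n && i < ix; j++) {
      // Fail if the input ends early or the next byte is not a continuation byte.
      if ((++i == ix) || (((unsigned char)string[i] & 0xC0) != 0x80)) return false;
    }
  }
  return true;
}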
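
The check above is structural: it counts continuation bytes and rejects surrogates, but it accepts overlong forms such as C0 80 and lead bytes F5 to F7. If stricter validation is wanted, one possible variant that follows the well-formed byte ranges of RFC 3629 is sketched below. `isStrictUTF8` is a hypothetical name, not a libpostal function, and this has not been tested against libpostal itself.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of libpostal): strict UTF-8 validation that
// also rejects overlong encodings, surrogates, and code points > U+10FFFF.
bool isStrictUTF8(const std::string &s) {
  const std::size_t len = s.size();
  std::size_t i = 0;
  while (i < len) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (c <= 0x7F) { ++i; continue; }      // ASCII

    std::size_t n = 0;                     // continuation bytes expected
    unsigned char lo = 0x80, hi = 0xBF;    // allowed range for the first one
    if (c >= 0xC2 && c <= 0xDF) n = 1;
    else if (c == 0xE0) { n = 2; lo = 0xA0; }   // excludes overlong 3-byte forms
    else if (c == 0xED) { n = 2; hi = 0x9F; }   // excludes U+D800..U+DFFF
    else if (c >= 0xE1 && c <= 0xEF) n = 2;
    else if (c == 0xF0) { n = 3; lo = 0x90; }   // excludes overlong 4-byte forms
    else if (c >= 0xF1 && c <= 0xF3) n = 3;
    else if (c == 0xF4) { n = 3; hi = 0x8F; }   // excludes > U+10FFFF
    else return false;                     // 0x80..0xC1 or 0xF5..0xFF

    if (len - i <= n) return false;        // input ends mid-sequence

    const unsigned char first = static_cast<unsigned char>(s[i + 1]);
    if (first < lo || first > hi) return false;
    for (std::size_t k = 2; k <= n; ++k) {
      const unsigned char b = static_cast<unsigned char>(s[i + k]);
      if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
    }
    i += n + 1;
  }
  return true;
}
```

A caller could run a check like this first and skip libpostal_parse_address() or libpostal_expand_address() when it fails. Whether that avoids every hang in libpostal is not something this thread confirms.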

Jul 24 '20 10:07 artemyarulin