libpostal
parse_address spins at 100% CPU on a 1-byte input
Hi!
I was checking out libpostal, and saw something that could be improved.
My country is
United States
Here's how I'm using libpostal
I'm attempting to fuzz libpostal to ensure that malformed inputs cannot lead to security vulnerabilities -- particularly memory corruption ones, as is common in C.
Here's what I did
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>  // exit, EXIT_FAILURE
#include <string>
#include <libpostal/libpostal.h>

// Loads the libpostal models once, before the fuzzer starts feeding inputs.
struct PostalState {
    PostalState() {
        if (!libpostal_setup() || !libpostal_setup_parser()) {
            exit(EXIT_FAILURE);
        }
        options = libpostal_get_address_parser_default_options();
    }
    libpostal_address_parser_options_t options;
};

PostalState kPostalState;

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Copy into a std::string so the input is NUL-terminated.
    std::string storage(reinterpret_cast<const char *>(data), size);
    libpostal_address_parser_response_t *parsed = libpostal_parse_address(const_cast<char *>(storage.c_str()), kPostalState.options);
    if (parsed) {
        // Touch all the components to ensure they point to valid memory.
        std::string value;
        for (size_t i = 0; i < parsed->num_components; i++) {
            value += parsed->labels[i];
            value += parsed->components[i];
        }
        libpostal_address_parser_response_destroy(parsed);
    }
    return 0;
}
root@5d4a573e7ed1:/src/libpostal# /out/libpostal_parse_address_fuzzer input.txt -rss_limit_mb=-1 -timeout=10
INFO: Seed: 3602331429
INFO: Loaded 1 modules (87 inline 8-bit counters): 87 [0x10f81b0, 0x10f8207),
INFO: Loaded 1 PC tables (87 PCs): 87 [0xa43bb8,0xa44128),
/out/libpostal_parse_address_fuzzer: Running 1 inputs 1 time(s) each.
Running: artifacts/timeout-7a9dcb9f3a26b4619ea24b63f60f977d32b2b0f5
WARN invalid UTF-8
at transliterate (transliterate.c:791) errno: No such file or directory
WARN invalid UTF-8
at transliterate (transliterate.c:791) errno: No such file or directory
ALARM: working on the last Unit for 11 seconds
and the timeout value is 10 (use -timeout=N to change)
==160== ERROR: libFuzzer: timeout after 11 seconds
#0 0x49f3c1 in __sanitizer_print_stack_trace /src/llvm/projects/compiler-rt/lib/asan/asan_stack.cpp:86:3
#1 0x56882d in fuzzer::PrintStackTrace() /src/libfuzzer/FuzzerUtil.cpp:205:5
#2 0x5136bd in fuzzer::Fuzzer::AlarmCallback() /src/libfuzzer/FuzzerLoop.cpp:300:5
#3 0x7a887d69538f (/lib/x86_64-linux-gnu/libpthread.so.0+0x1138f)
#4 0x5b7763 in scan_token /src/libpostal/src/scanner.c:125:6
#5 0x8c4491 in tokenize_add_tokens /src/libpostal/src/scanner.re:238:44
#6 0x8c4718 in tokenize /src/libpostal/src/scanner.re:268:5
#7 0x59a049 in address_parser_parse /src/libpostal/src/address_parser.c:1678:27
#8 0x578feb in libpostal_parse_address /src/libpostal/src/libpostal.c:240:51
#9 0x4c8d39 in LLVMFuzzerTestOneInput /src/libpostal/../libpostal_parse_address_fuzzer.cc:22:51
#10 0x519bb6 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:556:15
#11 0x4ca78f in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/libfuzzer/FuzzerDriver.cpp:292:6
#12 0x4d83f2 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:774:9
#13 0x4c9dd7 in main /src/libfuzzer/FuzzerMain.cpp:19:10
#14 0x7a887ccb882f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#15 0x41e3a8 in _start (/out/libpostal_parse_address_fuzzer+0x41e3a8)
SUMMARY: libFuzzer: timeout
root@5d4a573e7ed1:/src/libpostal# xxd input.txt
00000000: c6 .
Here's what I got
It spun at 100% CPU until the timeout kicked in.
Here's what I was expecting
An immediate error on parse failure.
Both libpostal_parse_address() and libpostal_expand_address() choke on invalid UTF-8 input. A warning is printed twice, followed by an endless loop:
WARN invalid UTF-8 at transliterate (transliterate.c:791)
WARN invalid UTF-8 at transliterate (transliterate.c:791)
Repro C source attached: postal-bug.zip
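For context on why the one-byte input from the report counts as invalid UTF-8: 0xC6 is the lead byte of a two-byte sequence, so a continuation byte has to follow, and the input ends before one arrives. Here is a minimal sketch of that classification. The helper names utf8_expected_length and utf8_is_truncated are made up for illustration and are not part of libpostal.

```cpp
#include <cstddef>

// Number of bytes a UTF-8 sequence should occupy, judged from its lead byte.
// Returns 0 for bytes that cannot start a sequence: continuation bytes
// (0x80..0xBF), 0xC0/0xC1 (always overlong), and 0xF5..0xFF.
std::size_t utf8_expected_length(unsigned char lead) {
    if (lead <= 0x7F) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

// True when the buffer ends before the sequence starting at data[0] is complete.
bool utf8_is_truncated(const unsigned char *data, std::size_t size) {
    if (size == 0) return false;
    const std::size_t need = utf8_expected_length(data[0]);
    return need != 0 && need > size;
}
```

For the input above, utf8_expected_length(0xC6) reports a two-byte sequence while only one byte is available, so the sequence is truncated.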
I had the same issue and solved it by validating that the input string is valid UTF-8 before passing it to libpostal. I guess this check should be part of the library itself, but here it is anyway in case someone finds it useful:
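#include <string>

// Returns true if the string looks like well-formed UTF-8.
// Note: this check rejects truncated sequences, stray continuation bytes and
// UTF-16 surrogates, but it does not reject overlong encodings (e.g. 0xC0 0x80)
// or code points above U+10FFFF.
bool isValidUTF8(const std::string &string) {
    int c, i, ix, n, j;
    for (i = 0, ix = string.length(); i < ix; i++) {
        c = (unsigned char)string[i];
        // n = number of continuation bytes expected after the lead byte c.
        if (0x00 <= c && c <= 0x7f)
            n = 0;  // ASCII
        else if ((c & 0xE0) == 0xC0)
            n = 1;  // 110xxxxx
        else if (c == 0xed && i < (ix - 1) && ((unsigned char)string[i + 1] & 0xa0) == 0xa0)
            return false;  // U+D800..U+DFFF (surrogates)
        else if ((c & 0xF0) == 0xE0)
            n = 2;  // 1110xxxx
        else if ((c & 0xF8) == 0xF0)
            n = 3;  // 11110xxx
        else
            return false;  // continuation byte or invalid lead byte
        // Each continuation byte must exist and have the form 10xxxxxx.
        for (j = 0; j < n && i < ix; j++) {
            if ((++i == ix) || (((unsigned char)string[i] & 0xC0) != 0x80)) return false;
        }
    }
    return true;
}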
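The check above accepts some byte sequences that RFC 3629 treats as ill-formed, such as overlong encodings and code points above U+10FFFF. If stricter validation is wanted, a sketch along the following lines could be used instead. The function name is_strict_utf8 is hypothetical and not part of libpostal, and whether the stricter rules matter for this particular hang is an assumption. The single byte 0xC6 from the report is already rejected by either version.

```cpp
#include <cstddef>
#include <string>

// Strict UTF-8 check based on the well-formed byte ranges in RFC 3629.
// Hypothetical helper for illustration; not part of libpostal.
bool is_strict_utf8(const std::string &s) {
    const std::size_t len = s.size();
    std::size_t i = 0;
    while (i < len) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;        // continuation bytes expected
        unsigned char lower = 0x80;   // allowed range of the FIRST continuation byte
        unsigned char upper = 0xBF;
        if (c <= 0x7F) { extra = 0; }                          // ASCII
        else if (c >= 0xC2 && c <= 0xDF) { extra = 1; }        // 0xC0/0xC1 are overlong
        else if (c == 0xE0) { extra = 2; lower = 0xA0; }       // reject overlong 3-byte forms
        else if (c == 0xED) { extra = 2; upper = 0x9F; }       // reject surrogates
        else if (c >= 0xE1 && c <= 0xEF) { extra = 2; }
        else if (c == 0xF0) { extra = 3; lower = 0x90; }       // reject overlong 4-byte forms
        else if (c == 0xF4) { extra = 3; upper = 0x8F; }       // stay at or below U+10FFFF
        else if (c >= 0xF1 && c <= 0xF3) { extra = 3; }
        else { return false; }                                 // stray continuation or bad lead byte

        // Truncated sequence: not enough bytes left after the lead byte.
        if (len - i - 1 < extra) return false;

        for (std::size_t j = 1; j <= extra; ++j) {
            const unsigned char cc = static_cast<unsigned char>(s[i + j]);
            const unsigned char min_ok = (j == 1) ? lower : 0x80;
            const unsigned char max_ok = (j == 1) ? upper : 0xBF;
            if (cc < min_ok || cc > max_ok) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

A caller could run this on the input and skip the libpostal call when it returns false, which sidesteps the hang until the library handles invalid input itself.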