utf8_arith.h - utf8_decode_step() doesn't decode all valid sequences correctly

Open gulrak opened this issue 7 years ago • 0 comments

The utf8_decode_step function in utf8_arith.h doesn't work for various valid sequences. For example, "\xED\x81\x80" should decode to codepoint U+D040, but the function wrongly decodes it to U+D000 (tested on macOS with clang from Xcode 9.2, using unoptimized debug code).
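For reference, here is a minimal sketch of the arithmetic I'd expect for a three-byte sequence. Note that `ref_decode3` and `ref_selfcheck` are hypothetical helpers written only for this report; they are not part of utf8_arith.h or utf8_branch.h, and they skip overlong and surrogate checks. The sketch only shows why "\xED\x81\x80" should come out as U+D040: the second byte contributes 0x01 << 6 = 0x40, which is exactly the part missing from the U+D000 result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helper (not from the library under test).
 * Decodes one three-byte sequence of the form 1110xxxx 10yyyyyy 10zzzzzz.
 * Returns the code point, or 0xFFFFFFFF if the bytes do not match that form.
 * No overlong-encoding or surrogate-range validation is done here. */
uint32_t ref_decode3(uint8_t b0, uint8_t b1, uint8_t b2)
{
    if ((b0 & 0xF0) != 0xE0 || (b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80)
        return 0xFFFFFFFFu;

    return ((uint32_t)(b0 & 0x0F) << 12)   /* 4 payload bits from the lead byte */
         | ((uint32_t)(b1 & 0x3F) << 6)    /* 6 payload bits from the 2nd byte  */
         |  (uint32_t)(b2 & 0x3F);         /* 6 payload bits from the 3rd byte  */
}

/* The sequence from this report: 0xD << 12 | 0x01 << 6 | 0x00 == 0xD040. */
void ref_selfcheck(void)
{
    assert(ref_decode3(0xED, 0x81, 0x80) == 0xD040u);
}
```

Comparing the output of utf8_decode_step against a helper like this for each lead byte should make it easy to list the failing inputs.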

The failing UTF-8 sequences start with 0xE0, 0xED, 0xF1, 0xF2 and 0xF3. I couldn't easily find the cause, but as it is now the function shouldn't be used, or only with care.

The utf8_branch.h version, while using the same tables, works flawlessly in my tests.

gulrak, Sep 13 '18 17:09