
JSONC parser fails to correctly parse non-BMP escape sequences

Open • KiloJuliett opened this issue 2 years ago • 1 comment

In accordance with RFC 8259 § 7, the non-BMP character 𝄞 (U+1D11E) is escaped as the 12-character sequence \uD834\uDD1E, which encodes its UTF-16 surrogate pair. Therefore, I expect the following Rust code to compile and run successfully:

use jsonc_parser::JsonValue;
use jsonc_parser::parse_to_value;

fn main() {
    let src = r#""\uD834\uDD1E""#;
    let v = parse_to_value(src, &Default::default()).unwrap().unwrap();
    if let JsonValue::String(s) = v {
        assert_eq!("\u{1D11E}", s)
    }
    else {
        panic!();
    }
}

However, on the latest version of jsonc-parser (as of writing, version 0.21.0), this code panics at the first unwrap on line 6 with the message "Invalid unicode escape sequence. 'D834' is not a valid UTF8 character".
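The error message suggests that each \uXXXX escape is validated on its own, so the high surrogate D834 is rejected before its low-surrogate partner DD1E is read. That is an inference from the message, not from the crate's source. As a minimal sketch of surrogate-pair-aware decoding, the hypothetical helpers decode_unicode_escapes and read_hex4 below (not part of jsonc-parser's API) combine a high/low pair into one char using the RFC 8259 § 7 formula. They assume the input holds only literal characters and \u escapes.

```rust
/// Hypothetical helper, not part of jsonc-parser: decodes the body of a JSON
/// string that contains literal characters and `\uXXXX` escapes only. A high
/// surrogate escape must be followed by a low surrogate escape; the pair is
/// combined into one non-BMP `char` as described in RFC 8259 § 7.
fn decode_unicode_escapes(input: &str) -> Result<String, String> {
    let mut out = String::new();
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // Other escapes (\n, \", ...) are out of scope for this sketch.
        match chars.next() {
            Some('u') => {}
            other => return Err(format!("unsupported escape: {:?}", other)),
        }
        let first = read_hex4(&mut chars)?;
        let code_point = match first {
            // High surrogate: the next escape must be a low surrogate.
            0xD800..=0xDBFF => {
                if chars.next() != Some('\\') || chars.next() != Some('u') {
                    return Err(format!("unpaired high surrogate {:04X}", first));
                }
                let second = read_hex4(&mut chars)?;
                if !(0xDC00..=0xDFFF).contains(&second) {
                    return Err(format!("{:04X} is not a low surrogate", second));
                }
                // Combine the pair: 10 bits from each surrogate, offset by 0x10000.
                0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00)
            }
            // A low surrogate with no preceding high surrogate is rejected.
            0xDC00..=0xDFFF => return Err(format!("unpaired low surrogate {:04X}", first)),
            // Any other value is already a BMP scalar value.
            _ => first,
        };
        // Surrogates were handled above, so this only fails on an internal bug.
        let ch = char::from_u32(code_point)
            .ok_or_else(|| format!("invalid code point U+{:X}", code_point))?;
        out.push(ch);
    }
    Ok(out)
}

/// Reads exactly four hexadecimal digits and returns their value.
fn read_hex4(chars: &mut impl Iterator<Item = char>) -> Result<u32, String> {
    let mut value = 0;
    for _ in 0..4 {
        let c = chars.next().ok_or("truncated \\u escape")?;
        let digit = c
            .to_digit(16)
            .ok_or_else(|| format!("invalid hex digit {:?}", c))?;
        value = value * 16 + digit;
    }
    Ok(value)
}
```

Applied to the escape text from the issue, r"\uD834\uDD1E", the helper returns "\u{1D11E}", the same string std's String::from_utf16(&[0xD834, 0xDD1E]) produces. RFC 8259 leaves the handling of unpaired surrogates implementation-defined; this sketch chooses to reject them rather than substitute U+FFFD.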

KiloJuliett, Sep 02 '22 14:09

Not entirely sure, but this recently merged RFC might be relevant.

RON adopted it in its v0.9 release, instead of base64, to properly support round-tripping byte strings. serde_json didn't have this issue, though.

polarathene, Oct 19 '23 01:10