parsing_json.php improvements

Alcaro opened this issue 8 years ago.

(Apologies if this isn't the correct place, I couldn't find anything better.)

Just found http://seriot.ch/parsing_json.php. Great writeup; it's surprising how something so seemingly simple can have so many ways to screw up. I found a few possible improvements:

i_string_iso_latin_1.json   | ["E9"]
n_string_invalid_utf-8.json | ["FF"]

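Both files contain a single byte that isn't valid UTF-8, so they belong in the same category. As of #30, both are i_ (parsers may accept or reject them), but the page still lists the second one as n_.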
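A quick way to confirm that both files fall into the same bucket. This is a sketch that assumes each file is exactly `["`, the single byte shown in hex, and `"]`; the page only shows the hex byte, so the exact file contents are an assumption, and `is_valid_utf8` is a hypothetical helper.

```python
# Assumed file contents: '["', one raw byte, '"]' (the page shows the byte in hex).
SAMPLES = {
    "i_string_iso_latin_1.json": b'["\xe9"]',
    "n_string_invalid_utf-8.json": b'["\xff"]',
}


def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if `raw` decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for name, raw in SAMPLES.items():
        print(f"{name}: valid UTF-8 = {is_valid_utf8(raw)}")
```

Both report False: 0xE9 is the lead byte of a three-byte sequence but is followed by `"`, and 0xFF can never appear anywhere in UTF-8.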

["\uD800\uD800"] makes some parsers go nuts. R jsonlite yields ["\U00010000"], while Ruby parser yields ["F0908080"]. I still don't get where this value comes from.

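This looks like overeager decoding of surrogate pairs. \uD800\uDC00 should yield \U00010000, so I guess those parsers ignore the top 6 bits of the supposed low surrogate (the bits that mark it as being in the DC00 to DFFF range), which makes \uD800 look like \uDC00. F0908080 is \U00010000 in UTF-8, so the Ruby parser apparently makes the same mistake.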
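A minimal sketch of the guessed bug. `decode_pair_strict`, `decode_pair_overeager` and `utf8_hex` are hypothetical names; neither decoder is taken from R jsonlite or the Ruby parser, whose actual code hasn't been checked here. The overeager variant validates the high surrogate but only masks the low one with 0x3FF instead of range-checking it.

```python
def decode_pair_strict(hi: int, lo: int) -> int:
    """Combine a UTF-16 surrogate pair, rejecting anything malformed."""
    if not 0xD800 <= hi <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {hi:#06x}")
    if not 0xDC00 <= lo <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {lo:#06x}")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


def decode_pair_overeager(hi: int, lo: int) -> int:
    """Hypothetical buggy variant: `lo` is never range-checked, and masking
    with 0x3FF silently drops its top 6 bits."""
    if not 0xD800 <= hi <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {hi:#06x}")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo & 0x3FF)


def utf8_hex(code_point: int) -> str:
    """Uppercase hex of the UTF-8 encoding, matching the page's notation."""
    return chr(code_point).encode("utf-8").hex().upper()


if __name__ == "__main__":
    for hi, lo in [(0xD800, 0xDC00), (0xD800, 0xD800)]:
        cp = decode_pair_overeager(hi, lo)
        print(f"\\u{hi:04X}\\u{lo:04X} -> U+{cp:04X}, UTF-8 {utf8_hex(cp)}")
        try:
            decode_pair_strict(hi, lo)
        except ValueError as exc:
            print(f"  strict decoder rejects it: {exc}")
```

With the overeager variant both pairs come out as U+10000, whose UTF-8 encoding is F0908080, the same bytes the page reports for the Ruby parser; the strict variant accepts \uD800\uDC00 and rejects \uD800\uD800.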

Nov 27 '17 00:11 Alcaro