parsing_json.php improvements
(Apologies if this isn't the correct place, I couldn't find anything better.)
Just found http://seriot.ch/parsing_json.php. Great writeup, it's surprising how something so seemingly simple can have so many ways to screw up. I found a few possible improvements:
i_string_iso_latin_1.json | ["E9"]
n_string_invalid_utf-8.json | ["FF"]
As of #30, both are i_.
["\uD800\uD800"] makes some parsers go nuts. R jsonlite yields ["\U00010000"], while Ruby parser yields ["F0908080"]. I still don't get where this value comes from.
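Overeager decoding of surrogate pairs. \uD800\uDC00 should yield \U00010000. I guess jsonlite ignores the top 6 bits of the supposed low surrogate and keeps only its low 10 bits, so \uD800\uD800 decodes to the same code point. F0908080 is \U00010000 encoded in UTF-8, so the Ruby parser is apparently doing the same thing.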
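
To illustrate the guess, here is a minimal Python sketch of what such a decoder might be doing. This is not taken from jsonlite or the Ruby parser; `combine_lenient` and `combine_strict` are hypothetical names, and the masking approach is only my assumption about how the value arises.

```python
def combine_lenient(hi: int, lo: int) -> int:
    """Combine two UTF-16 code units WITHOUT checking that `lo` is really
    a low surrogate (assumed behaviour, not any real parser's code)."""
    # Only the low 10 bits of each unit are kept; the top 6 "tag" bits
    # (110110 for high, 110111 for low surrogates) are thrown away.
    return 0x10000 + ((hi & 0x3FF) << 10) + (lo & 0x3FF)


def combine_strict(hi: int, lo: int) -> int:
    """Same computation, but rejects anything that is not a real pair."""
    if not 0xD800 <= hi <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {hi:#06x}")
    if not 0xDC00 <= lo <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {lo:#06x}")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


# \uD800\uD800: the second unit is another high surrogate, not a low one.
cp = combine_lenient(0xD800, 0xD800)
print(f"U+{cp:04X}")                           # U+10000
print(chr(cp).encode("utf-8").hex().upper())   # F0908080

# The valid pair \uD800\uDC00 gives the same code point.
print(f"U+{combine_strict(0xD800, 0xDC00):04X}")  # U+10000
```

With the range check in place, `combine_strict(0xD800, 0xD800)` raises `ValueError` instead of silently producing U+10000.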