Fix UTF-8 codepoint split by FormatterWriter
FormatterWriter has to deal with an inherent conflict: fmt::Formatter wants to write &str values (required to be valid UTF-8) while io::Write can be used to write arbitrary byte slices. This causes the JSON helper to fail on certain non-ASCII strings.
SIMD-optimized JSON string formatting can issue writes that split a UTF-8 codepoint. str::from_utf8() then fails because the input buffer handed to FormatterWriter ends with only part of a codepoint.
Consider the string from the new test:
"🤨🤨\n😮😮\n🤨🤨\n😮😮OMG"
which is encoded and processed like this:
F0 9F A4 A8 F0 9F A4 A8 0A F0 9F 98 AE F0 9F 98 AE 0A F0 9F A4 A8 F0 9F A4 A8 0A F0 9F 98 AE F0 9F 98 AE 4F 4D 47   UTF-8 string
|           |           |  |           |           |  |           |           |  |           |           |  |  |  | UTF-8 codepoint boundaries
                                       -----------                                           -----------
|                                               |                                               |                   i128 boundaries
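The failure can be reproduced outside the JSON helper. The sketch below is illustrative only: valid_prefix_len() and chunk_prefix_lens() are hypothetical helpers, not functions from this codebase, and the fixed 16-byte chunking is an assumption standing in for the i128-sized steps of the SIMD path.

```rust
use std::str;

/// Returns how many leading bytes of `chunk` form valid UTF-8.
/// (Hypothetical helper, named for this example only.)
fn valid_prefix_len(chunk: &[u8]) -> usize {
    match str::from_utf8(chunk) {
        Ok(s) => s.len(),
        // valid_up_to() is the index of the first byte that is not
        // part of a complete, valid codepoint.
        Err(e) => e.valid_up_to(),
    }
}

/// Splits `bytes` into fixed-size chunks and reports, for each chunk,
/// its length and the length of its valid UTF-8 prefix.
fn chunk_prefix_lens(bytes: &[u8], chunk_size: usize) -> Vec<(usize, usize)> {
    bytes
        .chunks(chunk_size)
        .map(|c| (c.len(), valid_prefix_len(c)))
        .collect()
}
```

For the test string and 16-byte chunks, the first chunk is valid only up to byte 13, and the following chunks start in the middle of a codepoint, so none of them passes str::from_utf8() as a whole.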
To deal with this, FormatterWriter now writes only the leading part of the buffer that is valid UTF-8, which keeps fmt::Formatter happy, and reports how many bytes it wrote. io::Write permits partial writes, but its callers have to be prepared for them. Teach write_json_simd() to handle this by tracking how many bytes have been written: if write_json_nosimd_prevalidated() doesn't write everything, the unwritten suffix is written on the next call.
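A minimal sketch of the approach, not the actual patch: Utf8PrefixWriter is a hypothetical stand-in for FormatterWriter, and write_in_chunks() is a hypothetical caller that mimics the byte accounting described above. Returning InvalidData when no valid prefix exists is a choice made for this example.

```rust
use std::fmt;
use std::io;
use std::str;

/// Hypothetical stand-in for FormatterWriter: adapts a fmt::Write sink
/// to io::Write, accepting only the valid UTF-8 prefix of each buffer.
struct Utf8PrefixWriter<W: fmt::Write> {
    inner: W,
}

impl<W: fmt::Write> io::Write for Utf8PrefixWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let valid = match str::from_utf8(buf) {
            Ok(s) => s,
            // The buffer ends mid-codepoint: keep only the valid prefix.
            Err(e) if e.valid_up_to() > 0 => {
                str::from_utf8(&buf[..e.valid_up_to()]).expect("prefix was validated")
            }
            // Nothing usable at the start of the buffer.
            Err(e) => return Err(io::Error::new(io::ErrorKind::InvalidData, e)),
        };
        self.inner
            .write_str(valid)
            .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
        // Partial write: the caller must resend the remaining bytes.
        Ok(valid.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical caller: feeds `bytes` in windows of at most `chunk` bytes
/// and advances only by the number of bytes actually written.
fn write_in_chunks<W: io::Write>(out: &mut W, bytes: &[u8], chunk: usize) -> io::Result<()> {
    let mut pos = 0;
    while pos < bytes.len() {
        let end = usize::min(pos + chunk, bytes.len());
        // The unwritten suffix of this window is retried in the next one.
        pos += out.write(&bytes[pos..end])?;
    }
    Ok(())
}
```

With 16-byte windows the test string is written as 13, 14 and 11 bytes, and the output matches the input. A window shorter than the longest codepoint cannot make progress in this sketch and surfaces as an InvalidData error.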
Benchmarks don't reveal any change in performance caused by that extra byte counting.