Serialization of delimiters is surprisingly expensive
Serialization of JSON does a lot of tiny writes, especially for delimiters. As far as I can tell, every Write impl (except the one for byte slices) contains a constant-time state update which is never optimized away. For example, writing a string to a Vec produces 3 Write::write_all calls (opening quote, contents, closing quote), each of which checks the capacity and adjusts the length of the Vec.
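To make the cost concrete, here is a minimal standalone sketch of that access pattern (not serde_json's actual code): three write_all calls for a single string, each doing its own capacity check and length bump on the Vec, compared with one batched write of the same bytes.

```rust
use std::io::Write;

fn main() -> std::io::Result<()> {
    let mut out: Vec<u8> = Vec::new();

    // Roughly the shape of serializing one JSON string today: three
    // separate Write::write_all calls, each re-checking the Vec's
    // capacity and bumping its length.
    out.write_all(b"\"")?;          // opening quote
    out.write_all(b"hello world")?; // the (escaped) string contents
    out.write_all(b"\"")?;          // closing quote

    // The same bytes, appended with a single bookkeeping update.
    let mut batched: Vec<u8> = Vec::new();
    batched.write_all(b"\"hello world\"")?;

    assert_eq!(out, batched);
    Ok(())
}
```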
On the json-benchmark twitter.json file, when stringifying structs, it looks like ~22.6% of runtime is spent in Formatter::begin* and Formatter::end* calls, so my best guess is that there is roughly 20% on the table here.
I've coded up a prototype that addresses this for strings and appears to be a ~9% improvement on the twitter.json stringify-structs benchmark, but it's quite the hack: https://github.com/serde-rs/json/compare/master...saethlin:write-hack
I think the core problem is that all the serde_json APIs accept a Write, and I need functionality that isn't already available in that trait. The bincode crate gets around this for readers by providing a separate entrypoint that accepts a BincodeRead, a trait with only a few, specialized impls. Would this crate need to add a new entrypoint to support this optimization?
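For illustration only, here is a rough sketch of the kind of specialized writer trait I mean, in the same spirit as BincodeRead. The names (JsonWrite, write_delimiter, IoWriter, VecWriter) are made up and this is not what the linked prototype does; it's just the shape of API that would let a Vec-backed writer skip the per-call Write bookkeeping.

```rust
use std::io::{self, Write};

// Hypothetical trait with a small, closed set of impls, so the
// serializer can call methods a Vec-backed writer can specialize
// for delimiters and short fragments.
trait JsonWrite {
    fn write_fragment(&mut self, bytes: &[u8]) -> io::Result<()>;
    fn write_delimiter(&mut self, byte: u8) -> io::Result<()>;
}

// Fallback wrapper for any io::Write: behaves exactly like today.
struct IoWriter<W: Write>(W);

impl<W: Write> JsonWrite for IoWriter<W> {
    fn write_fragment(&mut self, bytes: &[u8]) -> io::Result<()> {
        self.0.write_all(bytes)
    }
    fn write_delimiter(&mut self, byte: u8) -> io::Result<()> {
        self.0.write_all(&[byte])
    }
}

// Specialized wrapper for Vec<u8>: delimiters become a plain push,
// avoiding the io::Write machinery entirely.
struct VecWriter(Vec<u8>);

impl JsonWrite for VecWriter {
    fn write_fragment(&mut self, bytes: &[u8]) -> io::Result<()> {
        self.0.extend_from_slice(bytes);
        Ok(())
    }
    fn write_delimiter(&mut self, byte: u8) -> io::Result<()> {
        self.0.push(byte);
        Ok(())
    }
}

fn main() -> io::Result<()> {
    // Both paths produce identical bytes for a quoted string.
    let mut fast = VecWriter(Vec::new());
    fast.write_delimiter(b'"')?;
    fast.write_fragment(b"hello")?;
    fast.write_delimiter(b'"')?;

    let mut fallback = IoWriter(Vec::new());
    fallback.write_delimiter(b'"')?;
    fallback.write_fragment(b"hello")?;
    fallback.write_delimiter(b'"')?;

    assert_eq!(fast.0, fallback.0);
    Ok(())
}
```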