Streaming enhancements for dumping
While investigating workarounds for https://github.com/jruby/jruby/issues/6265, I realized that all dumping (e.g. via to_json) is done first to an in-memory buffer (always a Ruby String), even when an IO object is given as the destination for the json. This applies to all three implementations: the pure-Ruby version, the C version, and the Java version.
This could be more efficient if the generator wrote the json directly to the given IO, or if it were possible to provide a String-like object that receives the appends. Doing so would require reworking the generator subsystem so that any provided IO or String-like object is passed through the various dump methods.
This would have several benefits:
- No intermediate String to hold the entirety of the dumped json.
- No intermediate Strings for components of a dumped collection; Array and Hash currently dump each element or pair to a separate String and then append that String to the result buffer.
- Reduced allocation, copying, and GC overhead when dumping directly to IO.
- Potential to provide IO-like or String-like receivers of the dumped json, allowing for a workaround to the Java 2GB array limitation (https://github.com/jruby/jruby/issues/6265).
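To make the idea concrete, here is a rough sketch in plain Ruby of what passing the receiver through the dump methods might look like. It is not how any of the three generators is implemented, and the names `stream_json` and `ChunkedSink` are hypothetical, made up for this illustration. The only assumption about the receiver is that it responds to `<<`, which IO, StringIO, and String all do. It also ignores generator options such as pretty-printing and depth checks.

```ruby
require "json"
require "stringio"

# Hypothetical helper, not part of the json library: appends the JSON
# encoding of obj to out, which only needs to respond to <<.
def stream_json(obj, out)
  case obj
  when Hash
    out << "{"
    first = true
    obj.each do |key, value|
      out << "," unless first
      first = false
      out << JSON.generate(key.to_s)
      out << ":"
      stream_json(value, out) # the value is appended directly, no per-pair String
    end
    out << "}"
  when Array
    out << "["
    obj.each_with_index do |element, index|
      out << "," if index > 0
      stream_json(element, out) # the element is appended directly as well
    end
    out << "]"
  else
    # Scalars (String, Numeric, true, false, nil) are delegated to the
    # existing generator; this allocates only one small String per scalar.
    out << JSON.generate(obj)
  end
  out
end

# Hypothetical String-like receiver: spreads the output over several
# bounded chunks so that no single buffer has to hold the whole document.
class ChunkedSink
  attr_reader :chunks

  def initialize(limit)
    @limit = limit
    @chunks = ["".dup]
  end

  def <<(str)
    last = @chunks.last
    # Start a new chunk when appending would push the current one past the limit.
    @chunks << "".dup if !last.empty? && last.bytesize + str.bytesize > @limit
    @chunks.last << str
    self
  end
end

data = { "a" => [1, 2, { "b" => nil }], "c" => "x\"y", "d" => true }

io = StringIO.new
stream_json(data, io)

sink = ChunkedSink.new(8)
stream_json(data, sink)
```

In this sketch the StringIO ends up with the same text that `JSON.generate(data)` returns, and joining the chunks of the `ChunkedSink` gives the same text again. A receiver along the lines of `ChunkedSink` is the kind of thing that could sidestep the 2GB array limit, since no single String ever has to hold the full output.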
I'm hoping to attempt this for at least the Java and Ruby versions of the generator, but I may need help making the same change in the C extension. If others are interested in helping with any of these implementations, it would be greatly appreciated.