Streaming enhancements for dumping

Open headius opened this issue 2 years ago • 0 comments

While investigating workarounds for https://github.com/jruby/jruby/issues/6265, I realized that all dumping (for example via to_json) is done first to an in-memory buffer (always a Ruby String), even when the caller passes an IO object to which the JSON should be written. This applies to all three implementations: the pure-Ruby version, the C version, and the Java version.

This would be more efficient if the generator wrote each piece of JSON directly to the given IO, or if it were possible to provide a String-like object that receives the appends. A rework of the generator subsystem would be necessary to pass any provided IO or String-like object through the various dump methods.
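To make the idea concrete, here is a rough sketch of what "passing the target through the dump methods" could look like in pure Ruby. It is not how any of the three generators are implemented: `StreamingDump` is a hypothetical name, the only thing assumed about the target is that it responds to `<<` (as IO, StringIO, and String do), and scalars are still delegated to `JSON.generate`, which assumes json 2.x where scalars are accepted at the top level.

```ruby
require "json"
require "stringio"

# Hypothetical helper, NOT part of the json gem: walks the object graph and
# appends each JSON fragment to `out` instead of building one big String.
module StreamingDump
  module_function

  # `out` is any object that responds to `<<` (IO, StringIO, String, ...).
  def dump(obj, out)
    case obj
    when Hash
      out << "{"
      first = true
      obj.each do |key, value|
        out << "," unless first
        first = false
        # Keys are converted with to_s and escaped by the existing generator.
        out << JSON.generate(key.to_s) << ":"
        dump(value, out)
      end
      out << "}"
    when Array
      out << "["
      obj.each_with_index do |element, index|
        out << "," if index > 0
        dump(element, out)
      end
      out << "]"
    else
      # Scalars are small, so a short-lived String per scalar is acceptable here.
      out << JSON.generate(obj)
    end
    out
  end
end

# Usage: the same call works for an IO-like target and a plain String.
io = StringIO.new
StreamingDump.dump({ "name" => "json", "tags" => ["a", "b"] }, io)
```

The collections never materialize their own intermediate String; only the scalar leaves do, and a real implementation would presumably write those directly as well.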

This would have several benefits:

  • No intermediate String to hold the entirety of the dumped json.
  • No intermediate Strings for components of a dumped collection; Array and Hash currently dump each element or pair to a separate String and then append that String to the result buffer.
  • Reduced allocation, copying, and GC overhead when dumping directly to IO.
  • Potential to provide IO-like or String-like receivers of the dumped json, allowing for a workaround to the Java 2GB array limitation (https://github.com/jruby/jruby/issues/6265).
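As an illustration of the last point, a String-like receiver could spread the output across several smaller Strings so that no single buffer has to hold the whole document. The sketch below is only an assumption about how such a receiver might look: `ChunkedBuffer` and `chunk_limit` are hypothetical names, and the generator would need the rework described above before it could append to an object like this.

```ruby
# Hypothetical String-like receiver, NOT part of the json gem: accepts appends
# via `<<` and starts a new chunk when the current one would grow past
# `chunk_limit` bytes. A single fragment larger than the limit still gets a
# chunk of its own, so the limit is a target rather than a hard guarantee.
class ChunkedBuffer
  attr_reader :chunks

  def initialize(chunk_limit)
    @chunk_limit = chunk_limit
    @chunks = [+""]
  end

  def <<(fragment)
    fragment = fragment.to_s
    current = @chunks.last
    if !current.empty? && current.bytesize + fragment.bytesize > @chunk_limit
      current = +""
      @chunks << current
    end
    current << fragment
    self # returning self allows chained appends, like String#<< and IO#<<
  end

  # Total number of bytes received across all chunks.
  def bytesize
    @chunks.sum(&:bytesize)
  end
end

# Usage: feed it the kind of fragments a streaming generator would emit.
buffer = ChunkedBuffer.new(8)
['{"a":', '[1,2,3]', ',"b":', 'true}'].each { |piece| buffer << piece }
```

With a realistic limit (well below 2GB), each chunk stays within what a single Java array can hold, while the caller can still write the chunks out in order.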

I'm hoping to attempt this for at least the Java and Ruby versions of the generator, but I may need help making the same change in the C extension. If others are interested in helping with any of these implementations, it would be greatly appreciated.

headius, Mar 31 '23 20:03