
`JSON.parse(read(io, String))` is faster than `JSON.parse(io)`

Open · fonsp opened this issue 3 years ago · 1 comment

Based on my small test, it looks like JSON.parse(io) is roughly 2x slower (about 1.7x by median time in the benchmark below) and makes about 1.7x as many allocations as first reading the IO into memory and then calling JSON.parse on the resulting String.

Benchmark

using BenchmarkTools
using JSON

# sample data
const sample = read(download("https://registry.npmjs.org/react"), String)

j1(io) = JSON.parse(read(io, String))
j2(io) = JSON.parse(io)

@benchmark let
	io = IOBuffer()
	write(io, $sample)
	seekstart(io)

	r = j1(io) # change to j1 or j2

	close(io)
	r
end

Results:

JSON.parse(read(io, String))

BenchmarkTools.Trial: 118 samples with 1 evaluation.
 Range (min … max):  38.245 ms … 60.927 ms  ┊ GC (min … max): 0.00% … 26.18%
 Time  (median):     39.169 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   42.542 ms ±  5.252 ms  ┊ GC (mean ± σ):  8.21% ±  9.87%

  ▅█▁                                                          
  ███▅▅▅▃▃▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▃▃▃▃▄▁▃▄▃▅▃▄▃▃▁▃▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃ ▃
  38.2 ms         Histogram: frequency by time        56.1 ms <

 Memory estimate: 22.80 MiB, allocs estimate: 274058.

JSON.parse(io)

BenchmarkTools.Trial: 69 samples with 1 evaluation.
 Range (min … max):  66.057 ms … 85.004 ms  ┊ GC (min … max): 0.00% … 21.08%
 Time  (median):     67.862 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   72.698 ms ±  6.664 ms  ┊ GC (mean ± σ):  8.17% ±  8.06%

  █▃▃                           ▁                              
  ████▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▄█▃▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▄▇▆▃▇▃▁▁▁▃▁▃ ▁
  66.1 ms         Histogram: frequency by time        84.5 ms <

 Memory estimate: 39.45 MiB, allocs estimate: 458500.

Based on this benchmark, an easy performance improvement to JSON.jl would be to change the IO method so that it reads the stream into memory and then uses the String method, i.e.:

parse(io::IO; kwargs...) = parse(read(io, String); kwargs...)

fonsp, Mar 08 '22 20:03

One consideration is that we probably can't assume that calling read(io, String) is always safe: the input might be a multi-gigabyte file that would completely fill up memory. Or the io stream might be only partially JSON, followed by data in some other format, in which case it wouldn't make sense to read the entire IO and assume it is all JSON.
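
Both concerns can be illustrated with Base alone. The following is a minimal sketch, not JSON.jl code: `read_capped` is a hypothetical helper introduced only for illustration, and the sample payloads are made up. It shows that read(io, String) drains the stream, trailing non-JSON data included, and one possible way to bound memory use before handing the text to the String method.

```julia
# Concern 1: read(io, String) consumes everything up to EOF, including
# data that follows the JSON document.
io = IOBuffer("{\"a\": 1}\nNOT-JSON-PAYLOAD")
text = read(io, String)
drained = eof(io)   # nothing is left for a later reader of `io`

# Concern 2: bound memory use. `read_capped` is a hypothetical helper,
# not part of JSON.jl.
function read_capped(io::IO, max_bytes::Integer)
    # read(io, nb) reads at most nb bytes; asking for one extra byte
    # lets us detect input that exceeds the cap.
    bytes = read(io, max_bytes + 1)
    length(bytes) > max_bytes && error("input exceeds $max_bytes bytes")
    return String(bytes)
end

small = read_capped(IOBuffer("[1, 2, 3]"), 1024)
```

A cap like this only addresses the memory concern. It does not help when the stream carries other data after the JSON document, since the helper still reads past the end of the document.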

quinnj, Mar 10 '22 21:03