How does this compare to JSON.jl?
It would be helpful if the README gave some hint as to how the two packages compare — when would you use one vs. the other?
Fast JSON for Julia types
Hi @quinnj, can you quantify "Fast"?
The main functionality JSON2 provides is native support for serializing and deserializing between JSON and Julia types (structs and mutable structs) using generated functions and reflection. For example, if I have
```julia
struct MyType
    a::Int
    b::String
end
```
and a web API that provides JSON like
```json
{ "a": 1, "b": "string"}
```
then in JSON2, you can just do
```julia
JSON2.read(json, MyType)::MyType
```
and a `MyType` instance is constructed directly from the provided JSON. This also happens to be a very fast operation, because the generated code for `JSON2.read` looks roughly like this:
```julia
function JSON2.read(io::IO, ::Type{MyType})
    read(io, UInt8)      # read '{'
    readkey(io)          # read "a":
    a = read(io, Int)
    readdelim(io)        # read ','
    readkey(io)          # read "b":
    b = readstring(io)
    read(io, UInt8)      # read '}'
    return MyType(a, b)
end
```
That's paraphrasing a bit, but you get the idea: that kind of straight-line, type-stable code translates to very, very fast object parsing from JSON to Julia.
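To make the idea concrete, here is a minimal hand-written sketch of the same straight-line approach that uses only Base. It is not JSON2's generated code. The helper names (`skipws`, `expect`, `readstr`, `readkey`, `readint`, `readmytype`) are hypothetical, and the sketch works on byte indices into a `String` rather than an `IO`. It also assumes the keys arrive in declaration order, `a` is an integer, and `b` contains no escape sequences.

```julia
# Hypothetical sketch (not JSON2's generated code) of a straight-line,
# type-stable reader for MyType. Assumes keys arrive in declaration order,
# "a" is an integer and "b" has no escape sequences.

struct MyType
    a::Int
    b::String
end

# Index of the first non-whitespace byte at or after i.
function skipws(s::String, i::Int)
    while i <= ncodeunits(s) && codeunit(s, i) in (UInt8(' '), UInt8('\t'), UInt8('\n'), UInt8('\r'))
        i += 1
    end
    return i
end

# Consume the byte `c` (after optional whitespace); return the next index.
function expect(s::String, i::Int, c::Char)
    i = skipws(s, i)
    (i <= ncodeunits(s) && codeunit(s, i) == UInt8(c)) ||
        throw(ArgumentError("expected '$c' at byte $i"))
    return i + 1
end

# Read a string with no escapes; return (value, next index).
function readstr(s::String, i::Int)
    i = expect(s, i, '"')
    j = i
    while codeunit(s, j) != UInt8('"')
        j += 1
    end
    return String(codeunits(s)[i:j-1]), j + 1
end

# Read an object key and the ':' after it; the key name itself is not checked.
readkey(s::String, i::Int) = expect(s, readstr(s, i)[2], ':')

# Read an optionally negative base-10 integer; return (value, next index).
function readint(s::String, i::Int)
    i = skipws(s, i)
    neg = i <= ncodeunits(s) && codeunit(s, i) == UInt8('-')
    neg && (i += 1)
    start, n = i, 0
    while i <= ncodeunits(s) && UInt8('0') <= codeunit(s, i) <= UInt8('9')
        n = 10n + (codeunit(s, i) - UInt8('0'))
        i += 1
    end
    i == start && throw(ArgumentError("expected a digit at byte $i"))
    return (neg ? -n : n), i
end

# One step per expected token, mirroring the paraphrased JSON2.read above.
function readmytype(s::String)
    i = expect(s, 1, '{')
    i = readkey(s, i)       # "a":
    a, i = readint(s, i)
    i = expect(s, i, ',')
    i = readkey(s, i)       # "b":
    b, i = readstr(s, i)
    expect(s, i, '}')
    return MyType(a, b)
end

readmytype("""{ "a": 1, "b": "string"}""")  # MyType(1, "string")
```

Every step knows which token comes next and which concrete type it produces, so no intermediate `Dict` is built. The price is that reordered or extra keys break it, which is what the field-order examples further down run into.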
How does it compare to JSON.jl for ordinary JSON read into and written from a `Dict`?
Should this package replace (or be merged with) JSON.jl? It just seems weird to have two packages for JSON.
> very, very fast object parsing
Can you post benchmarks comparing this to JSON.jl?
JSON2's float-parsing code is quite different from JSON.jl's; how does it compare in performance?
The master branch of JSON.jl gets the following numbers:
```julia
julia> using JSON, BenchmarkTools

julia> for s in ("canada", "citm_catalog", "twitter", "citylots")
           str = readstring("/j/JSON.jl/data/$s.json")
           println("Test: $s")
           @btime JSON.parse($str)
       end
Test: canada
77.823 ms (777633 allocations: 33.44 MiB)
Test: citm_catalog
10.896 ms (168537 allocations: 11.07 MiB)
Test: twitter
3.793 ms (33849 allocations: 2.48 MiB)
Test: citylots
6.794 s (53286967 allocations: 2.40 GiB)
```
With my optimizations to JSON.jl (which I'll submit as a PR in an hour or so):
```
Test: canada
54.585 ms (670266 allocations: 24.26 MiB)
Test: citm_catalog
8.604 ms (213296 allocations: 11.18 MiB)
Test: twitter
3.660 ms (68353 allocations: 3.12 MiB)
Test: citylots
5.863 s (60191476 allocations: 2.21 GiB)
```
LazyJSON gets the following numbers:
```
Test: canada
11.970 ns (1 allocation: 32 bytes)
Test: citm_catalog
11.955 ns (1 allocation: 32 bytes)
Test: twitter
11.962 ns (1 allocation: 32 bytes)
Test: citylots
11.963 ns (1 allocation: 32 bytes)
```
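But that is not a useful comparison by itself. The constant ~12 ns, single-allocation timing shows that `LazyJSON.parse` does essentially no work up front and produces no parsed output; values are only parsed when they are accessed.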
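The following rough sketch shows why a lazy parse is so cheap to start. The "parse" step only records where the top-level value begins, and a key lookup scans forward, skipping over the values it passes without building them. This is a hypothetical illustration in plain Base Julia, not LazyJSON's implementation. All names are made up. It assumes well-formed input, supports only object lookup, and compares keys without decoding escapes.

```julia
# Hypothetical sketch of lazy JSON access (not LazyJSON's implementation).
# Assumes well-formed input; keys are compared without decoding escapes.

struct LazyValue
    s::String   # the whole JSON text
    i::Int      # byte index where this value starts
end

isws(b::UInt8) = b == UInt8(' ') || b == UInt8('\t') || b == UInt8('\n') || b == UInt8('\r')

function skipws(s::String, i::Int)
    while i <= ncodeunits(s) && isws(codeunit(s, i))
        i += 1
    end
    return i
end

# "Parsing" is O(1): only remember where the top-level value starts.
lazyparse(s::String) = LazyValue(s, skipws(s, 1))

# Index just past the string that starts at i, honoring backslash escapes.
function skipstring(s::String, i::Int)
    i += 1                                   # step over the opening quote
    while true
        b = codeunit(s, i)
        if b == UInt8('\\')
            i += 2                           # skip the escaped byte
        elseif b == UInt8('"')
            return i + 1
        else
            i += 1
        end
    end
end

# Index just past the value that starts at i; nothing is built along the way.
function skipvalue(s::String, i::Int)
    b = codeunit(s, i)
    b == UInt8('"') && return skipstring(s, i)
    if b == UInt8('{') || b == UInt8('[')
        depth = 0
        while true
            b = codeunit(s, i)
            if b == UInt8('"')
                i = skipstring(s, i)
                continue
            elseif b == UInt8('{') || b == UInt8('[')
                depth += 1
            elseif b == UInt8('}') || b == UInt8(']')
                depth -= 1
                depth == 0 && return i + 1
            end
            i += 1
        end
    end
    # number, true, false or null: runs until a delimiter or whitespace
    while i <= ncodeunits(s) && !isws(codeunit(s, i)) && !(codeunit(s, i) in (UInt8(','), UInt8('}'), UInt8(']')))
        i += 1
    end
    return i
end

# Find `key` by scanning the object; other values are skipped, not parsed.
function Base.getindex(v::LazyValue, key::String)
    s = v.s
    codeunit(s, v.i) == UInt8('{') || throw(ArgumentError("not an object"))
    i = skipws(s, v.i + 1)
    while codeunit(s, i) != UInt8('}')
        j = skipstring(s, i)                 # key occupies bytes i:j-1, quotes included
        k = String(codeunits(s)[i+1:j-2])
        i = skipws(s, j)
        codeunit(s, i) == UInt8(':') || throw(ArgumentError("expected ':' at byte $i"))
        i = skipws(s, i + 1)
        k == key && return LazyValue(s, i)
        i = skipws(s, skipvalue(s, i))
        codeunit(s, i) == UInt8(',') && (i = skipws(s, i + 1))
    end
    throw(KeyError(key))
end

# Materialize only what is asked for: here, the raw JSON text of one value.
rawtext(v::LazyValue) = String(codeunits(v.s)[v.i:skipvalue(v.s, v.i)-1])

s = """{"a": {"b": [1, 2]}, "c": 3}"""
v = lazyparse(s)        # nothing parsed yet
rawtext(v["a"]["b"])    # "[1, 2]"
```

In this scheme the cost moves to access time and grows with how far into the text a query reaches. That fits the `load_coords` timings below, where feature 1 is nearly instant while feature 206560 takes noticeably longer.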
The function below loads the coordinates of a single citylots property (the `n`-th feature) into a DataFrame:
```julia
julia> using DataFrames, LazyJSON

julia> j = String(read("citylots.json"));

julia> const J = LazyJSON

julia> function load_coords(j, n)
           d = DataFrame(x = Float64[], y = Float64[], z = Float64[])
           # for a Polygon, each x is a linear ring and each v an [x, y, z] position
           for x in J.parse(j)["features"][n]["geometry"]["coordinates"]
               for v in x
                   push!(d, v)
               end
           end
           return d
       end

julia> @time load_coords(j, 1)
0.000080 seconds (128 allocations: 5.438 KiB)
5×3 DataFrame
│ Row │ x        │ y       │ z   │
├─────┼──────────┼─────────┼─────┤
│ 1   │ -122.422 │ 37.8085 │ 0.0 │
│ 2   │ -122.422 │ 37.8088 │ 0.0 │
│ 3   │ -122.421 │ 37.8088 │ 0.0 │
│ 4   │ -122.421 │ 37.8086 │ 0.0 │
│ 5   │ -122.422 │ 37.8085 │ 0.0 │

julia> @time load_coords(j, 206560)
0.236713 seconds (217 allocations: 8.422 KiB)
11×3 DataFrame
│ Row │ x        │ y       │ z   │
├─────┼──────────┼─────────┼─────┤
│ 1   │ -122.424 │ 37.7829 │ 0.0 │
│ 2   │ -122.424 │ 37.783  │ 0.0 │
│ 3   │ -122.424 │ 37.783  │ 0.0 │
│ 4   │ -122.424 │ 37.7831 │ 0.0 │
│ 5   │ -122.423 │ 37.7832 │ 0.0 │
│ 6   │ -122.423 │ 37.7831 │ 0.0 │
│ 7   │ -122.423 │ 37.7831 │ 0.0 │
│ 8   │ -122.424 │ 37.783  │ 0.0 │
│ 9   │ -122.424 │ 37.783  │ 0.0 │
│ 10  │ -122.424 │ 37.7829 │ 0.0 │
│ 11  │ -122.424 │ 37.7829 │ 0.0 │
```
vs. JSON.jl:
```julia
julia> @time load_coords(j, 1)
6.097472 seconds (52.32 M allocations: 1.738 GiB, 36.45% gc time)
5×3 DataFrame
...

julia> @time load_coords(j, 206560)
6.564454 seconds (52.32 M allocations: 1.738 GiB, 38.58% gc time)
11×3 DataFrame
...
```
The next function loads all the coordinates for all the properties into a Dict of arrays keyed on "MAPBLKLOT":
```julia
function load_coords(j)
    d = Dict()
    for f in LazyJSON.parse(j)["features"]
        id = f["properties"]["MAPBLKLOT"]
        coords = Any[]
        d[id] = coords
        g = f["geometry"]
        # null geometries and non-Polygon shapes get an empty coordinate list
        if g !== nothing && g["type"] == "Polygon"
            for x in g["coordinates"]
                for v in x
                    push!(coords, v)
                end
            end
        end
    end
    return d
end

julia> @time x = load_coords(j)
3.894635 seconds (13.34 M allocations: 473.312 MiB, 29.88% gc time)
Dict{Any,Any} with 154216 entries:
...
```
vs. JSON.jl:
```julia
julia> @time x = load_coords(j)
7.410909 seconds (56.90 M allocations: 1.921 GiB, 29.80% gc time)
Dict{Any,Any} with 154216 entries:
...
```
Here is what I get with JSON2.jl:
```julia
julia> using DataFrames, JSON2

julia> function load_coords(j, n)
           d = DataFrame(x = Float64[], y = Float64[], z = Float64[])
           for x in JSON2.read(j).features[n].geometry.coordinates
               for v in x
                   push!(d, v)
               end
           end
           return d
       end

julia> @time load_coords(j, 1)
4.830097 seconds (31.05 M allocations: 1.514 GiB, 44.52% gc time)
5×3 DataFrame
│ Row │ x        │ y       │ z   │
├─────┼──────────┼─────────┼─────┤
│ 1   │ -122.422 │ 37.8085 │ 0.0 │
│ 2   │ -122.422 │ 37.8088 │ 0.0 │
│ 3   │ -122.421 │ 37.8088 │ 0.0 │
│ 4   │ -122.421 │ 37.8086 │ 0.0 │
│ 5   │ -122.422 │ 37.8085 │ 0.0 │

julia> @time load_coords(j, 206560)
4.666455 seconds (31.05 M allocations: 1.514 GiB, 41.71% gc time)
11×3 DataFrame
│ Row │ x        │ y       │ z   │
├─────┼──────────┼─────────┼─────┤
│ 1   │ -122.424 │ 37.7829 │ 0.0 │
│ 2   │ -122.424 │ 37.783  │ 0.0 │
│ 3   │ -122.424 │ 37.783  │ 0.0 │
│ 4   │ -122.424 │ 37.7831 │ 0.0 │
│ 5   │ -122.423 │ 37.7832 │ 0.0 │
│ 6   │ -122.423 │ 37.7831 │ 0.0 │
│ 7   │ -122.423 │ 37.7831 │ 0.0 │
│ 8   │ -122.424 │ 37.783  │ 0.0 │
│ 9   │ -122.424 │ 37.783  │ 0.0 │
│ 10  │ -122.424 │ 37.7829 │ 0.0 │
│ 11  │ -122.424 │ 37.7829 │ 0.0 │
```
and for loading all the coordinates into a Dict:
```julia
function load_coords(j)
    d = Dict()
    for f in JSON2.read(j).features
        id = f.properties.MAPBLKLOT
        coords = Any[]
        d[id] = coords
        g = f.geometry
        # null geometries and non-Polygon shapes get an empty coordinate list
        if g !== nothing && g.type == "Polygon"
            for x in g.coordinates
                for v in x
                    push!(coords, v)
                end
            end
        end
    end
    return d
end

julia> @time load_coords(j)
8.354557 seconds (35.60 M allocations: 1.694 GiB, 51.34% gc time)
Dict{Any,Any} with 154216 entries:
...
```
I've made a small test comparing `JSON2.read` against calling `Base.convert` on a LazyJSON value. JSON2 is about 20% faster (19 µs vs. 23 µs below), but LazyJSON handles out-of-order fields and unexpected extra fields:
```julia
using JSON2, LazyJSON

struct Point
    x::Int
    y::Int
end

struct Line
    a::Point
    b::Point
end

struct Arrow
    label::String
    segments::Vector{Line}
    dashed::Bool
end

json = """{
    "label": "Hello",
    "segments": [
        {"a": {"x": 1, "y": 1}, "b": {"x": 2, "y": 2}},
        {"a": {"x": 2, "y": 2}, "b": {"x": 3, "y": 3}}
    ],
    "dashed": false
}"""

julia> @time JSON2.read(json, Arrow)
0.000019 seconds (40 allocations: 1.578 KiB)
Arrow("Hello", Line[Line(Point(1, 1), Point(2, 2)), Line(Point(2, 2), Point(3, 3))], false)

julia> @time convert(Arrow, LazyJSON.value(json))
0.000023 seconds (44 allocations: 1.531 KiB)
Arrow("Hello", Line[Line(Point(1, 1), Point(2, 2)), Line(Point(2, 2), Point(3, 3))], false)
```
If the input fields are in a different order:
```julia
json = """{
    "dashed": false,
    "label": "Hello",
    "segments": [
        {"a": {"x": 1, "y": 1}, "b": {"x": 2, "y": 2}},
        {"a": {"x": 2, "y": 2}, "b": {"x": 3, "y": 3}}
    ]
}"""

julia> @time JSON2.read(json, Arrow)
ERROR: ArgumentError: invalid JSON detected parsing type 'String': encountered 'f'

julia> @time convert(Arrow, LazyJSON.value(json))
0.000024 seconds (44 allocations: 1.531 KiB)
Arrow("Hello", Line[Line(Point(1, 1), Point(2, 2)), Line(Point(2, 2), Point(3, 3))], false)
```
If there is an extra field:
```julia
json = """{
    "style": "bold",
    "label": "Hello",
    "segments": [
        {"a": {"x": 1, "y": 1}, "b": {"x": 2, "y": 2}},
        {"a": {"x": 2, "y": 2}, "b": {"x": 3, "y": 3}}
    ],
    "dashed": false
}"""

julia> @time JSON2.read(json, Arrow)
ERROR: ArgumentError: invalid JSON detected parsing type 'Array{Line,1}': encountered '"'

julia> @time convert(Arrow, LazyJSON.value(json))
0.000023 seconds (44 allocations: 1.531 KiB)
Arrow("Hello", Line[Line(Point(1, 1), Point(2, 2)), Line(Point(2, 2), Point(3, 3))], false)
```
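Looking each field up by name, instead of reading fields in a fixed sequence, gives the behavior above: key order and extra keys stop mattering. The sketch below shows that idea under stated assumptions and is not LazyJSON's actual `convert`. `fromdict` is a hypothetical helper, and an already-parsed `Dict` stands in for the JSON value.

```julia
# Hypothetical sketch: build a struct by looking up each field *by name*
# in an already-parsed Dict (standing in for a parsed JSON value).
# This is not LazyJSON's `convert`, only the idea behind name-based lookup.

struct Point
    x::Int
    y::Int
end

struct Line
    a::Point
    b::Point
end

struct Arrow
    label::String
    segments::Vector{Line}
    dashed::Bool
end

# Leaf values: convert to the declared field type.
fromdict(::Type{T}, x) where {T} = convert(T, x)

# Arrays: convert each element to the element type.
fromdict(::Type{Vector{T}}, x::AbstractVector) where {T} = T[fromdict(T, e) for e in x]

# Structs: fetch each declared field by name; keys that are not fields are never read.
function fromdict(::Type{T}, d::AbstractDict) where {T}
    args = (fromdict(fieldtype(T, f), d[String(f)]) for f in fieldnames(T))
    return T(args...)
end

d = Dict(
    "style" => "bold",     # extra key: ignored
    "dashed" => false,     # listed out of declaration order; irrelevant for lookup by name
    "label" => "Hello",
    "segments" => [
        Dict("a" => Dict("x" => 1, "y" => 1), "b" => Dict("x" => 2, "y" => 2)),
        Dict("a" => Dict("x" => 2, "y" => 2), "b" => Dict("x" => 3, "y" => 3)),
    ],
)

fromdict(Arrow, d)
```

Each field now costs a lookup rather than a read at a known position. That is the general trade-off between this style and the straight-line reader JSON2 generates by default.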
JSON2 does support unordered fields, but you need the appropriate `JSON2.@format` declaration, e.g.:
```julia
mutable struct Arrow
    label::String
    segments::Vector{Line}
    dashed::Bool
    Arrow() = new()
end

JSON2.@format Arrow noargs
```
With that, I get a timing like:
```julia
julia> @time JSON2.read(json, Arrow)
0.000026 seconds (46 allocations: 2.438 KiB)
Arrow("Hello", Line[Line(Point(1, 1), Point(2, 2)), Line(Point(2, 2), Point(3, 3))], false)
```
and for the JSON with the extra field:
```julia
julia> @time JSON2.read(json, Arrow)
0.000031 seconds (47 allocations: 2.469 KiB)
Arrow("Hello", Line[Line(Point(1, 1), Point(2, 2)), Line(Point(2, 2), Point(3, 3))], false)
```
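The sketch below shows one way to get `noargs`-style behavior. It allocates an uninitialized mutable instance, assigns each field as its key is encountered (in any order), and skips keys that are not fields. The `MPoint` type and `setfields!` helper are hypothetical, and the input is assumed to be already-decoded key/value pairs. JSON2's generated code may work differently.

```julia
# Hypothetical sketch of the `noargs` idea (not JSON2's generated code):
# allocate an uninitialized mutable instance, then set fields as keys arrive.

mutable struct MPoint
    x::Int
    y::Int
    MPoint() = new()    # fields start unassigned
end

# `kvs` stands in for already-decoded (key, value) pairs in document order.
function setfields!(x::T, kvs) where {T}
    for (k, v) in kvs
        f = Symbol(k)
        if f in fieldnames(T)                    # known key: store it, in any order
            setfield!(x, f, convert(fieldtype(T, f), v))
        end                                      # unknown key: skipped
    end
    return x
end

setfields!(MPoint(), ["y" => 2, "z" => 0, "x" => 1])   # MPoint(1, 2)
```

A key that never appears leaves its field unassigned. Accessing an unassigned `String` field throws `UndefRefError`, and an `Int` field just holds an arbitrary value.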
The `noargs` approach requires `Arrow` to be a mutable struct, though; alternatively, we could use the `keywordargs` format:
```julia
struct Arrow
    label::String
    segments::Vector{Line}
    dashed::Bool
    # unknown keywords (e.g. style) are absorbed by kwargs... and ignored
    Arrow(; label::String="", segments::Vector{Line}=Line[], dashed::Bool=false, kwargs...) =
        new(label, segments, dashed)
end

JSON2.@format Arrow keywordargs
```
with the timings in order:
```julia
julia> @time JSON2.read(json, Arrow)
0.000030 seconds (53 allocations: 2.313 KiB)

julia> @time JSON2.read(json, Arrow)
0.000025 seconds (53 allocations: 2.313 KiB)

julia> @time JSON2.read(json, Arrow)
0.000025 seconds (57 allocations: 2.453 KiB)
```
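The `keywordargs` variant can be sketched in the same spirit: gather the decoded pairs into keyword arguments and let the constructor pick out the fields it knows. `KPoint` and `construct` are hypothetical names, the pairs are assumed to be already decoded, and this is not JSON2's implementation.

```julia
# Hypothetical sketch of the `keywordargs` idea (not JSON2's generated code):
# turn decoded pairs into keyword arguments for a keyword-only constructor.

struct KPoint
    x::Int
    y::Int
    # missing keys get defaults; unknown keys land in `kwargs` and are ignored
    KPoint(; x::Int = 0, y::Int = 0, kwargs...) = new(x, y)
end

# `kvs` stands in for already-decoded (key, value) pairs in document order.
function construct(::Type{T}, kvs) where {T}
    kw = Dict(Symbol(k) => v for (k, v) in kvs)
    return T(; kw...)
end

construct(KPoint, ["y" => 2, "z" => 0, "x" => 1])   # KPoint(1, 2)
construct(KPoint, ["x" => 5])                       # KPoint(5, 0)
```

With this approach the struct can stay immutable, and a missing key falls back to the keyword default declared in the constructor.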