JSON2.jl

How does this compare to JSON.jl

Open stevengj opened this issue 7 years ago • 10 comments

It would be helpful if the README gave some hint as to how the two packages compare — when would you use one vs. the other?

stevengj avatar Feb 02 '18 18:02 stevengj

> Fast JSON for Julia types

Hi @quinnj, can you quantify "Fast"?

samoconnor avatar Feb 04 '18 05:02 samoconnor

The main functionality JSON2 provides is native support for serializing and deserializing between JSON and Julia types (structs and mutable structs) using generated functions and reflection. For example, if I have

struct MyType
    a::Int
    b::String
end

and a web API that provides JSON like

{ "a": 1, "b": "string"}

then in JSON2, you can just do

JSON2.read(json, MyType)::MyType

and a MyType instance is constructed directly from the provided JSON. This also happens to be a very fast operation, because the generated code for JSON2.read looks roughly like

function JSON2.read(io::IO, ::Type{MyType})
    read(io, UInt8) # read '{'
    readkey(io) # read "a":
    a = read(io, Int)
    readdelim(io) # read ','
    readkey(io) # read "b":
    b = readstring(io)
    read(io, UInt8) # read '}'
    return MyType(a, b)
end

That's paraphrasing a bit, but you get the idea: that kind of straight-line, type-stable code translates to very, very fast object parsing from JSON -> Julia.
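
To make the shape of that paraphrase concrete, here is a minimal runnable sketch. It is not JSON2's generated code: the helpers `skipws`, `expect`, `readstr`, `readkey` and `readint`, and the name `read_mytype`, are hypothetical and written only for this illustration. It assumes keys arrive in field order and that strings contain no escapes.

```julia
struct MyType
    a::Int
    b::String
end

# Hypothetical helper: skip JSON whitespace.
function skipws(io::IO)
    while !eof(io) && (Base.peek(io) in (UInt8(' '), UInt8('\t'), UInt8('\n'), UInt8('\r')))
        read(io, UInt8)
    end
end

# Hypothetical helper: consume one byte and check it is the expected character.
function expect(io::IO, c::Char)
    skipws(io)
    b = read(io, UInt8)
    b == UInt8(c) || throw(ArgumentError("expected '$c', got '$(Char(b))'"))
    return nothing
end

# Hypothetical helper: read a JSON string (no escape handling, a simplification).
function readstr(io::IO)
    expect(io, '"')
    return readuntil(io, '"')
end

# Hypothetical helper: read a key plus its colon, insisting on a specific name.
function readkey(io::IO, name::String)
    key = readstr(io)
    key == name || throw(ArgumentError("expected key \"$name\", got \"$key\""))
    expect(io, ':')
end

# Hypothetical helper: read a (possibly negative) integer.
function readint(io::IO)
    skipws(io)
    buf = IOBuffer()
    while !eof(io)
        c = Base.peek(io)
        (c == UInt8('-') || UInt8('0') <= c <= UInt8('9')) || break
        write(buf, read(io, UInt8))
    end
    return parse(Int, String(take!(buf)))
end

# Straight-line reader: fields are consumed in declaration order,
# with no dictionary and no branching on which key was seen.
function read_mytype(io::IO)
    expect(io, '{')
    readkey(io, "a")
    a = readint(io)
    expect(io, ',')
    readkey(io, "b")
    b = readstr(io)
    expect(io, '}')
    return MyType(a, b)
end

obj = read_mytype(IOBuffer("""{ "a": 1, "b": "string"}"""))
println(obj)
```

Because every field is read with a concrete type in a fixed order, the compiler can infer the whole function. The flip side is that a reader of this shape rejects input whose keys come in a different order.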

quinnj avatar Feb 05 '18 04:02 quinnj

How does it compare to JSON.jl for ordinary JSON read/written with Dict?

Should this package replace (or be merged with) JSON.jl? It just seems weird to have two packages for JSON.

stevengj avatar Feb 05 '18 13:02 stevengj

> very, very fast object parsing

Can you post benchmarks comparing this to JSON.jl?

samoconnor avatar Feb 05 '18 13:02 samoconnor

JSON2's float-parsing code is rather different from JSON.jl's. How does it compare in performance?

ScottPJones avatar Feb 12 '18 13:02 ScottPJones

The master branch of JSON.jl gets the following numbers:

julia> using JSON,BenchmarkTools ; for s in ("canada", "citm_catalog", "twitter", "citylots") ; str = readstring("/j/JSON.jl/data/$s.json") ; println("Test: $s") ; @btime JSON.parse($str) ; end
Test: canada
  77.823 ms (777633 allocations: 33.44 MiB)
Test: citm_catalog
  10.896 ms (168537 allocations: 11.07 MiB)
Test: twitter
  3.793 ms (33849 allocations: 2.48 MiB)
Test: citylots
  6.794 s (53286967 allocations: 2.40 GiB)

With my optimizations (which I'll submit as a PR in an hour or so):

Test: canada
  54.585 ms (670266 allocations: 24.26 MiB)
Test: citm_catalog
  8.604 ms (213296 allocations: 11.18 MiB)
Test: twitter
  3.660 ms (68353 allocations: 3.12 MiB)
Test: citylots
  5.863 s (60191476 allocations: 2.21 GiB)
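
For reference, the relative change between the two runs can be worked out from the timings quoted above. This is only arithmetic on the reported numbers (converted to milliseconds), not a new measurement; the variable names are made up for the sketch.

```julia
# Times copied from the two runs above, in milliseconds.
master    = Dict("canada" => 77.823, "citm_catalog" => 10.896,
                 "twitter" => 3.793, "citylots" => 6794.0)
optimized = Dict("canada" => 54.585, "citm_catalog" => 8.604,
                 "twitter" => 3.660, "citylots" => 5863.0)

# Speedup factor per test file: old time divided by new time.
speedup = Dict(name => master[name] / optimized[name] for name in keys(master))

for name in ("canada", "citm_catalog", "twitter", "citylots")
    println(rpad(name, 14), round(speedup[name], digits = 2), "x")
end
```

Note that the allocation counts do not all move the same way as the times: citm_catalog goes from 168537 to 213296 allocations and twitter from 33849 to 68353, even though both run faster.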

ScottPJones avatar Feb 12 '18 19:02 ScottPJones

LazyJSON gets the following numbers:

Test: canada
  11.970 ns (1 allocation: 32 bytes)
Test: citm_catalog
  11.955 ns (1 allocation: 32 bytes)
Test: twitter
  11.962 ns (1 allocation: 32 bytes)
Test: citylots
  11.963 ns (1 allocation: 32 bytes)

But that is not a useful comparison by itself: the parse is lazy, so no output was produced.

The function below loads the coordinates for a citylots property into a DataFrame:

julia> using DataFrames, LazyJSON
julia> j = String(read("citylots.json"));
julia> const J = LazyJSON
julia> function load_coords(j, n)
           d = DataFrame(x = Float64[], y = Float64[], z = Float64[])
           for x in J.parse(j)["features"][n]["geometry"]["coordinates"]
               for v in x
                   push!(d, v)
               end
           end
           return d
       end

julia> @time load_coords(j, 1)
  0.000080 seconds (128 allocations: 5.438 KiB)
5×3 DataFrame
│ Row │ x        │ y       │ z   │
├─────┼──────────┼─────────┼─────┤
│ 1   │ -122.422 │ 37.8085 │ 0.0 │
│ 2   │ -122.422 │ 37.8088 │ 0.0 │
│ 3   │ -122.421 │ 37.8088 │ 0.0 │
│ 4   │ -122.421 │ 37.8086 │ 0.0 │
│ 5   │ -122.422 │ 37.8085 │ 0.0 │

julia> @time load_coords(j, 206560)
  0.236713 seconds (217 allocations: 8.422 KiB)
11×3 DataFrame
│ Row │ x        │ y       │ z   │
├─────┼──────────┼─────────┼─────┤
│ 1   │ -122.424 │ 37.7829 │ 0.0 │
│ 2   │ -122.424 │ 37.783  │ 0.0 │
│ 3   │ -122.424 │ 37.783  │ 0.0 │
│ 4   │ -122.424 │ 37.7831 │ 0.0 │
│ 5   │ -122.423 │ 37.7832 │ 0.0 │
│ 6   │ -122.423 │ 37.7831 │ 0.0 │
│ 7   │ -122.423 │ 37.7831 │ 0.0 │
│ 8   │ -122.424 │ 37.783  │ 0.0 │
│ 9   │ -122.424 │ 37.783  │ 0.0 │
│ 10  │ -122.424 │ 37.7829 │ 0.0 │
│ 11  │ -122.424 │ 37.7829 │ 0.0 │

vs JSON.jl

julia> @time load_coords(j, 1)
  6.097472 seconds (52.32 M allocations: 1.738 GiB, 36.45% gc time)
5×3 DataFrame
...
julia> @time load_coords(j, 206560)
  6.564454 seconds (52.32 M allocations: 1.738 GiB, 38.58% gc time)
11×3 DataFrame
...

The next function loads all the coordinates for all the properties into a Dict of arrays keyed on "MAPBLKLOT":

function load_coords(j)
    d = Dict()
    for f in LazyJSON.parse(j)["features"]
        id = f["properties"]["MAPBLKLOT"]
        coords = Any[]
        d[id] = coords
        g = f["geometry"]
        if g !== nothing && g["type"] == "Polygon"
            for x in g["coordinates"]
                for v in x
                    push!(coords, v)
                end
            end
        end
    end
    return d
end

julia> @time x = load_coords(j)
  3.894635 seconds (13.34 M allocations: 473.312 MiB, 29.88% gc time)
Dict{Any,Any} with 154216 entries:
...

vs JSON.jl

julia> @time x = load_coords(j)
  7.410909 seconds (56.90 M allocations: 1.921 GiB, 29.80% gc time)
Dict{Any,Any} with 154216 entries:
...

samoconnor avatar Feb 13 '18 03:02 samoconnor

Here is what I get with JSON2.jl:

julia> function load_coords(j, n)
           d = DataFrame(x = Float64[], y = Float64[], z = Float64[])
           for x in JSON2.read(j).features[n].geometry.coordinates
               for v in x
                   push!(d, v)
               end
           end
           return d
       end

julia> @time load_coords(j, 1)
  4.830097 seconds (31.05 M allocations: 1.514 GiB, 44.52% gc time)
5×3 DataFrame
│ Row │ x        │ y       │ z   │
├─────┼──────────┼─────────┼─────┤
│ 1   │ -122.422 │ 37.8085 │ 0.0 │
│ 2   │ -122.422 │ 37.8088 │ 0.0 │
│ 3   │ -122.421 │ 37.8088 │ 0.0 │
│ 4   │ -122.421 │ 37.8086 │ 0.0 │
│ 5   │ -122.422 │ 37.8085 │ 0.0 │

julia> @time load_coords(j, 206560)
  4.666455 seconds (31.05 M allocations: 1.514 GiB, 41.71% gc time)
11×3 DataFrame
│ Row │ x        │ y       │ z   │
├─────┼──────────┼─────────┼─────┤
│ 1   │ -122.424 │ 37.7829 │ 0.0 │
│ 2   │ -122.424 │ 37.783  │ 0.0 │
│ 3   │ -122.424 │ 37.783  │ 0.0 │
│ 4   │ -122.424 │ 37.7831 │ 0.0 │
│ 5   │ -122.423 │ 37.7832 │ 0.0 │
│ 6   │ -122.423 │ 37.7831 │ 0.0 │
│ 7   │ -122.423 │ 37.7831 │ 0.0 │
│ 8   │ -122.424 │ 37.783  │ 0.0 │
│ 9   │ -122.424 │ 37.783  │ 0.0 │
│ 10  │ -122.424 │ 37.7829 │ 0.0 │
│ 11  │ -122.424 │ 37.7829 │ 0.0 │

and

function load_coords(j)
    d = Dict()
    for f in JSON2.read(j).features
        id = f.properties.MAPBLKLOT
        coords = Any[]
        d[id] = coords
        g = f.geometry
        if g !== nothing && g.type == "Polygon"
            for x in g.coordinates
                for v in x
                    push!(coords, v)
                end
            end
        end
    end
    return d
end

julia> @time load_coords(j)
  8.354557 seconds (35.60 M allocations: 1.694 GiB, 51.34% gc time)
Dict{Any,Any} with 154216 entries:

samoconnor avatar Feb 14 '18 09:02 samoconnor

I've made a small test to compare JSON2.read against calling Base.convert with a LazyJSON object. JSON2 is about 20% faster, but LazyJSON handles unordered fields and unexpected fields...

struct Point
    x::Int
    y::Int
end

struct Line
    a::Point
    b::Point
end

struct Arrow
    label::String
    segments::Vector{Line}
    dashed::Bool
end

json = """{
    "label": "Hello",
    "segments": [
        {"a": {"x": 1, "y": 1}, "b": {"x": 2, "y": 2}},
        {"a": {"x": 2, "y": 2}, "b": {"x": 3, "y": 3}}
    ],
    "dashed": false
}"""
julia> @time JSON2.read(json, Arrow)
  0.000019 seconds (40 allocations: 1.578 KiB)
Arrow("Hello", Line[Line(Point(1, 1), Point(2, 2)), Line(Point(2, 2), Point(3, 3))], false)

julia> @time convert(Arrow, LazyJSON.value(json))
  0.000023 seconds (44 allocations: 1.531 KiB)
Arrow("Hello", Line[Line(Point(1, 1), Point(2, 2)), Line(Point(2, 2), Point(3, 3))], false)

If the input fields are in the wrong order...

json = """{
    "dashed": false,
    "label": "Hello",
    "segments": [
        {"a": {"x": 1, "y": 1}, "b": {"x": 2, "y": 2}},
        {"a": {"x": 2, "y": 2}, "b": {"x": 3, "y": 3}}
    ]
}"""
julia> @time JSON2.read(json, Arrow)
ERROR: ArgumentError: invalid JSON detected parsing type 'String': encountered 'f'

julia> @time convert(Arrow, LazyJSON.value(json))
  0.000024 seconds (44 allocations: 1.531 KiB)
Arrow("Hello", Line[Line(Point(1, 1), Point(2, 2)), Line(Point(2, 2), Point(3, 3))], false)

If there is an extra field...

json = """{
    "style": "bold",
    "label": "Hello",
    "segments": [
        {"a": {"x": 1, "y": 1}, "b": {"x": 2, "y": 2}},
        {"a": {"x": 2, "y": 2}, "b": {"x": 3, "y": 3}}
    ],
    "dashed": false
}"""
julia> @time JSON2.read(json, Arrow)
ERROR: ArgumentError: invalid JSON detected parsing type 'Array{Line,1}': encountered '"'

julia> @time convert(Arrow, LazyJSON.value(json))
  0.000023 seconds (44 allocations: 1.531 KiB)
Arrow("Hello", Line[Line(Point(1, 1), Point(2, 2)), Line(Point(2, 2), Point(3, 3))], false)

samoconnor avatar Feb 17 '18 12:02 samoconnor

JSON2 supports unordered fields, but you need to add the appropriate JSON2.@format declaration, e.g.

mutable struct Arrow
    label::String
    segments::Vector{Line}
    dashed::Bool
    Arrow() = new()
end
JSON2.@format Arrow noargs

where I get a timing like

julia> @time JSON2.read(json, Arrow)
  0.000026 seconds (46 allocations: 2.438 KiB)
Arrow("Hello", Line[Line(Point(1, 1), Point(2, 2)), Line(Point(2, 2), Point(3, 3))], false)

and for the extra-field JSON

julia> @time JSON2.read(json, Arrow)
  0.000031 seconds (47 allocations: 2.469 KiB)
Arrow("Hello", Line[Line(Point(1, 1), Point(2, 2)), Line(Point(2, 2), Point(3, 3))], false)

The noargs approach requires Arrow to be a mutable struct though; alternatively, we could use the keywordargs formatter:

struct Arrow
    label::String
    segments::Vector{Line}
    dashed::Bool
    Arrow(; label::String="", segments::Vector{Line}=Line[], dashed::Bool=false, kwargs...) =
        new(label, segments, dashed)
end

JSON2.@format Arrow keywordargs

with the timings for the three JSON inputs, in order:

julia> @time JSON2.read(json, Arrow)
  0.000030 seconds (53 allocations: 2.313 KiB)
julia> @time JSON2.read(json, Arrow)
  0.000025 seconds (53 allocations: 2.313 KiB)
julia> @time JSON2.read(json, Arrow)
  0.000025 seconds (57 allocations: 2.453 KiB)
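
As a plain-Julia illustration of why a keyword constructor tolerates reordered and extra fields, here is a small sketch that does not involve JSON2 at all. It assumes the formatter ends up passing each JSON key as a keyword argument; the thread does not show how JSON2 does this internally, so treat that as an assumption.

```julia
struct Point
    x::Int
    y::Int
end

struct Line
    a::Point
    b::Point
end

struct Arrow
    label::String
    segments::Vector{Line}
    dashed::Bool
    # Known fields get defaults; anything unknown lands in kwargs and is dropped.
    Arrow(; label::String="", segments::Vector{Line}=Line[], dashed::Bool=false, kwargs...) =
        new(label, segments, dashed)
end

# Keyword order does not matter.
a1 = Arrow(label = "Hello", dashed = false)

# The unknown `style` keyword is absorbed by kwargs... instead of raising an error.
a2 = Arrow(style = "bold", dashed = false, label = "Hello")

println(a1.label == a2.label && a1.dashed == a2.dashed)
```

The trade-off is that a missing field silently takes its default rather than raising an error, which may or may not be what you want for a given API.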

quinnj avatar Feb 17 '18 13:02 quinnj