cheshire
Suggestion: prove equivalence with clojure.data.json
Hi!
It'd be interesting to prove that serialization/deserialization is equivalent to that of clojure.data.json, so that existing applications can switch implementations without fearing that values will suddenly come out in a slightly different format, etc.
jsonista does a cheshire (and cheshire-only) comparison here:
https://github.com/metosin/jsonista/blob/211306f04bb15d7232b536cf6c6d8ecfeae0512d/test/jsonista/core_test.clj#L56
It could be desirable to do something like that, but comparing data.json<->cheshire (obviously, transitively one could then also make an educated data.json<->cheshire<->jsonista comparison).
I see that c.d.j is already exercised here https://github.com/dakrone/cheshire/blob/4525b23da1c17decba363202402a8a195d21705f/benchmarks/cheshire/test/benchmark.clj , so it might be easy enough to piggyback on that test and add some extra assertions.
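To make the suggestion concrete, a cross-library round-trip check could look something like the sketch below. The namespace, sample data, and test name are all hypothetical, not from cheshire's actual test suite; it only assumes both libraries are on the classpath.

```clojure
;; Hypothetical sketch of a cheshire <-> clojure.data.json
;; equivalence test; sample data and names are illustrative.
(ns cheshire.test.equivalence
  (:require [cheshire.core :as cheshire]
            [clojure.data.json :as json]
            [clojure.test :refer [deftest is]]))

(def samples
  [{"a" 1 "b" [1.5 true nil "é"]}
   {"nested" {"k" ["x" {"y" 2}]}}])

(deftest round-trip-equivalence
  (doseq [v samples]
    ;; each library's output should round-trip in the other
    (is (= (json/read-str (cheshire/generate-string v))
           (cheshire/parse-string (json/write-str v))
           v))))
```

Running these over the existing benchmark fixtures would catch most "slightly different format" surprises (number types, key handling, escaping) in one place.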
Thanks - V
org.clojure/data.json 2.0.0 just came out with a significant speedup. This was the announcement on Clojurians Slack:
This release introduces significant speed improvements in both reading and writing json, while still being a pure clojure lib with no external dependencies.
Using the benchmark data from jsonista we see the following improvement:
Reading:
10b from 1.4 µs to 609 ns (cheshire 995 ns)
100b from 4.6 µs to 2.4 µs (cheshire 1.9 µs)
1k from 26.2 µs to 13.3 µs (cheshire 10.2 µs)
10k from 292.6 µs to 157.3 µs (cheshire 93.1 µs)
100k from 2.8 ms to 1.5 ms (cheshire 918.2 µs)
Writing:
10b from 2.3 µs to 590 ns (cheshire 1.0 µs)
100b from 7.3 µs to 2.7 µs (cheshire 2.5 µs)
1k from 41.3 µs to 14.3 µs (cheshire 9.4 µs)
10k from 508 µs to 161 µs (cheshire 105.3 µs)
100k from 4.4 ms to 1.5 ms (cheshire 1.17 ms)
Perhaps Cheshire can add more perf tweaks to always stay ahead of pure Clojure.
/cc @nilern
Seems like there are some fixed costs that slow down small parses...
From @slipset:
1. remove the dynamic vars and pass them explicitly as an options map
2. for reading, split string reading into two paths: the quick one (without any escapes) builds the result by passing an array slice to (String.); the slow one (with escapes, unicode, and so on) still uses a StringBuilder
3. for writing, don't use format to construct unicode escapes
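Point 3 can be illustrated with a small sketch: `format` re-parses its format string on every call, while the escape can be built directly from a hex string. Both function names here are hypothetical, chosen just to contrast the two approaches.

```clojure
;; Illustrative only: building a \uXXXX escape with format
;; versus building it by hand (names are hypothetical).
(defn escape-with-format ^String [c]
  (format "\\u%04x" (int c)))

(defn escape-manually ^String [c]
  (let [hex (Integer/toHexString (int c))]
    ;; left-pad to four hex digits without going through format
    (str "\\u" (subs "0000" (count hex)) hex)))

;; e.g. both return "\\u0009" for a tab character
```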
The main trick, though, was to use the approach from http://clojure-goes-fast.com — i.e., profile, observe the results, form a hypothesis, create a fix.
There seems to be a startup cost in using Jackson that jsonista manages to avoid. It might be that being able to maintain some sort of Jackson context in your app, and pass it to the various parse fns, could speed things up quite a bit.
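One way to approximate "maintaining a Jackson context" with cheshire's existing API is to build a JsonFactory once and bind it around hot parsing paths, rather than letting each call resolve the default. This sketch assumes cheshire's factory namespace (make-json-factory and the *json-factory* dynamic var, which the parse fns consult); the namespace and helper name are otherwise made up.

```clojure
;; Sketch: reuse one Jackson JsonFactory across many parses by
;; binding cheshire's *json-factory* var (ns/fn names hypothetical).
(ns example.reuse
  (:require [cheshire.core :as json]
            [cheshire.factory :as factory]))

(def shared-factory (factory/make-json-factory {}))

(defn parse-many [json-strings]
  (binding [factory/*json-factory* shared-factory]
    (mapv json/parse-string json-strings)))
```

This doesn't remove the per-call createParser cost, but it at least pins down the factory so any remaining overhead can be profiled in isolation.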
It seems, though, that most of the cost comes from assoc!, which is hard to avoid without creating custom data types. It is a known issue that the 3-arity versions of assoc and assoc! are faster than their varargs counterparts, so using multiple assoc! calls with one kv each instead of one assoc! for multiple kvs could speed things up (although I don't see that being used in cheshire).
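For reference, the two shapes being compared are sketched below; both build the same map, but the first goes through assoc!'s varargs arity while the second stays on the 3-arity fast path. The function names are illustrative.

```clojure
;; Illustrative contrast of assoc! call shapes (names hypothetical).
(defn build-varargs [m]
  ;; one assoc! call carrying multiple kvs (varargs arity)
  (persistent! (assoc! (transient m) :a 1 :b 2 :c 3)))

(defn build-single-kv [m]
  ;; several 3-arity assoc! calls, one kv each
  (persistent!
    (-> (transient m)
        (assoc! :a 1)
        (assoc! :b 2)
        (assoc! :c 3))))

;; (build-varargs {}) and (build-single-kv {}) both yield
;; {:a 1 :b 2 :c 3}
```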
I would guess the cost is in (.createParser ^JsonFactory (or factory/*json-factory* factory/json-factory) ^Reader rdr).
I don't think the varargs assoc! applies. And if you want a standard map, the assoc!ing has to be done eventually, so... go optimize transients in core?