
Suggestion: prove equivalence with clojure.data.json

Open vemv opened this issue 3 years ago • 6 comments

Hi!

It'd be interesting to prove that serialization/deserialization is analogous to that of clojure.data.json, so that existing applications can switch implementations without fearing that values will suddenly come back in a slightly different format, etc.

jsonista does a cheshire (and cheshire-only) comparison here:

https://github.com/metosin/jsonista/blob/211306f04bb15d7232b536cf6c6d8ecfeae0512d/test/jsonista/core_test.clj#L56

It could be desirable to do something like that, but comparing data.json <-> cheshire (and transitively, one could also make an educated data.json <-> cheshire <-> jsonista comparison).

I see that c.d.j is already exercised here https://github.com/dakrone/cheshire/blob/4525b23da1c17decba363202402a8a195d21705f/benchmarks/cheshire/test/benchmark.clj , so it might be easy enough to piggyback on that test, adding some extra assertions.
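Concretely, the extra assertions could look something like this. This is only a sketch (the namespace, sample values, and test name are made up for illustration; it assumes both cheshire and org.clojure/data.json are on the classpath):

```clojure
(ns cheshire.test.data-json-equivalence
  "Sketch: check that cheshire and clojure.data.json round-trip
  each other's output to equal values. Illustrative only."
  (:require [cheshire.core :as cheshire]
            [clojure.data.json :as json]
            [clojure.test :refer [deftest is]]))

(def samples
  [{"a" 1 "b" [1.5 true nil "x"]}
   {"nested" {"k" ["v1" "v2"]}}
   []
   {"unicode" "äöå \u2603"}])

(deftest serialization-matches-data-json
  (doseq [value samples]
    ;; The exact output strings may legitimately differ (key order,
    ;; escape style), so compare values after a cross round-trip:
    ;; what one library writes, the other must read back unchanged.
    (is (= value (json/read-str (cheshire/generate-string value))))
    (is (= value (cheshire/parse-string (json/write-str value))))))
```

Comparing round-tripped values rather than raw strings keeps the test robust against cosmetic differences while still catching the "slightly different format" problems mentioned above.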

Thanks - V

vemv avatar Aug 10 '20 07:08 vemv

org.clojure/data.json 2.0.0 just came out with a significant speed-up. This was the announcement on Clojurians Slack:

This release introduces significant speed improvements in both reading and writing json, while still being a pure clojure lib with no external dependencies.
Using the benchmark data from jsonista we see the following improvement:
Reading:

| input size | data.json (before) | data.json 2.0.0 | cheshire |
|------------|--------------------|-----------------|----------|
| 10b        | 1.4 µs             | 609 ns          | 995 ns   |
| 100b       | 4.6 µs             | 2.4 µs          | 1.9 µs   |
| 1k         | 26.2 µs            | 13.3 µs         | 10.2 µs  |
| 10k        | 292.6 µs           | 157.3 µs        | 93.1 µs  |
| 100k       | 2.8 ms             | 1.5 ms          | 918.2 µs |

Writing:

| input size | data.json (before) | data.json 2.0.0 | cheshire |
|------------|--------------------|-----------------|----------|
| 10b        | 2.3 µs             | 590 ns          | 1.0 µs   |
| 100b       | 7.3 µs             | 2.7 µs          | 2.5 µs   |
| 1k         | 41.3 µs            | 14.3 µs         | 9.4 µs   |
| 10k        | 508 µs             | 161 µs          | 105.3 µs |
| 100k       | 4.4 ms             | 1.5 ms          | 1.17 ms  |

Perhaps Cheshire can add more perf tweaks to always stay ahead of pure Clojure.

/cc @nilern

borkdude avatar Mar 19 '21 08:03 borkdude

Seems like there are some fixed costs that slow down small parses...

nilern avatar Mar 19 '21 09:03 nilern

From @slipset:

1. remove the dynamic vars and pass them explicitly as an options map
2. for reading, split string reading into two paths: the quick one (without any escapes) is done by passing an array slice to `(String.)`; the slow one (with escapes, unicode, and so on) still uses a `StringBuilder`
3. for writing, don't use `format` to construct unicode escapes
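Point 2 can be sketched in plain Clojure. This is not the actual data.json implementation, just an illustration of the two-path idea, reading from a char array and handling only the common escapes:

```clojure
(defn read-json-string
  "Reads a JSON string whose opening quote has already been consumed,
  starting at index start in the char array buf. Returns [string end-index].
  Assumes well-formed input (no bounds checking). Sketch only."
  [^chars buf ^long start]
  ;; Fast path: scan for the closing quote; bail out at the first backslash.
  (loop [i start]
    (let [c (aget buf i)]
      (cond
        ;; No escapes seen: the whole string is one array slice.
        (= c \") [(String. buf start (- i start)) (inc i)]
        ;; Escape found: switch to the slow StringBuilder path,
        ;; seeded with everything scanned so far.
        (= c \\) (let [sb (StringBuilder. (String. buf start (- i start)))]
                   (loop [i i]
                     (let [c (aget buf i)]
                       (cond
                         (= c \") [(.toString sb) (inc i)]
                         (= c \\) (let [e (aget buf (inc i))]
                                    ;; Common escapes only, for brevity.
                                    (.append sb (case e
                                                  \n \newline
                                                  \t \tab
                                                  \" \"
                                                  \\ \\
                                                  e))
                                    (recur (+ i 2)))
                         :else (do (.append sb c) (recur (inc i)))))))
        :else (recur (inc i))))))
```

The fast path never allocates a builder at all, which is where the win for typical escape-free JSON strings comes from.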

The main trick, though, was to use the approach from http://clojure-goes-fast.com: profile, observe the results, form a hypothesis, create a fix.

borkdude avatar Mar 19 '21 10:03 borkdude

There seems to be a startup cost in using Jackson that jsonista manages to avoid. It might be that maintaining some sort of Jackson context in your app, and passing it to the various parse fns, could speed things up quite a bit.
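Cheshire's existing dynamic factory var already allows something in this direction, e.g. building one `JsonFactory` up front and binding it around hot parse calls. A sketch (assumes cheshire on the classpath; the empty options map is illustrative):

```clojure
(require '[cheshire.core :as cheshire]
         '[cheshire.factory :as factory])

;; Build one factory once and reuse it, instead of letting each
;; parse call fall back to the default lookup.
(def app-factory (factory/make-json-factory {}))

(defn parse-with-shared-factory [^String s]
  (binding [factory/*json-factory* app-factory]
    (cheshire/parse-string s true)))
```

Whether this actually recovers the fixed per-call cost would need measuring; the binding itself has some overhead too.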

It seems though that most of the cost comes from `assoc!`, which is hard to avoid without creating custom data types.

slipset avatar Mar 19 '21 10:03 slipset

It is a known issue that the 3-arity versions of `assoc` and `assoc!` are faster than their varargs counterparts, so using multiple `assoc!` calls with one kv pair each, instead of one `assoc!` with multiple kvs, could speed things up (although I don't see the varargs form being used in cheshire).
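For illustration, the two call shapes produce equal results; the difference is only that the varargs path goes through seq machinery while the threaded form hits the fixed 3-argument arity every time (a minimal sketch using plain transients, no external deps; the function names are made up):

```clojure
;; Varargs path: one assoc! carrying several kv pairs.
(defn build-varargs [m]
  (persistent! (assoc! (transient m) :a 1 :b 2 :c 3)))

;; Fixed-arity path: one assoc! per kv pair; each call uses the fast
;; 3-argument arity directly.
(defn build-threaded [m]
  (persistent!
   (-> (transient m)
       (assoc! :a 1)
       (assoc! :b 2)
       (assoc! :c 3))))
```

Both return the same map, so switching between them is purely a performance question to settle with a benchmark.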

borkdude avatar Mar 19 '21 10:03 borkdude

I would guess the cost is in `(.createParser ^JsonFactory (or factory/*json-factory* factory/json-factory) ^Reader rdr)`.

I don't think the varargs `assoc!` applies. And if you want a standard map, the `assoc!`ing has to be done eventually so... go optimize transients in core?

nilern avatar Mar 19 '21 11:03 nilern