jsoniter-scala

leverage SIMD

Open LifeIsStrange opened this issue 4 years ago • 9 comments

SIMD allows revolutionary intra-core parallelism. In fact, the fastest JSON library on earth is called simdjson for this precise reason. OpenJDK 16 will be released next month and brings SIMD support to the JVM: https://openjdk.java.net/jeps/338 You could hence use it that way (or through the intriguing https://github.com/beehive-lab/TornadoVM).

Exciting, isn't it? :) @plokhotnyuk

LifeIsStrange avatar Feb 21 '21 05:02 LifeIsStrange

I think SIMD would pay off only for payloads with long strings.

A better option would be to adopt SWAR techniques using 64-bit or 128-bit values, like here.
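
For readers unfamiliar with SWAR (SIMD within a register), here is a minimal Java sketch of the idea. It is not taken from jsoniter-scala or from the code linked above; the names SwarScan, matchMask and indexOfQuoteOrBackslash are made up for illustration. It packs 8 input bytes into one 64-bit long and tests all of them at once for a quote or a backslash, which is roughly the kind of check a JSON string scanner needs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration of SWAR: 8 bytes are examined per 64-bit operation.
final class SwarScan {
    private static final long ONES = 0x0101010101010101L;
    private static final long LOW7 = 0x7F7F7F7F7F7F7F7FL;

    // Returns a mask with 0x80 set in every byte lane of `word` that equals `b`.
    static long matchMask(long word, byte b) {
        long x = word ^ (ONES * (b & 0xFFL)); // lanes equal to b become 0x00
        long y = (x & LOW7) + LOW7;           // high bit set if low 7 bits are non-zero
        return ~(y | x | LOW7);               // 0x80 only in lanes that were 0x00
    }

    // Index of the first '"' or '\\' in buf[from, to), or -1 if there is none.
    static int indexOfQuoteOrBackslash(byte[] buf, int from, int to) {
        // Little-endian view, so that buf[i] lands in the lowest byte lane.
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        int i = from;
        while (i + 8 <= to) {
            long w = bb.getLong(i);
            long m = matchMask(w, (byte) '"') | matchMask(w, (byte) '\\');
            if (m != 0) return i + (Long.numberOfTrailingZeros(m) >>> 3);
            i += 8;
        }
        // Fewer than 8 bytes left: fall back to a byte-by-byte loop.
        while (i < to) {
            if (buf[i] == '"' || buf[i] == '\\') return i;
            i++;
        }
        return -1;
    }
}
```

Unlike the Vector API, this needs nothing beyond plain long arithmetic, so it runs on any JVM; the price is a narrower "vector" of 8 bytes.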

plokhotnyuk avatar Feb 21 '21 10:02 plokhotnyuk

Wow interesting!!

LifeIsStrange avatar Feb 21 '21 16:02 LifeIsStrange

@plokhotnyuk that looks super super cool.

I am a little naive in C; still, what is the general approach the author is taking, from a Scala perspective (if it is possible to explain at all)?

I did not know there was something that could be even faster than simdjson!

ScalaWilliam avatar Jul 10 '21 15:07 ScalaWilliam

@ScalaWilliam Before comparing speed, note that the result of simdjson parsing is just an iterator over indexed JSON input, not a data structure with arbitrary access to fields (values or references accessible by offsets), as is usual in the Scala world.

Sometimes creating the data structure in Scala takes more CPU cycles than parsing the values it contains from JSON. The most expensive parts are instances of immutable collections, boxed primitives and Option[_] wrappers.

plokhotnyuk avatar Jul 10 '21 16:07 plokhotnyuk

@plokhotnyuk thank you so much for your explanation. Sorry, I realised I did not say that I intended to refer to yyjson. This library seems to extract everything out into immutable structures.

ScalaWilliam avatar Jul 10 '21 16:07 ScalaWilliam

@ScalaWilliam While yyjson allocates immutable nodes of the JSON object model, most CPU cycles will be spent accessing that model afterwards: searching for tagged values can be quite expensive for real-world JSON messages.

Parsers from the Scala world that provide an object model for JSON usually use maps and vectors for JSON objects and JSON arrays respectively, which is much more expensive than parsing directly into data structures and arrays.
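
To make the difference concrete, here is a small Java sketch under stated assumptions: the class ModelVsDirect, its Points type and the "points"/"x"/"y" field names are hypothetical and do not come from any of the libraries mentioned. It contrasts reading values out of a generic object model (maps, lists, boxed numbers) with reading them from a target structure that a parser could fill directly.

```java
import java.util.List;
import java.util.Map;

// Hypothetical comparison of a generic JSON object model with a direct target type.
final class ModelVsDirect {
    // Object model: JSON objects as maps, JSON arrays as lists, numbers boxed as Double.
    static double sumViaModel(Map<String, Object> doc) {
        @SuppressWarnings("unchecked")
        List<Object> points = (List<Object>) doc.get("points"); // hash lookup + cast
        double sum = 0;
        for (Object p : points) {
            @SuppressWarnings("unchecked")
            Map<String, Object> point = (Map<String, Object>) p;
            // Two more hash lookups, two casts and two unboxings per element.
            sum += (Double) point.get("x") + (Double) point.get("y");
        }
        return sum;
    }

    // Direct target: primitive arrays that a parser could fill while reading the input.
    static final class Points {
        final double[] xs;
        final double[] ys;
        Points(double[] xs, double[] ys) { this.xs = xs; this.ys = ys; }
    }

    // No lookups by key, no casts and no boxing: just array reads.
    static double sumDirect(Points p) {
        double sum = 0;
        for (int i = 0; i < p.xs.length; i++) sum += p.xs[i] + p.ys[i];
        return sum;
    }
}
```

Both methods compute the same sum; the first also pays for every intermediate node and boxed number that had to be allocated to build the model.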

plokhotnyuk avatar Jul 10 '21 16:07 plokhotnyuk

@LifeIsStrange @ScalaWilliam

The latest versions of jsoniter-scala-core for the JVM use SWAR techniques for parsing and serialization of ASCII strings, booleans, numbers, java.time._ and java.util.UUID values.

This gives a speedup of up to 2x in some cases.
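
As an illustration of what SWAR number parsing can look like, here is a Java sketch of a widely known trick for validating and converting 8 ASCII digits held in one 64-bit word. It is an assumption-laden example, not the actual jsoniter-scala code; the names SwarDigits, load, isEightDigits and parseEightDigits are made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration: handle 8 ASCII digits with a few 64-bit operations.
final class SwarDigits {
    // Reads 8 bytes starting at pos; the first character ends up in the lowest byte.
    static long load(byte[] buf, int pos) {
        return ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).getLong(pos);
    }

    // True if every byte of w is in '0'..'9' (0x30..0x39).
    static boolean isEightDigits(long w) {
        long hi = w & 0xF0F0F0F0F0F0F0F0L;                           // must be 0x30 per byte
        long hiPlus6 = (w + 0x0606060606060606L) & 0xF0F0F0F0F0F0F0F0L; // still 0x30 if byte <= 0x39
        return (hi | (hiPlus6 >>> 4)) == 0x3333333333333333L;
    }

    // Converts 8 ASCII digits to their numeric value; call only if isEightDigits(w).
    static int parseEightDigits(long w) {
        long v = w - 0x3030303030303030L;    // ASCII -> digit values 0..9 in each byte
        v = (v * 10) + (v >>> 8);            // even bytes now hold two-digit values 0..99
        long mask = 0x000000FF000000FFL;
        long mul1 = 100L + (1000000L << 32); // weights for digit pairs 1 and 3
        long mul2 = 1L + (10000L << 32);     // weights for digit pairs 2 and 4
        v = (((v & mask) * mul1) + (((v >>> 16) & mask) * mul2)) >>> 32;
        return (int) v;
    }
}
```

A byte-by-byte loop would need 8 range checks and 8 multiply-adds for the same input; here the work is done with a handful of operations on a single long.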

The latest results are published here, as usual: https://plokhotnyuk.github.io/jsoniter-scala/

plokhotnyuk avatar Jul 07 '22 07:07 plokhotnyuk

@LifeIsStrange @ScalaWilliam Currently, jsoniter-scala seems to be quite competitive with simdjson-java when full validation against the JSON spec is not required for skipped keys and values: https://github.com/simdjson/simdjson-java/pull/2

plokhotnyuk avatar Jul 21 '23 13:07 plokhotnyuk