Request to support simd json

Open hrishikesh713 opened this issue 5 years ago • 8 comments

RapidJSON offers much more than just parsing: it helps you generate JSON and offers various other convenient functions. simdjson (https://github.com/lemire/simdjson) is not as convenient as RapidJSON, but if we only need to parse the document, simdjson seems to be a faster alternative.

hrishikesh713 avatar Mar 11 '20 05:03 hrishikesh713

Hey Hrishikesh! I've thought about adding simdjson integration in the past. My primary hesitation with it has just been that it's not actually a conformant JSON parser.

Cfretz244 avatar Mar 11 '20 20:03 Cfretz244

In my understanding, simdjson uses SIMD instructions to compute a bitmap of "structural elements" (characters like commas, curly braces, double quotes, etc.) in the incoming string, which it then iterates over to infer the original JSON tree. You get insane performance because the vector engines can chunk up the string and compute the bitmap in parallel, but the concept of "structural elements" isn't very well defined.
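To make the bitmap idea concrete, here is a rough scalar sketch in C++. It is not simdjson's actual code: `is_structural` and `structural_bitmap` are hypothetical names, the character set just follows the description above, and the loop handles one byte at a time where a SIMD implementation would compare a whole block of bytes per instruction.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper: the set of "structural" characters, following the
// description in this thread (braces, brackets, colon, comma, double quote).
inline bool is_structural(char c) {
  switch (c) {
    case '{': case '}': case '[': case ']':
    case ':': case ',': case '"':
      return true;
    default:
      return false;
  }
}

// Returns one 64-bit mask per 64-byte block of input. Bit i of mask b is set
// when input[b * 64 + i] is a structural character.
// Simplification: string context is ignored, so a comma or brace inside a
// quoted string is flagged as well. A real parser has to mask those out.
std::vector<std::uint64_t> structural_bitmap(std::string_view input) {
  std::vector<std::uint64_t> masks((input.size() + 63) / 64, 0);
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (is_structural(input[i])) {
      masks[i / 64] |= std::uint64_t{1} << (i % 64);
    }
  }
  return masks;
}
```

A parser built on this would then walk only the set bits of each mask instead of every byte, which is where the "iterate over the bitmap to infer the tree" step comes from.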

Cfretz244 avatar Mar 11 '20 20:03 Cfretz244

It can definitely parse anything that is actually JSON, but it'll also parse a large set of documents that definitely are not JSON. The effective grammar is much looser

Cfretz244 avatar Mar 11 '20 20:03 Cfretz244

That being said, we could add it and just document the fact that it shouldn't be used if the input string isn't from a trusted source

Cfretz244 avatar Mar 11 '20 20:03 Cfretz244

The additional problem is that I'm not sure what kind of performance improvement we'll see. After simdjson runs, Dart still needs to construct its own representation of the parsed document, which I'm assuming will be significantly slower. Testing that I've done on my personal laptop suggests that Dart is capable of serializing at around 600-700 MB/second, which would bottleneck simdjson's parsing performance, claimed to be on the order of GB/second. That said, I'm definitely down to try it as an experimental thing and see what happens.
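As a back-of-the-envelope check on the bottleneck concern, the sketch below assumes the two stages run one after the other over the same bytes, so the time spent per byte simply adds up. `combined_throughput` is a hypothetical helper, and the 2000 MB/second figure used in the note afterwards is an assumed stand-in for "on the order of GB/second", not a measurement.

```cpp
// Effective throughput (MB/second) of two stages that each process the whole
// input in sequence: time per MB adds, so the rates combine harmonically.
// Assumption: no overlap or pipelining between parsing and construction.
double combined_throughput(double parse_mb_s, double build_mb_s) {
  return 1.0 / (1.0 / parse_mb_s + 1.0 / build_mb_s);
}
```

Under those assumptions, `combined_throughput(2000.0, 650.0)` comes out to roughly 490 MB/second, below the slower stage on its own, so the end-to-end gain depends mostly on how fast Dart can build its representation.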

Cfretz244 avatar Mar 11 '20 20:03 Cfretz244

Until then, if you install sajson and build with the -DDART_USE_SAJSON flag, you should see about a 2x improvement in parsing performance by just using sajson. sajson strikes a really nice balance between performance and conformance, and also happens to organize its parse tree in a way that's really complementary to the Dart lowering logic.

Cfretz244 avatar Mar 11 '20 20:03 Cfretz244

Note that this switch to sajson will only significantly affect parsing performance when parsing in finalized mode. So code like the following:

auto pkt = dart::packet::from_json(R"({"hello":"world"})", true);
auto buf = dart::buffer::from_json(R"({"hello":"world"})");

Would see the improvement, but code like:

auto pkt = dart::packet::from_json(R"({"hello":"world"})", false);
auto heap = dart::heap::from_json(R"({"hello":"world"})");

Will likely perform more like RapidJSON, because creating the mutable representation is relatively expensive.

Cfretz244 avatar Mar 11 '20 20:03 Cfretz244

Just wanted to update here that I was apparently incorrect about the conformance bit with simdjson. I last looked at the project like 9 months ago and it's come quite a long way since then and now supports full document validation

Cfretz244 avatar Mar 12 '20 16:03 Cfretz244