meteor-feature-requests

Accelerate JSON parsing with simdjson to reduce CPU usage.

Open vlasky opened this issue 5 years ago • 7 comments

simdjson is currently the "fastest JSON parser in the world". The recently released version 0.3 claims to achieve a parsing speed of over 3GB/s.

It is written in C++ and achieves its impressive speed by automatically leveraging the CPU's SIMD instructions and using microparallel algorithms.

There is a nodejs binding available.

I expect that incorporating this library and binding into Meteor would significantly reduce CPU usage inside the event loop and improve Meteor webapp performance.

I expect that most of the changes would need to be made in Meteor's EJSON package.

vlasky avatar Apr 02 '20 05:04 vlasky

Having taken a quick look at Meteor's source code, I see that JSON.parse() is called all over the place, so I expect the performance gains could be even greater.

Given that there is C++/V8 object marshalling overhead with simdjson, I expect that V8's native JSON.parse() code could outperform it when the JSON string is less than a certain length.

This length threshold could be determined through benchmarking. Meteor's code could then do a conditional string-length check to decide which parser to use for any given JSON string.
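A minimal sketch of that conditional check, under stated assumptions: `createThresholdParser`, `fastParse` and `thresholdChars` are hypothetical names for illustration, not part of Meteor or simdjson, and the 64 * 1024 default is a placeholder rather than a measured value. The fast parser is injected so the sketch runs without any native binding installed.

```javascript
// Hypothetical helper: choose a parser based on the length of the input string.
// `fastParse` stands in for a native binding's parse function (an assumption);
// it defaults to JSON.parse so this sketch runs anywhere.
function createThresholdParser({ fastParse = JSON.parse, thresholdChars = 64 * 1024 } = {}) {
  return function parse(text) {
    // Note: text.length counts UTF-16 code units, not bytes.
    if (text.length < thresholdChars) {
      // Short input: stay inside V8 and avoid the C++/V8 marshalling overhead.
      return JSON.parse(text);
    }
    // Long input: hand off to the (assumed faster) alternative parser.
    return fastParse(text);
  };
}

// Usage with a stand-in "fast" parser that records when it gets chosen.
const calls = [];
const parse = createThresholdParser({
  fastParse: (text) => {
    calls.push(text.length);
    return JSON.parse(text);
  },
  thresholdChars: 32, // placeholder; a real value would come from benchmarking
});

const small = parse('{"a":1}');
const large = parse(JSON.stringify({ items: Array.from({ length: 20 }, (_, i) => i) }));
```

Here only the second call is long enough to be routed to `fastParse`. Whether such a switch pays off at all depends on where the measured crossover point sits.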

vlasky avatar Apr 02 '20 06:04 vlasky

Yes, native JSON.parse is pretty fast. I think this might be premature optimization.

mitar avatar Apr 03 '20 08:04 mitar

@mitar I don't think it's premature at all. In all the years Meteor has been around, this hasn't been considered before. The event loop is the most important place to save CPU cycles and RAM.

When you have an app servicing lots of method/API calls and performing lots of database I/O, this all adds up very quickly.

Benchmarks conducted by third parties suggest that V8's JSON.parse() is vastly inferior to the best C++ implementations.

Here are some that I have found:

  1. https://github.com/GoogleChromeLabs/json-parse-benchmark

In this test, parsing an 8.2MB string literal took V8 approximately 14 seconds.

  2. https://github.com/miloyip/nativejson-benchmark

RapidJSON (the fastest JSON parser at the time, which is slower than simdjson) took 8ms to parse 4.5MB of sample JSON data, compared to 53ms for V8's.

Also, V8 consumed about 3.3x the RAM that RapidJSON did during the parse - 15.9MB vs 4.8MB.

vlasky avatar Apr 03 '20 09:04 vlasky

Sure, but were those C++ implementations accessible from a Node process? So the question is: how fast are things once you embed them inside Node and access them from there?
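One way to check this in-process is a small timing harness. This is a rough sketch, not a rigorous benchmark: `timeParser`, `makeSample` and `compareParsers` are hypothetical helpers, and the sample payload is made up. Only `JSON.parse` is measured here. A native binding's parse function could be added to the `parsers` object, assuming the binding is installed and exposes one.

```javascript
const { performance } = require('node:perf_hooks');

// Average wall-clock milliseconds per call of parseFn(text).
function timeParser(parseFn, text, iterations = 200) {
  parseFn(text); // one warm-up call before timing
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    parseFn(text);
  }
  return (performance.now() - start) / iterations;
}

// Build a JSON string containing n small objects (made-up sample data).
function makeSample(n) {
  const items = Array.from({ length: n }, (_, i) => ({
    id: i,
    name: `item-${i}`,
    tags: ['a', 'b'],
  }));
  return JSON.stringify(items);
}

// Time every parser in `parsers` against samples of each requested size.
function compareParsers(parsers, sizes) {
  return sizes.map((n) => {
    const text = makeSample(n);
    const row = { chars: text.length };
    for (const [name, parseFn] of Object.entries(parsers)) {
      row[name] = timeParser(parseFn, text);
    }
    return row;
  });
}

// Only the native parser is listed; other parsers can be added by name.
const results = compareParsers({ native: JSON.parse }, [10, 1000]);
console.table(results);
```

Because each timed call includes whatever it costs to turn the result into JavaScript objects, a binding measured this way is charged for its marshalling overhead too.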

In my experience, the bottleneck is not JSON but EJSON, especially EJSON.clone, which is called a lot throughout Meteor's code.

mitar avatar Apr 05 '20 06:04 mitar

Here are the published benchmarks for using simdjson within the node process via the binding simdjson_nodejs. To me, these results are conclusive.

https://github.com/luizperes/simdjson_nodejs#benchmarks

vlasky avatar Apr 05 '20 23:04 vlasky

It seems there are still some potential performance issues, depending on the use case. But it also looks like the maintainers are looking into addressing them.

https://github.com/luizperes/simdjson_nodejs/issues/28

hexsprite avatar Apr 06 '20 01:04 hexsprite

We encountered really big issues with EJSON.clone on the client side, since we were relying on parsing JSON data in real time, and within the JSON we had stored some data (such as Uint8Array) that plain JSON doesn't allow by default. Including https://github.com/simdjson/simdjson within Meteor and, as a bonus, being able to use it from the native process of an Electron application could be a good addition for Meteor.

linegel avatar May 14 '20 01:05 linegel