
Slower than JSON.parse

dalisoft opened this issue 4 years ago • 20 comments

Hi @luizperes

I know this library was made to handle large JSON files, but I ran into some strange performance results when I parsed my JSON and benchmarked it: this library is slow, up to 6x slower.

Here is the result:

json.parse - large: 985.011ms
simdjson - large: 4.756s

Also, lazyParse does not return the expected result for me (or I'm doing something wrong), and even with lazyParse, performance is still slow. How can we improve this?

Code to test

const simdJson = require("simdjson");

const bench = (name, fn) => {
  console.time(name);
  for (let i = 0; i < 200000; i++) {
    fn();
  }
  console.timeEnd(name);
};

// Create large JSON file
let JSON_BUFF_LARGE = {};
for (let i = 0; i < 20; i++) { // 20 keys is close to 0.5 Kb, which is very small, but you can increase this value
  JSON_BUFF_LARGE["key_" + i] = Math.round(Math.random() * 1e16).toString(16);
}
JSON_BUFF_LARGE = JSON.stringify(JSON_BUFF_LARGE);

console.log(
  "JSON buffer LARGE size is ",
  (JSON_BUFF_LARGE.length / 1024).toFixed(2),
  "Kb"
);

bench("json.parse - large", () => JSON.parse(JSON_BUFF_LARGE));
bench("simdjson - large", () => simdJson.parse(JSON_BUFF_LARGE));

dalisoft avatar Apr 03 '20 07:04 dalisoft

Tested on an AVX-supported device too

Benchmark

dalisoft avatar Apr 03 '20 08:04 dalisoft

Hi @dalisoft, simdjson.parse is always slower as expected. Please take a look at issue #5 and the Documentation.md file.

Thank you so much for making your tests available to me so I could save some time. As you mentioned, simdjson is not doing better than standard JSON for your case. It could be that there is something related to parsing numbers (which is a little slow in simdjson, but it should still be faster). In your case you are generating random numbers, and your JSON string JSON_BUFF_LARGE could be difficult to parse (in simdjson), but that shouldn't be the case. I am speculating that it could be a problem with the wrapper (only if there is something very wrong) or some sort of bug upstream (explanation below).

I changed the parameters of the code you asked me to test: instead of a ~0.5 Kb payload, I am using a 25+ Kb one; just replace i < 20 with i < 200000.

For all three functions of simdjson, here is the output I get (my machine has AVX2):

simdjson.parse
JSON buffer LARGE size is  25.78 Kb
json.parse - large: 33672.126ms
simdjson - large: 159626.570ms
simdjson.lazyParse

For this case, lazyParse is faster (by around 30%) than the standard JSON.

JSON buffer LARGE size is  25.81 Kb
json.parse - large: 33321.596ms
simdjson - large: 21988.679ms
simdjson.isValid

isValid is nearly the same thing as lazyParse, since lazyParse only validates the JSON but does not construct the JS object, so they should both run at around the same speed. I will check whether this is a problem in the wrapper (likely) or upstream (by running it without the wrapper and getting its perf stat).

JSON buffer LARGE size is  25.80 Kb
json.parse - large: 33484.594ms
simdjson - large: 5665.534ms
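
For reference, a self-contained sketch of how all three entry points can be dropped into the same kind of harness (the ~1000-key payload size is an assumption chosen to land near the ~25 Kb range above):

const simdJson = require("simdjson");

// Build a payload of the same shape as in the original snippet.
// ~1000 keys is an assumption to get roughly a 25 Kb JSON string.
let obj = {};
for (let i = 0; i < 1000; i++) {
  obj["key_" + i] = Math.round(Math.random() * 1e16).toString(16);
}
const JSON_BUFF_LARGE = JSON.stringify(obj);

const bench = (name, fn) => {
  console.time(name);
  for (let i = 0; i < 200000; i++) fn();
  console.timeEnd(name);
};

bench("json.parse", () => JSON.parse(JSON_BUFF_LARGE));                 // builds the full JS object
bench("simdjson.parse", () => simdJson.parse(JSON_BUFF_LARGE));         // builds the full JS object via the wrapper
bench("simdjson.lazyParse", () => simdJson.lazyParse(JSON_BUFF_LARGE)); // parses without constructing the JS object
bench("simdjson.isValid", () => simdJson.isValid(JSON_BUFF_LARGE));     // validation only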

One interesting thing that you will see with simdjson is that it scales well and becomes much faster than regular state-machine-parsing algorithms. But as stated above, there is something wrong going on. I will only have time to check around the third week of April.

Thanks again for the contribution!

cc @lemire

luizperes avatar Apr 04 '20 05:04 luizperes

Oh, here is the usage of lazyParse:

const simdjson = require('simdjson');

const jsonString = "{ \"foo\": { \"bar\": [ 0, 42 ] } }";
const JSONbuffer = simdjson.lazyParse(jsonString); // external (C++) parsed JSON object
console.log(JSONbuffer.valueForKeyPath("foo.bar[1]")); // 42

Note that it does not construct the actual JS object; it keeps an external pointer to the C++ buffer, and for this reason you can only access keys through the valueForKeyPath function on the returned object.
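
As a further illustration, here is a small sketch (the document and key paths below are made up) of the intended pattern: parse once, then read individual values straight from the external buffer.

const simdjson = require('simdjson');

// Hypothetical document, just for illustration.
const jsonString = JSON.stringify({
  user: { name: "ada", scores: [10, 42, 7] },
  active: true
});

// One lazyParse call; the result is a handle to the C++-side parsed
// document, not a plain JS object.
const buffer = simdjson.lazyParse(jsonString);

// Each read resolves a key path against the external buffer on demand.
console.log(buffer.valueForKeyPath("user.name"));      // "ada"
console.log(buffer.valueForKeyPath("user.scores[1]")); // 42
console.log(buffer.valueForKeyPath("active"));         // true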

luizperes avatar Apr 04 '20 05:04 luizperes

It could be that there is something related to parsing numbers (which is a little slow in simdjson, but it should still be faster)

It is generally a challenging task but simdjson should still be faster than the competition.

lemire avatar Apr 04 '20 06:04 lemire

I know this library was made to handle large JSON files

The simdjson library itself is faster even on small files.

lemire avatar Apr 04 '20 06:04 lemire

cc @croteaucarine @jkeiser

lemire avatar Apr 04 '20 06:04 lemire

Am I doing something wrong, or is the cost of the bindings the reason the performance isn't what I want?

@luizperes

const simdjson = require('simdjson');

const jsonString = "{ \"foo\": { \"bar\": [ 0, 42 ] } }";
const JSONbuffer = simdjson.lazyParse(jsonString); // external (C++) parsed JSON object
console.log(JSONbuffer.valueForKeyPath("foo.bar[1]")); // 42

I see that's good, but I'm using it for a different case:

const simdjson = require('simdjson');

// some code
// all of the code below runs many times
const JSONbuffer = simdjson.lazyParse(req.body); // req.body is a JSON string
console.log(JSONbuffer.valueForKeyPath("")); // empty key path to get the whole object

I want to use this library within my backend framework for Node.js as a JSON.parse alternative for higher performance, but performance only gets worse.

dalisoft avatar Apr 04 '20 12:04 dalisoft

Hi @dalisoft, I see your point now. There are only a few cases where you actually need the whole JSON, but for the root case this wrapper should have the same performance.

I will think of new approaches to improve the library but will only be able to do it in the future. I will also take a close look at the repo https://github.com/croteaucarine/simdjson_node_objectwrap @croteaucarine. She's working on improvements for the wrapper. I will leave this issue open until we fix it. Hopefully it won't take (that) long. Cheers!
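
In the meantime, a rough sketch of how that trade-off can be handled today (the handler names and fields are hypothetical): reach for lazyParse when only a few fields are needed, and keep JSON.parse where the whole body has to become a JS object anyway.

const simdjson = require('simdjson');

// Hypothetical: only one field of a potentially large body is needed,
// so the full JS object is never constructed.
function readUserId(body /* JSON string */) {
  const doc = simdjson.lazyParse(body);
  return doc.valueForKeyPath("user.id");
}

// Hypothetical: downstream code needs the entire object,
// so materializing it with JSON.parse is the sensible choice for now.
function readWholeBody(body /* JSON string */) {
  return JSON.parse(body);
}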

luizperes avatar Apr 04 '20 18:04 luizperes

@luizperes Thanks, I'll wait :)

dalisoft avatar Apr 04 '20 19:04 dalisoft

Note to self: there are a few leads on PR #33

luizperes avatar Apr 10 '20 21:04 luizperes

@luizperes have you considered the approach node-expat has chosen?

xamgore avatar Dec 15 '20 13:12 xamgore

@xamgore can you elaborate on your question?

luizperes avatar Dec 18 '20 21:12 luizperes

Hi @luizperes. For better debugging you can try https://github.com/nanoexpress/json-parse-expirement

dalisoft avatar Dec 20 '20 16:12 dalisoft

@luizperes with node-expat you can add JS callbacks for events like "opening tag", "new attribute with name x", etc., so only the required properties are picked, copied, and passed back to the JavaScript thread.

It is the opposite of the smart-proxy-object approach, and it still doesn't require a large amount of data to be passed between the addon and V8.
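
For context, the callback style looks roughly like this with node-expat (XML rather than JSON, and the element and attribute names are only illustrative): the parser emits events, and only the data touched inside the callbacks crosses back into JS.

const expat = require('node-expat');

const parser = new expat.Parser('UTF-8');
const names = [];

// Pick out just one attribute; everything else stays on the native side.
parser.on('startElement', (name, attrs) => {
  if (name === 'user' && attrs.name) {
    names.push(attrs.name);
  }
});

parser.write('<users><user name="ada"/><user name="grace"/></users>');
console.log(names); // [ 'ada', 'grace' ]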

xamgore avatar Dec 22 '20 11:12 xamgore

So just to confirm: if you want to get the entire object from a string (e.g. lazy usage isn't possible), this probably isn't the library to use in its current state?

RichardWright avatar Jan 28 '22 17:01 RichardWright

@RichardWright I cannot speak specifically for this library, but one general concern is that constructing the full JSON representation in JavaScript, with all the objects, strings, arrays, and so on, is remarkably expensive. In some respects, that's independent of JSON.

cc @jkeiser

lemire avatar Jan 28 '22 17:01 lemire

Passing around a buffer and using key access is the preferred method then?

RichardWright avatar Jan 28 '22 17:01 RichardWright

That is correct @RichardWright. My idea, as mentioned in other threads, would be to have simdjson implemented as the native JSON parser directly in the engine, e.g. V8. That would possibly speed up the process. At this moment I am finishing my master's thesis and don't have time to try it, so we will have to wait a little bit on that. :)

CC @jkeiser @lemire

luizperes avatar Jan 28 '22 18:01 luizperes

CC @mcollina

Uzlopak avatar Jan 28 '22 18:01 Uzlopak

@luizperes cool, thanks for the response!

RichardWright avatar Jan 31 '22 17:01 RichardWright