simdjson_nodejs
Slower than JSON.parse
Hi @luizperes
I know this library was made to handle large JSON files, but I noticed some strange performance results when I parsed my JSON and benchmarked it: this library is slow, up to 6x slower.
Here is the result:
```
json.parse - large: 985.011ms
simdjson - large: 4.756s
```
Also, `lazyParse` does not return the expected result for me (or I'm doing something wrong), and even with `lazyParse`, performance is still slow. How can we improve this?
Code to test
```js
const simdJson = require("simdjson");

const bench = (name, fn) => {
  console.time(name);
  for (let i = 0; i < 200000; i++) {
    fn();
  }
  console.timeEnd(name);
};

// Create a large JSON string
let JSON_BUFF_LARGE = {};
for (let i = 0; i < 20; i++) { // 20 keys is close to 0.5Kb, which is very small, but you can increase this value
  JSON_BUFF_LARGE["key_" + i] = Math.round(Math.random() * 1e16).toString(16);
}
JSON_BUFF_LARGE = JSON.stringify(JSON_BUFF_LARGE);

console.log(
  "JSON buffer LARGE size is",
  (JSON_BUFF_LARGE.length / 1024).toFixed(2),
  "Kb"
);

bench("json.parse - large", () => JSON.parse(JSON_BUFF_LARGE));
bench("simdjson - large", () => simdJson.parse(JSON_BUFF_LARGE));
```
Tested on an AVX-supported device too.
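As an aside, microbenchmarks like this can mislead if the parsed result is never used, since the engine may treat the work as dead code. A minimal variant of the harness above that keeps a sink reference (illustrative only; absolute numbers still depend on machine and warm-up):

```js
// Variant of the benchmark that stores each result in a sink, so the
// engine cannot discard the parsing work as dead code. Uses
// process.hrtime.bigint() for a monotonic high-resolution clock.
const bench = (name, fn) => {
  let sink; // keep a reference to the last result
  const start = process.hrtime.bigint();
  for (let i = 0; i < 200000; i++) {
    sink = fn();
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${name}: ${elapsedMs.toFixed(3)}ms`);
  return sink; // returning the sink also keeps it observable
};

const json = JSON.stringify({ foo: { bar: [0, 42] } });
const last = bench("json.parse - small", () => JSON.parse(json));
```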
Hi @dalisoft,
`simdjson.parse` is always slower, as expected. Please take a look at issue #5 and the Documentation.md file.
Thank you so much for making your tests available to me so I could save some time. As you mentioned, `simdjson` is not doing better than the standard JSON for your case. It could be that there is something related to parsing numbers (which is a little slow in simdjson, but it should still be faster). In your case you are generating random numbers, and your JSON string `JSON_BUFF_LARGE` could be difficult to parse (in `simdjson`), but that shouldn't be the case. I am speculating that it could be a problem with the wrapper (only if there is something very wrong) or some sort of bug upstream (explanation below).
I changed the parameters of the code you asked me to test. Instead of a 0.5Kb file, I am using a +25Kb file: just replace `i < 20` with `i < 200000`.
For all three functions of `simdjson`, here is the output I get (my machine has AVX2):
`simdjson.parse`
```
JSON buffer LARGE size is 25.78 Kb
json.parse - large: 33672.126ms
simdjson - large: 159626.570ms
```
`simdjson.lazyParse`
For this case, `lazyParse` is faster (by around 30%) than the standard JSON.
```
JSON buffer LARGE size is 25.81 Kb
json.parse - large: 33321.596ms
simdjson - large: 21988.679ms
```
`simdjson.isValid`
`isValid` is nearly the same thing as `lazyParse`, as `lazyParse` only validates the JSON but does not construct the JS object, so they should both run at around the same speed. I will check whether this is a problem in the wrapper (likely) or the upstream (by running it without the wrapper and getting its perf stat).
```
JSON buffer LARGE size is 25.80 Kb
json.parse - large: 33484.594ms
simdjson - large: 5665.534ms
```
One interesting thing you will see with `simdjson` is that it scales well and becomes much faster than regular state-machine parsing algorithms. But as stated above, there is something wrong going on. I will only have time to check around the third week of April.
Thanks again for the contribution!
cc @lemire
Oh, here is the usage of `lazyParse`:
```js
const simdjson = require('simdjson');

const jsonString = "{ \"foo\": { \"bar\": [ 0, 42 ] } }";
const JSONbuffer = simdjson.lazyParse(jsonString); // external (C++) parsed JSON object
console.log(JSONbuffer.valueForKeyPath("foo.bar[1]")); // 42
```
Note that it does not construct the actual JS object; it keeps an external pointer to the C++ buffer, and for this reason you can only access the keys with the `valueForKeyPath` function returned in the object.
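To make the key-path syntax concrete, here is a small pure-JavaScript sketch of what a `valueForKeyPath`-style lookup does over an ordinary object. This is only an illustration of the path syntax; the library's native implementation resolves paths against the C++ buffer instead of a JS object:

```js
// Illustrative resolver for "foo.bar[1]"-style key paths over a plain
// JS object (not the library's native implementation).
function valueForKeyPath(obj, path) {
  // Split "foo.bar[1]" into ["foo", "bar", "1"]
  const keys = path.split(/[.\[\]]/).filter(k => k.length > 0);
  return keys.reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

const parsed = JSON.parse('{ "foo": { "bar": [ 0, 42 ] } }');
console.log(valueForKeyPath(parsed, "foo.bar[1]")); // 42
```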
> It could be that there is something related to parsing numbers (which is a little slow in simdjson, but it should still be faster)
It is generally a challenging task but simdjson should still be faster than the competition.
> I know this library was made to handle large JSON files
The simdjson library itself is faster even on small files.
cc @croteaucarine @jkeiser
Am I doing something wrong, or is it the cost of the bindings that makes performance not what I want?
@luizperes
```js
const simdjson = require('simdjson');

const jsonString = "{ \"foo\": { \"bar\": [ 0, 42 ] } }";
const JSONbuffer = simdjson.lazyParse(jsonString); // external (C++) parsed JSON object
console.log(JSONbuffer.valueForKeyPath("foo.bar[1]")); // 42
```
I see that it's good, but I'm using it for a different case:
```js
const simdjson = require('simdjson');

// some code

// all the code below is repeated many times
const JSONbuffer = simdjson.lazyParse(req.body); // req.body - JSON string
console.log(JSONbuffer.valueForKeyPath("")); // to get the whole object
```
I want to use this library within my backend framework for Node.js as a higher-performance alternative to `JSON.parse`, but performance only gets worse.
Hi @dalisoft, I see your point now. There are only a few cases where you actually need the whole JSON, but for the root case, it should have the same performance in this wrapper.
I will think of new approaches to improve the library, but will only be able to do it in the future. I will also take a close look at the repo https://github.com/croteaucarine/simdjson_node_objectwrap by @croteaucarine; she's working on improvements for the wrapper. I will leave this issue open until we fix it. Hopefully it won't take (that) long. Cheers!
@luizperes Thanks, I'll wait :)
Note to self: there are a few leads on PR #33
@luizperes did you consider the approach node-expat has chosen?
@xamgore can you elaborate on your question?
@luizperes Hi! For better debugging you can try https://github.com/nanoexpress/json-parse-expirement
@luizperes with node-expat you can add JS callbacks for events like "opening tag", "new attribute with name x", etc., so only the required properties are picked, copied, and passed back to the JavaScript thread.
It's the opposite of the smart-proxy-object method, and it still doesn't require a large amount of data to be passed between the addon and V8.
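The general idea (extracting only the properties you need instead of materializing the whole tree) can be sketched in plain JavaScript using `JSON.parse`'s reviver callback. This is a toy illustration of the callback style, not node-expat's or simdjson's actual API, and unlike a native event-based parser it still builds the intermediate values:

```js
// Toy illustration: use JSON.parse's reviver to fire a callback for
// each key a caller registered, and record only those values. A native
// event-based parser would avoid building the unused values entirely.
function pickKeys(jsonString, handlers) {
  const picked = {};
  JSON.parse(jsonString, function (key, value) {
    if (handlers[key]) {
      handlers[key](value); // fire the "new property" callback
      picked[key] = value;
    }
    return value; // keep parsing; we only record what was asked for
  });
  return picked;
}

const body = '{"user":"dalisoft","id":7,"payload":{"big":"...data..."}}';
const result = pickKeys(body, { user: v => {}, id: v => {} });
console.log(result); // { user: 'dalisoft', id: 7 }
```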
So just to confirm: if you want to get the entire object from a string (i.e. lazy usage isn't possible), this probably isn't the library to use in its current state?
@RichardWright I cannot speak specifically for this library, but one general concern is that constructing the full JSON representation in JavaScript, with all the objects, strings, arrays... is remarkably expensive. In some respect, that's independent of JSON.
cc @jkeiser
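A rough, pure-JavaScript way to see that point (no simdjson required): compare `JSON.parse`, which materializes the full JS tree, with a trivial linear scan that touches every character but builds nothing. This is not a rigorous benchmark, and the absolute numbers are machine-dependent; the point is the gap between the two:

```js
// Build a sizeable JSON string, then compare materializing the JS tree
// (JSON.parse) with a scan that reads every character but allocates nothing.
const doc = {};
for (let i = 0; i < 20000; i++) doc["key_" + i] = "value_" + i;
const text = JSON.stringify(doc);

let t = process.hrtime.bigint();
const tree = JSON.parse(text);          // builds ~20k strings plus the object
const parseMs = Number(process.hrtime.bigint() - t) / 1e6;

t = process.hrtime.bigint();
let braces = 0;
for (let i = 0; i < text.length; i++) { // touches every character, builds nothing
  if (text[i] === "{") braces++;
}
const scanMs = Number(process.hrtime.bigint() - t) / 1e6;

console.log({ parseMs, scanMs, keys: Object.keys(tree).length, braces });
```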
Is passing around a buffer and using key access the preferred method, then?
That is correct @RichardWright. My idea, as mentioned in other threads, would be to have simdjson implemented as the native JSON parser directly in the engine, e.g. V8. That would possibly speed up the process. At this moment I am finishing my master's thesis and don't have time to try it, so we will have to wait a little bit on that. :)
CC @jkeiser @lemire
CC @mcollina
@luizperes cool, thanks for the response!