hermes icon indicating copy to clipboard operation
hermes copied to clipboard

Proof of concept: new JSON parser

Open radex opened this issue 1 year ago • 13 comments

Hi, I've been playing around for fun with the JSON parser, and made a new proof of concept implementation, which in my measurements is ~2x faster to the current implementation. Some more detail on it here: https://radex.io/react-native/json-parse/ , but the most important points are:

  • I've used simdjson as the underlying JSON parser (note: simdjson is already used successfully in React Native land by WatermelonDB)
  • I've replaced Unicode transcoding functions with simdutf (note: Node and Bun use simdutf for Unicode transcoding)
  • JSON.parse-specific fast paths for setting object properties and array items (not nicely extracted into an JSObject API in this PR, just turned some of its private methods to public so I could use them in the PoC), also SymbolID fast path

The question to Hermes team: is there any interest in possibly adding simdjson and simdutf as dependencies to Hermes? I can imagine that given the project's main goals, this might be a hard sell, as they are quite large. They added ~180KB (~4%) to bin/hermes in release mode, but TBH I haven't messed with build/linker settings to see if they're really that large, or if unused code isn't properly pruned.

Please treat this PR as just a proof of concept, the diff is full of junk and duplicated (rather than changed) code, for my own convenience while experimenting. If there is any interest in using any of this code, this would be submitted in multiple clean PRs.

radex avatar Mar 04 '23 12:03 radex

Wow Thanks for publishing that! I haven't even looked at the code yet, but I started looking at simdjson, and it appears to only support 64-bit CPUs https://github.com/simdjson/simdjson/blob/master/doc/implementation-selection.md . I am not sure why, but it states (perhaps incorrectly?) that even its fallback implementation requires a 64-bit CPU.

Since lots of ARM32 devices are running Hermes, it would be interesting to know whether it is supported and what the performance is there.

EDIT: I really enjoyed reading your blog post! We will start the internal discussion right away and will try to keep as much of it as possible public here in the comments.

tmikov avatar Mar 04 '23 17:03 tmikov

Thanks! This seems incorrect, I believe simdjson will use generic implementation fine on 32-bit https://github.com/simdjson/simdjson/blob/ec0b48b7722112e4fa005e557aa0531756c1718a/include/simdjson/portability.h#L89-L90 but I must admit that I haven't checked this, and I haven't benchmarked on Android.

EDIT: Okay, I think fallback and generic are two different things. fallback is a very simplified multi-arch SIMD implementation, and generic is a scalar implementation

radex avatar Mar 04 '23 20:03 radex

Some additional information...

  1. The simdutf library used in this project is part of bun and node.js.
  2. The simdjson runs 'everywhere': we even test on big-endian mainframes. IBM engineers contributed a POWER kernel. Intel engineers contributed an AVX-512 kernel. Evidently, the kernels matching recent commodity hardware are more performant.

lemire avatar Mar 06 '23 17:03 lemire

@lemire Hermes is intended primarily for mobile devices, many of which are still running underpowered ARM32 CPUs. Do you have an intuition of the performance there, or do you know of existing usages in a similar environment?

tmikov avatar Mar 06 '23 18:03 tmikov

@tmikov We have decent fallback performance in simdjson thanks to @jkeiser’s great engineering. Of course, our purpose is to milk the performance of higher performance 64-bit commodity cores.

lemire avatar Mar 06 '23 19:03 lemire

Hey @radex, thanks for this PR! I have just got around to reading your blogpost and slowly starting to poke around at some of the code. That was a great write-up, and the team is super appreciative of all the effort you put in to it!

There's a lot to respond to, but here are some of my initial thoughts.

V8's JSON.parse is still >2x faster than Hermes without pulling in an external library like simdjson. I think, as you note in your blogpost, there is probably a lot more room for optimization aside from the direct scanning. A very rich area to explore is the concept of speculation, as you say. V8 implements an iterative scanner, and interestingly they do not build their VM data structures in-place as opposed to Hermes. This allows them to do some things like allocating VM objects and arrays with the proper sizes. Here is a relevant code pointer if you are interested. Just to get a taste of some of the things V8 does, here is one cool optimization that I actually have a prototype diff for in Hermes. Basically, they check the 'next' transition on the current hidden class, and compare that to the string they just finished parsing. If it matches, no need to perform a hash computation+lookup into the identifier table.

I 100% agree with you on a fast-path API for setting properties on a JSON object. I also noticed you even added some missing LLVM_UNLIKELYs in our existing code, thanks for that!

If we did go the route of pulling in simdjson/some version of it, I wonder if the absolute best-performance would mean including some VM concepts inside of the parsing itself. For example: constructing the Symbol hash incrementally, which is an optimization I landed a couple weeks ago. Maybe we wouldn't have to write another separate pass over the data structure that simdjson produces.

I will continue reviewing the PR and start the discussion with the team of the best way of incorporating these ideas into Hermes. Thanks again, it has been very fun so far reading the blog and code :)

fbmal7 avatar Mar 06 '23 23:03 fbmal7

@fbmal7 Thanks for insight on V8 parser, very interesting!

radex avatar Mar 07 '23 08:03 radex

Hey @radex, sorry we have not been able to get back sooner. We have still been discussing this PR as a team. I have been looking through your code more recently and have some more thoughts to share. I think I would lean in on the opinion I shared in my initial reaction: simdjson / simdutf are not the best ways to improve our JSON parser. Potential perf gains aside, this is also a dependency the team would have to take on which would come with binary size increase and also maintenance overhead. All in all, I don't think we have the capacity right now to be able to import these libraries.

However, I do think your PR shows scattered inefficiencies across the parser and also how we deal with strings, specifically with regards to how we treat UTF16/UTF8 strings. I still really like your ideas on speculation and I absolutely think we should use fast paths for setting values on objects and arrays. I would be very interested to see how much faster the parser would be if all the non-simdjson changes were kept.

fbmal7 avatar Jun 02 '23 17:06 fbmal7

@fbmal7 Thanks for the answer! I suspected that this would be your decision, and if I were in Hermes team's shoes, I'd probably also be reluctant to take on simdjson as a dependency.

Regardless, I am also curious to see how the other perf improvements (and some ideas not fully fleshed out) will stack up in the future :)

radex avatar Jun 05 '23 07:06 radex

@radex have you considered publishing your work as an independent library (I have thought of a great name: fastjson.npm)?

My feeling is that simdjson and simdutf are amazing libraries that provide a real performance improvement, but they are targeted more towards server and desktop use cases. So making them the default for Hermes is not necessarily a good tradeoff, but it would be nice if people still had the option to choose it.

tmikov avatar Jun 05 '23 16:06 tmikov

@tmikov I have considered it, but I don't think I can achieve much of a perf win from simdjson via JSI, access to Hermes-internal API seems essential

radex avatar Jun 05 '23 19:06 radex

@radex good point. I assumed that it would still use the internal APIs, but I realize now that it is impossible in practice - they aren't exported. Hmm. I need to think more about cases like this.

tmikov avatar Jun 10 '23 18:06 tmikov

@radex some further thoughts that may be relevant in this comment: https://github.com/facebook/hermes/issues/684#issuecomment-1614268917

tmikov avatar Jun 30 '23 07:06 tmikov