simdjson_nodejs

Expose the underlying tapes as ArrayBuffer

Open nojvek opened this issue 4 years ago • 5 comments

Reading the code, I see that the fast part of simdjson is parsing the JSON bytes and creating two buffers/tapes. One is the JSON tape, which marks the start, end and type of each element. The other is a string tape, which contains the parsed strings in UTF-8 format.

JavaScript offers a nice way of fast buffer indexing and getting our values via TypedArrays and ArrayBuffers. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays

This would mean that the iteration part of getting values out could be done in pure JS. That is, it would be technically possible to stream the buffers as binary data to the browser and have the JSON iteration work there too.
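To make the idea concrete, here is a minimal sketch of reading values out of two tapes in pure JS. The layout is a hypothetical toy one invented for illustration (pairs of `[typeCode, payload]` words, and length-prefixed strings); it is not simdjson's actual tape format, and `readElement` is a made-up helper name.

```javascript
// Hypothetical toy layout, NOT simdjson's real tape format:
//   jsonTape: Uint32Array of [typeCode, payload] pairs
//   strsTape: Uint8Array; each string is a 4-byte little-endian length
//             followed by its UTF-8 bytes
const TYPE_STRING = 1;
const TYPE_UINT = 2;

const encoder = new TextEncoder();
const decoder = new TextDecoder("utf-8");

// Build a tiny strsTape holding the single string "héllo".
const strBytes = encoder.encode("héllo");
const strsTape = new Uint8Array(4 + strBytes.length);
new DataView(strsTape.buffer).setUint32(0, strBytes.length, true);
strsTape.set(strBytes, 4);

// Two elements: the string at byte offset 0, then the integer 42.
const jsonTape = new Uint32Array([TYPE_STRING, 0, TYPE_UINT, 42]);

// Decode the element at the given position using only typed-array indexing.
function readElement(jsonTape, strsTape, index) {
  const type = jsonTape[index * 2];
  const payload = jsonTape[index * 2 + 1];
  if (type === TYPE_STRING) {
    const view = new DataView(strsTape.buffer, strsTape.byteOffset, strsTape.byteLength);
    const len = view.getUint32(payload, true);
    return decoder.decode(strsTape.subarray(payload + 4, payload + 4 + len));
  }
  if (type === TYPE_UINT) return payload;
  throw new Error(`unknown type code ${type}`);
}

const first = readElement(jsonTape, strsTape, 0);
const second = readElement(jsonTape, strsTape, 1);
```

Nothing here depends on Node, so the same code could run in a browser once the two buffers have been transferred.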

Or one could dump the tapes as files and get zero-cost parsing by simply mmapping a file and iterating over gigabytes of JSON tape, like FlatBuffers.

https://google.github.io/flatbuffers/
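A rough sketch of the dump-and-reload idea, under some assumptions: Node has no built-in mmap, so this uses `fs.readFileSync` as a stand-in (which does read the whole file into memory), and the tape contents are made-up placeholder words. The point is only that the bytes on disk can be viewed as a typed array again without a parsing step.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Placeholder tape contents, purely for illustration.
const jsonTape = new Uint32Array([2, 7, 2, 11, 2, 13]);

// Dump the raw bytes of the tape to a temporary file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "tape-"));
const file = path.join(dir, "json.tape");
fs.writeFileSync(file, Buffer.from(jsonTape.buffer, jsonTape.byteOffset, jsonTape.byteLength));

// Load the bytes back. readFileSync is a stand-in for mmap here.
const bytes = fs.readFileSync(file);

// View the bytes as 32-bit words. A Uint32Array view needs a 4-byte-aligned
// offset, so fall back to a copy if the Buffer is not aligned.
const restored = bytes.byteOffset % 4 === 0
  ? new Uint32Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 4)
  : new Uint32Array(bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength));

// Clean up the temporary directory; the data is already in memory.
fs.rmSync(dir, { recursive: true, force: true });
```

Typed arrays use the platform's byte order, so a file written this way is only portable between machines with the same endianness unless the format pins one down.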

I also don’t think lazyParse as the only function is a great interface. The underlying simdjson has a concept of elements and iterators, and JavaScript has a similar concept of iterators too. With lazyParse, one would need to resort to proxy hacks, which are a bit too magical at times. I think we can expose a much nicer object/array iterator-based interface for the underlying tape.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Iterators_and_Generators

https://codeburst.io/a-simple-guide-to-es6-iterators-in-javascript-with-examples-189d052c3d8e

This would mean there’d be two submodules. One that is a fast jsonStr -> {jsonTape, strsTape}

The other that takes {jsonTape, strsTape} -> elemIterator.
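Here is a small sketch of how the two submodules could fit together. Everything in it is an assumption for illustration: the three-word tape entries, the type codes, and the `Elem` class are invented and are not simdjson's real format or API. `toTapes` uses `JSON.parse` purely as a stand-in for the native step, and the toy tape only handles strings, unsigned 32-bit integers, arrays and objects.

```javascript
// Hypothetical toy tape, NOT simdjson's real format.
const T = { OBJ: 1, ARR: 2, END: 3, STR: 4, UINT: 5 };
const STRIDE = 3; // words per tape entry: [type, a, b]

// Submodule 1 (stand-in for the native part): jsonStr -> { jsonTape, strsTape }.
function toTapes(jsonStr) {
  const words = [];
  const bytes = [];
  const encoder = new TextEncoder();
  const write = (value) => {
    if (typeof value === "string") {
      const utf8 = encoder.encode(value);
      words.push(T.STR, bytes.length, utf8.length); // a: byte offset, b: byte length
      bytes.push(...utf8);
    } else if (Number.isInteger(value) && value >= 0 && value <= 0xffffffff) {
      words.push(T.UINT, value, 0); // a: the integer itself
    } else if (value !== null && typeof value === "object") {
      const start = words.length;
      words.push(Array.isArray(value) ? T.ARR : T.OBJ, 0, 0);
      if (Array.isArray(value)) value.forEach(write);
      else for (const [k, v] of Object.entries(value)) { write(k); write(v); }
      words.push(T.END, 0, 0);
      words[start + 1] = words.length / STRIDE; // a: entry index just past END
    } else {
      throw new Error("toy tape supports only strings, uint32, arrays, objects");
    }
  };
  write(JSON.parse(jsonStr));
  return { jsonTape: new Uint32Array(words), strsTape: new Uint8Array(bytes) };
}

// Submodule 2: { jsonTape, strsTape } -> iterable elements, no Proxy needed.
const decoder = new TextDecoder("utf-8");

class Elem {
  constructor(tapes, index) { this.tapes = tapes; this.index = index; }
  get type() { return this.tapes.jsonTape[this.index * STRIDE]; }
  get a() { return this.tapes.jsonTape[this.index * STRIDE + 1]; }
  get b() { return this.tapes.jsonTape[this.index * STRIDE + 2]; }
  // Index of the entry after this element; containers are skipped in one hop.
  get next() {
    return this.type === T.OBJ || this.type === T.ARR ? this.a : this.index + 1;
  }
  // Materialize a scalar element.
  value() {
    if (this.type === T.UINT) return this.a;
    if (this.type === T.STR) {
      return decoder.decode(this.tapes.strsTape.subarray(this.a, this.a + this.b));
    }
    throw new Error("not a scalar element");
  }
  // Lazily yield the child elements of an array.
  *[Symbol.iterator]() {
    if (this.type !== T.ARR) throw new Error("not an array element");
    let i = this.index + 1;
    while (this.tapes.jsonTape[i * STRIDE] !== T.END) {
      const child = new Elem(this.tapes, i);
      yield child;
      i = child.next;
    }
  }
  // Lazily yield [key, valueElem] pairs of an object.
  *entries() {
    if (this.type !== T.OBJ) throw new Error("not an object element");
    let i = this.index + 1;
    while (this.tapes.jsonTape[i * STRIDE] !== T.END) {
      const val = new Elem(this.tapes, i + 1);
      yield [new Elem(this.tapes, i).value(), val];
      i = val.next;
    }
  }
}

function elemIterator(tapes) { return new Elem(tapes, 0); }

// Usage: walk the root object and materialize only what we touch.
const root = elemIterator(toTapes('{"name":"tape","sizes":[1,2,3]}'));
const seen = {};
for (const [key, elem] of root.entries()) {
  seen[key] = elem.type === T.ARR ? [...elem].map((e) => e.value()) : elem.value();
}
```

Because each container entry stores the index just past its end, a consumer can skip a whole nested value without visiting its children, which is what makes the iterator lazy.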

Hopefully I’m making sense.

I’m happy to write the JS part of the code. I just need to figure out how to export the buffers using N-API.

nojvek, Apr 16 '20 12:04