
What do deferred early errors mean for JSON?

Open Yoric opened this issue 7 years ago • 8 comments

This might be beyond the scope of the current proposal, but I expect that we'll want to apply binary parsing also for JSON data.

For the specific case of JSON, I suspect that we want the data to be checked eagerly. Does this mean that we want to guarantee that some subsets of ECMAScript will be checked eagerly? That we want to be able to specify that some files need to be parsed eagerly?

Yoric avatar Jul 19 '17 13:07 Yoric

I expect that we'll want to apply binary parsing also for JSON data.

I don't know if we would, actually. ES doesn't spec JSON (except in that it provides JSON.parse and JSON.stringify), and even if it did I don't think there would be much of a parsing speed or file size advantage to a binary packing of a JSON object.

bakkot avatar Jul 19 '17 13:07 bakkot

I'm planning to experiment with this once I have made sufficient progress on the updated SpiderMonkey-based prototype. My intuition is that we could have something that is both faster and smaller, at least for very large JSON files.

Yoric avatar Jul 19 '17 13:07 Yoric

Something like BSON?

Becavalier avatar Jul 19 '17 14:07 Becavalier

If/when this expands to JSON, we should probably look at BSON and other related formats to build off of.

There are two things that seem to be less than ideal about BSON, though. Firstly, it uses termination delimiters, not length-prefixing, to indicate the span of an entity's encoding. Termination delimiters are great for things you're scanning linearly, but terrible for quickly getting at some piece of data. They're also harder on the lexer, since the lexer doesn't know the end of the entity it's parsing and has to constantly check for premature END bytes.
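To make the trade-off concrete, here is a minimal sketch of the two framing styles. This is not BSON's actual wire format, and the function names are hypothetical; it only illustrates why a length prefix lets a reader skip an entity in O(1) while a terminator forces a byte-by-byte scan.

```javascript
// Length-prefixed: a 4-byte header states the payload span up front,
// so we can slice or skip the entity without inspecting its bytes.
function readLengthPrefixed(buf, offset) {
  const len = buf.readUInt32LE(offset);
  const payload = buf.subarray(offset + 4, offset + 4 + len);
  return { payload, next: offset + 4 + len }; // O(1) jump to next entity
}

// Terminator-delimited: scan for the 0x00 sentinel, checking at every
// step that we haven't run off the end of the buffer prematurely.
function readTerminated(buf, offset) {
  let end = offset;
  while (end < buf.length && buf[end] !== 0x00) end++;
  if (end === buf.length) throw new Error("unterminated entity");
  return { payload: buf.subarray(offset, end), next: end + 1 };
}

// The same payload "hello" encoded both ways:
const lp = Buffer.concat([Buffer.from([5, 0, 0, 0]), Buffer.from("hello")]);
const td = Buffer.concat([Buffer.from("hello"), Buffer.from([0])]);
console.log(readLengthPrefixed(lp, 0).payload.toString()); // "hello"
console.log(readTerminated(td, 0).payload.toString());     // "hello"
```

Note that the length-prefixed reader never touches the payload bytes to find the next entity, which is what makes random access into a large document cheap.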

Secondly, @Yoric found when prototyping binjs that a constant name table for identifiers helped greatly with the size of the content (even after compression). It was the constant table that, if I remember correctly, helped us get below text JS for post-compression size. JSON object fields tend to be terribly redundant, and constantizing all the field names at the top would likely be a big win for size-after-compression.
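The constant-table idea can be sketched in a few lines. This is a hypothetical illustration (the `constantize` helper is not from binjs): repeated object keys are hoisted into one table and replaced by indices, so each field name is paid for once instead of once per record.

```javascript
// Hoist repeated JSON field names into a single constant table and
// replace each key with its index in that table.
function constantize(records) {
  const names = [];            // constant table of field names
  const index = new Map();     // name -> table index
  const body = records.map((rec) =>
    Object.fromEntries(
      Object.entries(rec).map(([key, value]) => {
        if (!index.has(key)) {
          index.set(key, names.length);
          names.push(key);
        }
        return [index.get(key), value]; // key replaced by its index
      })
    )
  );
  return { names, body };
}

// Typical JSON arrays repeat the same keys in every element:
const rows = [
  { id: 1, firstName: "Ada", lastName: "Lovelace" },
  { id: 2, firstName: "Alan", lastName: "Turing" },
];
const packed = constantize(rows);
console.log(packed.names); // [ 'id', 'firstName', 'lastName' ] — stored once
```

A general-purpose compressor already deduplicates repeated strings to some extent, which is why the interesting claim is that the table still wins *after* compression.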

I'm particularly interested in the work on Protocol Buffers, Thrift, and friends, to see if we can lift any performance-driven designs for a JSON solution.

kannanvijayan-zz avatar Jul 19 '17 18:07 kannanvijayan-zz

Just to clarify, I am not advocating extending this to JSON, just mentioning that:

  • I'll run a few experiments to see how well it works;
  • if it's there, someone is bound to use it for JSON eventually, and we need to have at least a rough idea of whether we are opening a can of worms.

Yoric avatar Jul 19 '17 19:07 Yoric

JSON should be a very aspirational thought at this point of the proposal.

syg avatar Jul 19 '17 22:07 syg

I don't understand how JSON would be related to this; I thought the idea was that you delay early errors for functions, right? JSON doesn't have functions.

littledan avatar Jul 20 '17 13:07 littledan

True, if we agree to specify that early errors are delayed only for functions, there shouldn't be any problem with JSON.

Yoric avatar Jul 20 '17 13:07 Yoric