proposal-binary-ast
What do deferred early errors mean for JSON?
This might be beyond the scope of the current proposal, but I expect that we'll also want to apply binary parsing to JSON data.
For the specific case of JSON, I suspect that we want the data to be checked eagerly. Does this mean that we want to guarantee that some subsets of ECMAScript will be checked eagerly? That we want to be able to specify that some files need to be parsed eagerly?
> I expect that we'll also want to apply binary parsing to JSON data.
I don't know if we would, actually. ES doesn't spec JSON (except in that it provides `JSON.parse` and `JSON.stringify`), and even if it did, I don't think there would be much of a parsing-speed or file-size advantage to a binary packing of a JSON object.
I'm planning to experiment with this once the updated SpiderMonkey-based prototype is sufficiently far along. My intuition is that we could end up with something that is both faster and smaller, at least for very large JSON files.
Something like BSON?
If/when this expands to JSON, we should probably look at BSON and other related formats to build off of.
There are two things that seem to be less than ideal about BSON, though. Firstly, it uses termination delimiters, not length-prefixing, to indicate the span of an entity's encoding. Termination delimiters are great for things you're scanning linearly, but terrible for quickly getting at some piece of data. They're also harder on the lexer, since the lexer doesn't know where the entity it's parsing ends and has to constantly check for premature END bytes.
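Here is a minimal sketch of the difference (the function names and byte layout are mine for illustration, not BSON's actual encoding or anything from the proposal): with a terminator the reader must scan byte by byte for the end marker, while a length prefix tells the consumer the span up front, so it can decode the payload or skip past it in constant time.

```js
// Terminator-style framing: scan for the 0x00 byte, checking every byte
// for a premature end marker.
function readTerminatedString(bytes, offset) {
  let end = offset;
  while (bytes[end] !== 0x00) end++;            // linear scan to find the end
  const value = new TextDecoder().decode(bytes.subarray(offset, end));
  return { value, next: end + 1 };
}

// Length-prefixed framing: the span is known immediately, so a reader can
// decode it directly or jump over it without touching the payload.
function readPrefixedString(bytes, offset) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const len = view.getUint32(offset, /* littleEndian */ true);
  const start = offset + 4;
  const value = new TextDecoder().decode(bytes.subarray(start, start + len));
  return { value, next: start + len };
}
```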
Secondly, @Yoric found when prototyping binjs that a constant name table for identifiers helped greatly with the size of the content (even after compression). If I remember correctly, it was the constant table that got us below text JS in post-compression size. JSON object fields tend to be terribly redundant, and constantizing all the field names at the top would likely be a big win for size after compression.
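As a rough sketch of what that could look like for JSON (the layout below is made up for illustration; binjs applies the idea to identifiers, not to this format): repeated field names are hoisted into one table and each record refers to them by index, so a name like `visible` appears once instead of once per record.

```js
// Hoist every distinct field name into a single table, then store each
// record as [nameIndex, value] pairs.
function buildNameTable(records) {
  const names = [...new Set(records.flatMap(Object.keys))];
  const index = new Map(names.map((n, i) => [n, i]));
  const rows = records.map(r =>
    Object.entries(r).map(([k, v]) => [index.get(k), v]));
  return { names, rows };
}

const records = [
  { id: 1, name: "a", visible: true },
  { id: 2, name: "b", visible: false },
];
// Field names appear once in `names`; the repeated keys that a compressor
// would otherwise have to model are gone from the per-record data.
console.log(JSON.stringify(buildNameTable(records)));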
I'm particularly interested in the work on Protocol Buffers, Thrift, and friends, to see if we can lift any performance-driven designs for a JSON solution.
Just to clarify, I am not advocating extending this to JSON, just mentioning that:
- I'll run a few experiments to see how well it works;
- if it's there, someone is bound to use it for JSON eventually, and we need to have at least a rough idea of whether we are opening a can of worms.
JSON should be a very aspirational thought at this point in the proposal.
I don't understand how JSON would relate to this; the idea is that early errors are delayed for functions, right? JSON doesn't have functions.
True, if we agree to specify that early errors are delayed only for functions, there shouldn't be any problem with JSON.
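To make the distinction concrete, here is a hedged sketch (the function name and the example inputs are mine, not from the proposal text), assuming deferral applies only to errors inside function bodies:

```js
// Today, an early error inside a function body is still reported when the
// surrounding script is parsed, e.g.:
//
//   function g() { "use strict"; delete x; }   // SyntaxError at parse time
//
// The question above is whether a binary-AST consumer may defer that check
// until `g` is first invoked. JSON, by contrast, has no functions, so there
// is nothing that could be deferred; malformed input fails eagerly:
try {
  JSON.parse('{"a": 1,}');                  // trailing comma is not valid JSON
} catch (e) {
  console.log(e instanceof SyntaxError);    // true: reported immediately
}
```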