proposal-binary-ast

How are the various "enum"s encoded?

Open dead-claudia opened this issue 7 years ago • 3 comments

They're listed as strings in the spec, but it would seem highly inefficient to encode them that way. Are they in fact encoded as strings? (If so, you could encode them as LEB128 integers instead.)

dead-claudia, May 24 '18 19:05
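For context, here is a minimal sketch of the LEB128 suggestion. The enum is modeled on a `VariableDeclarationKind`-style enum with the values `"var"`, `"let"` and `"const"`, and it assigns each value a small ordinal. The class name and the ordinal assignment are hypothetical, because the proposal does not define numeric values for its enums. Unsigned LEB128 stores 7 bits per byte and sets the high bit whenever more bytes follow, so any ordinal below 128 takes a single byte.

```python
from enum import IntEnum


# Hypothetical ordinal assignment: the proposal lists these enum values
# as strings and does not specify any numbering.
class VariableDeclarationKind(IntEnum):
    VAR = 0
    LET = 1
    CONST = 2


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    value = int(value)
    if value < 0:
        raise ValueError("unsigned LEB128 requires a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)


def decode_uleb128(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer and return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated LEB128 value")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


# One byte for the enum value, compared with five bytes for the string "const".
encoded_const = encode_uleb128(VariableDeclarationKind.CONST)
```

With this scheme, `encode_uleb128(300)` produces the two bytes `0xAC 0x02`. Every value of a small enum would still fit in one byte.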

We have several binary encodings and we're still experimenting and tweaking them to improve compression and parse speed. The one with which we've been measuring parse speed does encode them as strings, which are themselves encoded as indices in the string table, as LEB32 integers. The tokenizer itself is optimized to perform LEB32 lookups instead of string lookups, so that's still quite fast and reasonably easy to compress.

We're also experimenting with encoding them as special interfaces, in a variant of the format which uses predictions on interfaces to improve compression, and this seems to observably decrease the size of the file. We haven't checked the impact on decompression speed.

Yoric, May 25 '18 20:05
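The string-table scheme described above can be sketched as follows. The sketch reads "LEB32" as a fixed-width 32-bit little-endian integer, which is the interpretation Yoric confirms later in the thread. Each distinct enum string is interned once in a table, and each occurrence in the stream is written as its 4-byte index. A tokenizer can then check whether it sees an expected enum value by comparing integers instead of strings. The `StringTable` class and the helper names are hypothetical. The thread does not describe how the real format lays out or serializes its string table.

```python
import struct


class StringTable:
    """Hypothetical string table that maps each distinct string to a dense index."""

    def __init__(self) -> None:
        self.strings: list[str] = []
        self.index: dict[str, int] = {}

    def intern(self, s: str) -> int:
        # Assign the next free index the first time a string is seen.
        if s not in self.index:
            self.index[s] = len(self.strings)
            self.strings.append(s)
        return self.index[s]


def write_enum(table: StringTable, out: bytearray, value: str) -> None:
    # Store the enum value as its table index: 4 bytes, little-endian ("<I").
    out.extend(struct.pack("<I", table.intern(value)))


def read_enum_index(data: bytes, offset: int) -> tuple[int, int]:
    # Read a 4-byte little-endian index and return (index, next_offset).
    (idx,) = struct.unpack_from("<I", data, offset)
    return idx, offset + 4


def enum_matches(data: bytes, offset: int, expected_index: int) -> bool:
    # The tokenizer compares integers, never the underlying strings.
    idx, _ = read_enum_index(data, offset)
    return idx == expected_index


table = StringTable()
buf = bytearray()
for kind in ["let", "const", "let"]:
    write_enum(table, buf, kind)
```

The three enum values above occupy 12 bytes, and each distinct string is stored once in the table. A fixed width keeps decoding trivial. A variable-width encoding such as LEB128 would shrink small indices to a single byte at the cost of a branch per byte.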

@Yoric LEB32? (You mean 32-bit little-endian integers?)

dead-claudia, May 26 '18 05:05

Indeed, that's what I meant.

Yoric, May 26 '18 06:05