Why can't we compress the ATN?

Open p7r0x7 opened this issue 1 year ago • 3 comments

Depending on how simply it's implemented, it could be incredibly beneficial. Personally, since I'm already using zstd in my compiler project, I wouldn't mind zstd, but a super simple compression implementation could also work.

p7r0x7 avatar Nov 17 '24 17:11 p7r0x7

Should not be a 3rd party lib, to avoid forcing all targets to have that in their specific environment. Instead a simple RLE might make more sense, but it's unclear if compressing the serialized ATN has any significant impact (on code size or runtime speed).
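To make the RLE idea concrete, here is a minimal sketch of run-length encoding over a list of integers. It assumes the serialized ATN can be treated as a flat integer array; the function names `rle_encode` and `rle_decode` and the sample data are hypothetical and are not part of any ANTLR runtime.

```python
from typing import List


def rle_encode(values: List[int]) -> List[int]:
    """Encode values as flat (count, value) pairs: [count, value, count, value, ...]."""
    encoded: List[int] = []
    i = 0
    while i < len(values):
        # Advance j to the end of the current run of equal values.
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        encoded.extend((j - i, values[i]))
        i = j
    return encoded


def rle_decode(encoded: List[int]) -> List[int]:
    """Invert rle_encode by expanding each (count, value) pair."""
    if len(encoded) % 2 != 0:
        raise ValueError("encoded data must consist of (count, value) pairs")
    decoded: List[int] = []
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        decoded.extend([value] * count)
    return decoded


# Made-up sample data, NOT a real serialized ATN.
sample = [4, 0, 0, 0, 0, 7, 7, 1, 2, 3]
packed = rle_encode(sample)
assert rle_decode(packed) == sample
print(len(sample), "->", len(packed))
```

With this naive pair scheme, the sample above grows from 10 to 12 integers, because every value that is not part of a run costs two slots. Whether real ATN data contains enough long runs to come out ahead would have to be measured per grammar.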

Instead maybe a new serialization format might be the better choice? However, I don't think that will ever be considered in ANTLR4. Instead follow the ANTLRng project, where this might become a reality.

mike-lischke avatar Dec 10 '24 14:12 mike-lischke

I'm not sure whether every ATN actually needs to be fully unpacked, and whether it should even be done in the parser constructor. For our largest grammar sql/plsql, unpacking takes 0.2s in C#, but most of the grammar isn't even used for the parse. For the entire test suite of 379 files, it's only 66% of the rules that are used. You would think that for small tests the %-used is even less.
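As a rough illustration of moving the unpacking out of the constructor, the sketch below defers an expensive load until the first time it is needed. Everything here is hypothetical: `Lazy`, `fake_deserialize`, and `SketchParser` are stand-ins and do not correspond to ANTLR classes. It only defers the whole unpack; it does not attempt per-rule unpacking.

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Run the loader on first access and cache its result."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: Optional[T] = None
        self._loaded = False

    def get(self) -> T:
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value  # type: ignore[return-value]


# Records each call so the deferred behaviour can be observed.
deserialize_calls: List[int] = []


def fake_deserialize() -> dict:
    """Placeholder for an expensive ATN unpack (hypothetical)."""
    deserialize_calls.append(1)
    return {"states": [0, 1, 2]}


class SketchParser:
    """Hypothetical parser whose constructor does no unpacking."""

    def __init__(self) -> None:
        self._atn = Lazy(fake_deserialize)

    def parse(self) -> int:
        atn = self._atn.get()  # the unpack happens here, once
        return len(atn["states"])


parser = SketchParser()
assert deserialize_calls == []       # nothing unpacked at construction
parser.parse()
parser.parse()
assert len(deserialize_calls) == 1   # unpacked exactly once
```

This moves the cost from construction to the first parse rather than removing it, so it only helps when a parser is constructed but never used, or when startup latency matters more than first-parse latency.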

But is there a problem? Is this too much time or space required?

kaby76 avatar Dec 11 '24 02:12 kaby76

It could be optional, but zstd is brilliant and ubiquitous.

p7r0x7 avatar Dec 20 '24 23:12 p7r0x7