Reduce size and compile time of the parser

Open maxbrunsfeld opened this issue 1 year ago • 1 comments

Fixes https://github.com/tree-sitter/tree-sitter-julia/issues/124
Depends on https://github.com/tree-sitter/tree-sitter/pull/3234

  • Generate simpler code for matching large character sets: a fixed binary search function over a static array, instead of generated binary search code.
  • In the grammar, avoid single tokens that match multiple keywords e.g. primitive type, abstract type, as these don't work with keyword extraction, making the main lex function more complex.
  • In the grammar, simplify the argument_list rule, replacing a long highly-specific sequence with a more generic repetitive structure (reduces the parser size by ~6500 states)
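
To make the first bullet concrete, here is a minimal sketch of the general technique: one fixed binary search function that runs over a static, sorted array of character ranges, so the lexer does not need a separate generated search for every large character set. This is not the code from the linked tree-sitter PR. The names `CharRange`, `IDENT_START`, and `range_set_contains` are hypothetical, and the ranges are illustrative rather than taken from the Julia grammar.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one inclusive range of code points. */
typedef struct {
  int32_t start;
  int32_t end;
} CharRange;

/* Illustrative data only: ranges must be sorted and non-overlapping. */
static const CharRange IDENT_START[] = {
  {'A', 'Z'},
  {'_', '_'},
  {'a', 'z'},
  {0x00C0, 0x00D6},
  {0x0391, 0x03A9},
};

#define IDENT_START_LEN (sizeof(IDENT_START) / sizeof(IDENT_START[0]))

/* One shared binary search; every character set reuses this function
 * and only contributes its own static array. */
static bool range_set_contains(const CharRange *ranges, size_t len, int32_t c) {
  size_t lo = 0;
  size_t hi = len;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (c < ranges[mid].start) {
      hi = mid;       /* c lies before this range */
    } else if (c > ranges[mid].end) {
      lo = mid + 1;   /* c lies after this range */
    } else {
      return true;    /* start <= c <= end */
    }
  }
  return false;
}
```

A lexer state would then test the lookahead with something like `range_set_contains(IDENT_START, IDENT_START_LEN, lookahead)`. Each additional character set costs only its data table, which is presumably where the size and compile-time savings come from.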

On my M3, compiling the parser to aarch64 now takes about 4 seconds. Compiling to WASM takes 3 minutes, which is still terrible, but better than before.

/cc @savq

maxbrunsfeld, Mar 31 '24 17:03

  • In the grammar, avoid single tokens that match multiple keywords e.g. primitive type, abstract type, as these don't work with keyword extraction, making the main lex function more complex.
  • In the grammar, simplify the argument_list rule, replacing a long highly-specific sequence with a more generic repetitive structure (reduces the parser size by ~6500 states)

I didn't know either of those things 😬

@maxbrunsfeld, could we move the argument_list update to #135? Most of that PR is around removing the conflicts between signatures/parameters and calls/arguments.

savq, Mar 31 '24 19:03

I'll merge this now. We can regenerate later with a tree-sitter version that includes https://github.com/tree-sitter/tree-sitter/pull/3234/

savq, Apr 11 '24 20:04

Thanks, sorry for not responding @savq - https://github.com/tree-sitter/tree-sitter/pull/3234 is almost done.

maxbrunsfeld, Apr 12 '24 00:04

No problem.

I didn't update parser.h in the last generate, so the header still has the set_contains function.

savq, Apr 13 '24 18:04