Tracking issue: performance optimization
As mentioned in https://github.com/zed-industries/extensions/pull/2068#issuecomment-2675494003, the current parser is fairly complicated and needs to be optimized.
Status
State Count
```
> rg "#define.*STATE" src/parser.c
16:#define STATE_COUNT 7954
17:#define LARGE_STATE_COUNT 1749
```
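To compare these numbers across PRs without copying them out of `rg` output by hand, the two counts can be pulled from the generated `src/parser.c` programmatically. A minimal Python sketch, assuming the defines keep the `#define NAME VALUE` shape shown above; `read_state_counts` is a hypothetical helper, not part of tree-sitter:

```python
import re

# Matches lines such as "#define STATE_COUNT 7954" or "#define LARGE_STATE_COUNT 1749",
# i.e. the same lines that `rg "#define.*STATE" src/parser.c` prints.
# Assumption: the generator keeps the plain "#define NAME VALUE" layout.
STATE_DEFINE = re.compile(r"^#define\s+(\w*STATE_COUNT)\s+(\d+)[ \t]*$", re.MULTILINE)


def read_state_counts(source: str) -> dict[str, int]:
    """Return a mapping like {"STATE_COUNT": 7954, "LARGE_STATE_COUNT": 1749}."""
    return {name: int(value) for name, value in STATE_DEFINE.findall(source)}
```

From the repository root, `read_state_counts(Path("src/parser.c").read_text())` (with `from pathlib import Path`) should return the same two values that `rg` prints.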
Top Results of States for Rule
```
tree-sitter generate --report-states-for-rule -
```
| rule name | count |
|---|---|
| val_table | 1107 |
| val_range | 817 |
| expr_binary_parenthesized | 614 |
| shebang_repeat1 | 573 |
| _immediate_decimal | 474 |
| _val_range | 374 |
| val_table_repeat1 | 363 |
| _val_number_decimal | 362 |
| collection_type | 330 |
| _multiple_types_repeat1 | 272 |
| _val_range_with_end | 266 |
| expr_binary | 210 |
| collection_type_repeat1 | 189 |
| val_closure | 183 |
| unquoted | 174 |
| _unquoted_anonymous_prefix | 166 |
| _unquoted_with_expr | 165 |
| ctrl_if_parenthesized | 158 |
| _str_double_quotes | 156 |
References
https://github.com/tree-sitter/tree-sitter/wiki/Tips-and-Tricks-for-a-grammar-author#reducing-state-count
It may be worth tracking the WASM build time too. My last run took 6min 49sec.
I got these error messages. Do you know where the problem is?
```
> tree-sitter build --wasm
WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
emcc: error: '/emsdk/upstream/bin/clang -target wasm32-unknown-emscripten -fignore-exceptions -fPIC -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/emsdk/upstream/emscripten/cache/sysroot -DEMSCRIPTEN -Werror=implicit-function-declaration -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -Os -fno-exceptions -fvisibility=hidden -I. parser.c -c -o /tmp/emscripten_temp_xv1s01kw/parser_1.o' failed (received SIGKILL (-9))
emcc command failed

Stack backtrace:
   0: std::backtrace::Backtrace::create
   1: anyhow::error::<impl anyhow::Error>::msg
   2: tree_sitter_loader::Loader::compile_parser_to_wasm
   3: tree_sitter_cli::wasm::compile_language_to_wasm
   4: tree_sitter::main
   5: std::sys::backtrace::__rust_begin_short_backtrace
   6: std::rt::lang_start::{{closure}}
   7: std::rt::lang_start_internal
   8: _main
```
My guess is that it's caused by the amd64 vs. arm64 platform mismatch in the warning.
For me, `tree-sitter build -w` now takes 5min 40sec after the latest PR. Great start! 🥳
> My guess is because of the warning amd64 vs arm64
Oh, they look so similar that I didn't notice the difference, lol.
After #187, `tree-sitter build -w` takes 3min 27sec. Nice work!
After #189, `tree-sitter build -w` takes 2min 7sec. Wow!
It seems that the number of `case` statements is indeed the right target to optimize.
After #205:
```
tree-sitter generate --report-states-for-rule - e>| "name\tcount\n" ++ $in | detect columns | take 10 | to md
```
| name | count |
|---|---|
| val_range | 817 |
| expr_binary_parenthesized | 614 |
| _repeat_newline | 490 |
| _immediate_decimal | 430 |
| _val_range | 387 |
| _val_number_decimal | 315 |
| val_closure | 231 |
| expr_binary | 210 |
| _unquoted_anonymous_prefix | 170 |
| _str_double_quotes | 160 |
Case count of `ts_lex`: 3376
Latest numbers:
```
> rg "#define.*STATE" src/parser.c
16:#define STATE_COUNT 5237
17:#define LARGE_STATE_COUNT 1273
```
Case count of `ts_lex`: 2192
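For a sense of scale, the relative reductions between the initial status and the latest numbers can be computed from the figures posted in this thread. The `ts_lex` baseline is the post-#205 count, since no earlier case count was posted:

```python
# (before, after) pairs copied from this thread.
measurements = {
    "STATE_COUNT": (7954, 5237),
    "LARGE_STATE_COUNT": (1749, 1273),
    "ts_lex cases": (3376, 2192),  # baseline is the post-#205 figure
}


def reduction_percent(before: int, after: int) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


for name, (before, after) in measurements.items():
    print(f"{name}: {before} -> {after} ({reduction_percent(before, after):.1f}% fewer)")
```

This prints reductions of about 34.2% for `STATE_COUNT`, 27.2% for `LARGE_STATE_COUNT`, and 35.1% for the `ts_lex` cases.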
Top Results of States for Rule
```
tree-sitter generate --report-states-for-rule - e>| "name\tcount\n" ++ $in | detect columns | take 10 | to md
```
| name | count |
|---|---|
| expr_binary_parenthesized | 614 |
| val_range | 593 |
| _repeat_newline | 483 |
| _val_range | 298 |
| _immediate_decimal | 275 |
| decl_def | 240 |
| _val_number_decimal | 238 |
| expr_binary | 210 |
| cmd_identifier | 202 |
| ctrl_if_parenthesized | 158 |
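For anyone not using Nushell, the `e>| ... | detect columns | take 10 | to md` pipeline above can be approximated in Python. This sketch assumes each report line on stderr has the form `<rule name> <count>` separated by whitespace, which is what `detect columns` relies on; `report_to_markdown` is a hypothetical helper:

```python
def report_to_markdown(report: str, top: int = 10) -> str:
    """Turn `<name> <count>` lines into a Markdown table of the `top` largest counts."""
    rows = []
    for line in report.splitlines():
        parts = line.split()
        # Skip blank lines and anything that isn't exactly "<name> <integer>".
        if len(parts) == 2 and parts[1].isdigit():
            rows.append((parts[0], int(parts[1])))
    # Sort by count (stable, so ties keep report order) instead of trusting the input order.
    rows.sort(key=lambda row: row[1], reverse=True)
    lines = ["| name | count |", "|---|---|"]
    lines += [f"| {name} | {count} |" for name, count in rows[:top]]
    return "\n".join(lines)
```

For example, after `tree-sitter generate --report-states-for-rule - 2> report.txt` in a POSIX shell, `print(report_to_markdown(open("report.txt").read()))` prints a table in the same shape as the ones above. Unlike the Nushell version, it sorts by count before taking the top rows, so the result does not depend on the report's own ordering.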