tree-sitter-nu

Tracking issue: performance optimization

Open blindFS opened this issue 10 months ago • 10 comments

As mentioned in https://github.com/zed-industries/extensions/pull/2068#issuecomment-2675494003, the current parser is fairly complicated and needs to be optimized.

Status

State Count

> rg "#define.*STATE" src/parser.c

16:#define STATE_COUNT 7954
17:#define LARGE_STATE_COUNT 1749
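For tracking these two numbers across PRs without `rg`, a minimal Python sketch follows. It assumes the macros appear in `src/parser.c` as plain `#define NAME VALUE` lines, as the output above suggests; the function name `state_counts` and the embedded sample text are illustrative, not part of tree-sitter.

```python
import re

# Matches lines such as "#define STATE_COUNT 7954" in a generated parser.c.
DEFINE_RE = re.compile(r"^#define\s+(\w*STATE_COUNT)\s+(\d+)\s*$", re.MULTILINE)


def state_counts(parser_source: str) -> dict[str, int]:
    """Return every *STATE_COUNT macro found in the text with its integer value."""
    return {name: int(value) for name, value in DEFINE_RE.findall(parser_source)}


# Hypothetical excerpt standing in for the contents of src/parser.c.
sample = """#define LANGUAGE_VERSION 14
#define STATE_COUNT 7954
#define LARGE_STATE_COUNT 1749
"""

for name, value in state_counts(sample).items():
    print(f"{name}\t{value}")
```

To run it against a real checkout, replace `sample` with the text read from `src/parser.c`.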

Top Results of States for Rule

tree-sitter generate --report-states-for-rule -
| rule name | count |
| --- | --- |
| val_table | 1107 |
| val_range | 817 |
| expr_binary_parenthesized | 614 |
| shebang_repeat1 | 573 |
| _immediate_decimal | 474 |
| _val_range | 374 |
| val_table_repeat1 | 363 |
| _val_number_decimal | 362 |
| collection_type | 330 |
| _multiple_types_repeat1 | 272 |
| _val_range_with_end | 266 |
| expr_binary | 210 |
| collection_type_repeat1 | 189 |
| val_closure | 183 |
| unquoted | 174 |
| _unquoted_anonymous_prefix | 166 |
| _unquoted_with_expr | 165 |
| ctrl_if_parenthesized | 158 |
| _str_double_quotes | 156 |

References

https://github.com/tree-sitter/tree-sitter/wiki/Tips-and-Tricks-for-a-grammar-author#reducing-state-count

blindFS avatar Feb 22 '25 03:02 blindFS

It may be worth tracking building with WASM. Last run for me was 6min 49sec.

fdncred avatar Feb 22 '25 03:02 fdncred

> It may be worth tracking building with WASM. Last run for me was 6min 49sec.

I got these error messages. Do you know what the problem is?

tree-sitter build --wasm
WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
emcc: error: '/emsdk/upstream/bin/clang -target wasm32-unknown-emscripten -fignore-exceptions -fPIC -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/emsdk/upstream/emscripten/cache/sysroot -DEMSCRIPTEN -Werror=implicit-function-declaration -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -Os -fno-exceptions -fvisibility=hidden -I. parser.c -c -o /tmp/emscripten_temp_xv1s01kw/parser_1.o' failed (received SIGKILL (-9))
emcc command failed

Stack backtrace:
   0: std::backtrace::Backtrace::create
   1: anyhow::error::<impl anyhow::Error>::msg
   2: tree_sitter_loader::Loader::compile_parser_to_wasm
   3: tree_sitter_cli::wasm::compile_language_to_wasm
   4: tree_sitter::main
   5: std::sys::backtrace::__rust_begin_short_backtrace
   6: std::rt::lang_start::{{closure}}
   7: std::rt::lang_start_internal
   8: _main

blindFS avatar Feb 22 '25 04:02 blindFS

My guess is because of the warning amd64 vs arm64

fdncred avatar Feb 22 '25 14:02 fdncred

For me, tree-sitter build -w is now at 5min 40sec after the latest PR. Great start!!! 🥳

fdncred avatar Feb 22 '25 14:02 fdncred

> My guess is because of the warning amd64 vs arm64

Oh, they look so similar, and I didn't notice the diff, lol.

blindFS avatar Feb 22 '25 14:02 blindFS

After #187, tree-sitter build -w is now 3min 27sec with the latest PR. Nice work!

fdncred avatar Feb 23 '25 14:02 fdncred

After #189, tree-sitter build -w is 2min 7sec. wow!

fdncred avatar Mar 01 '25 12:03 fdncred

> After #189, tree-sitter build -w is 2min 7sec. wow!

It seems that the case statement count is indeed the right target to optimize.
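The thread does not say how the case statements are counted, so here is one possible way, as a rough Python sketch. It assumes the generated `src/parser.c` defines the lexer as `static bool ts_lex(` and that the next top-level `static ` definition marks the end of that function; `count_lex_cases` and the sample text are hypothetical.

```python
import re


def count_lex_cases(parser_source: str) -> int:
    """Count 'case N:' labels inside the ts_lex function body."""
    start = parser_source.find("static bool ts_lex(")
    if start == -1:
        return 0
    # Assumption: the next top-level "static " definition ends ts_lex.
    end = parser_source.find("\nstatic ", start + 1)
    body = parser_source[start:end if end != -1 else len(parser_source)]
    return len(re.findall(r"^\s*case \d+:", body, flags=re.MULTILINE))


# Tiny stand-in for a generated parser.c, for illustration only.
sample = """static bool ts_lex(TSLexer *lexer, TSStateId state) {
  switch (state) {
    case 0:
      ADVANCE(1);
    case 1:
      ACCEPT_TOKEN(sym_x);
    default:
      return false;
  }
}

static bool ts_lex_keywords(TSLexer *lexer, TSStateId state) {
  switch (state) {
    case 0:
      return false;
  }
}
"""

print(count_lex_cases(sample))
```

Only numeric labels inside `ts_lex` are counted, so the `default:` branch and the cases of `ts_lex_keywords` are left out.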

blindFS avatar Mar 01 '25 12:03 blindFS

After #205:

tree-sitter generate --report-states-for-rule - e>| "name\tcount\n" ++ $in | detect columns | take 10 | to md
| name | count |
| --- | --- |
| val_range | 817 |
| expr_binary_parenthesized | 614 |
| _repeat_newline | 490 |
| _immediate_decimal | 430 |
| _val_range | 387 |
| _val_number_decimal | 315 |
| val_closure | 231 |
| expr_binary | 210 |
| _unquoted_anonymous_prefix | 170 |
| _str_double_quotes | 160 |
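For anyone without Nushell, the same "top 10 as a markdown table" step can be approximated in Python. This is a sketch that assumes the report has been captured as text with one whitespace-separated `name count` pair per line, which is what the pipeline above implies; the helper names are made up for illustration. Unlike `take 10`, it sorts explicitly rather than relying on the report order.

```python
def top_rules(report: str, limit: int = 10) -> list[tuple[str, int]]:
    """Parse 'rule_name count' lines and return the largest entries first."""
    rows = []
    for line in report.splitlines():
        parts = line.split()
        # Keep only lines shaped like "<name> <integer>"; skip anything else.
        if len(parts) == 2 and parts[1].isdigit():
            rows.append((parts[0], int(parts[1])))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


def to_markdown(rows: list[tuple[str, int]]) -> str:
    """Render (name, count) rows as a two-column markdown table."""
    lines = ["| name | count |", "| --- | --- |"]
    lines += [f"| {name} | {count} |" for name, count in rows]
    return "\n".join(lines)


# Sample input reusing three rows from the table above.
report = "val_range 817\nexpr_binary_parenthesized\t614\n_repeat_newline 490\n"
print(to_markdown(top_rules(report, limit=2)))
```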

Case count of ts_lex: 3376

blindFS avatar Jul 02 '25 20:07 blindFS

Latest numbers:

> rg "#define.*STATE" src/parser.c

16:#define STATE_COUNT 5237
17:#define LARGE_STATE_COUNT 1273

Case count of ts_lex

2192
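To put the progress in relative terms, the numbers reported in this thread can be compared directly. A small sketch, with `percent_reduction` as an illustrative helper; note that the earlier ts_lex figure (3376) was measured after #205, not at the time the issue was opened.

```python
def percent_reduction(before: int, after: int) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# (before, after) pairs copied from the numbers reported in this thread.
metrics = {
    "STATE_COUNT": (7954, 5237),
    "LARGE_STATE_COUNT": (1749, 1273),
    "ts_lex cases": (3376, 2192),
}

for name, (before, after) in metrics.items():
    print(f"{name}: {before} -> {after} ({percent_reduction(before, after):.1f}% fewer)")
```

This works out to about 34.2% fewer states, 27.2% fewer large states, and 35.1% fewer ts_lex cases.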

Top Results of States for Rule

tree-sitter generate --report-states-for-rule - e>| "name\tcount\n" ++ $in | detect columns | take 10 | to md
| name | count |
| --- | --- |
| expr_binary_parenthesized | 614 |
| val_range | 593 |
| _repeat_newline | 483 |
| _val_range | 298 |
| _immediate_decimal | 275 |
| decl_def | 240 |
| _val_number_decimal | 238 |
| expr_binary | 210 |
| cmd_identifier | 202 |
| ctrl_if_parenthesized | 158 |

blindFS avatar Oct 18 '25 02:10 blindFS