tree-sitter-julia
tree-sitter-julia copied to clipboard
Failed to build wasm
I have built many times,I have succeeded a few times and also failed many times But every time it takes up a lot of memory, the computer's 64GB of memory is all occupied, and then it crashes
I use npx tree-sitter build-wasm xxx
to build branch: master
May I ask how to provide more detailed information to help troubleshoot issues?
Tree-sitter-julia does require memory than other grammars, but this seems excessive. I'll try to troubleshoot and if I find the issue I'll add wasm to CI to avoid regressions.
@savq Do you have any idea how to improve the situation? There is some interest in making a Julia extension for the Zed editor here. It looks like we need a machine with 128 GB RAM just to make the julia.wasm
.
I read that Julia's use of unicode characters as operators may require more memory. I tried to remove most of the operators but that didn't help. What else could we try?
I read that Julia's use of unicode characters as operators may require more memory. I tried to remove most of the operators but that didn't help. What else could we try?
The operators aren't really a problem. The biggest problem with julia syntax are sequences of expressions without any delimiter: macros, matrices and juxtapositions (sort of).
I made an experiment in this branch, refactoring some stuff and removing those problematic forms entirely. The resulting parser.c
is still around 33M, and the wasm build takes around 10 minutes in my machine, but it could be a starting point for the zed extension.
I did some work on tree-sitter to improve this to some degree - I haven’t tested it in the Julia grammar.
https://github.com/tree-sitter/tree-sitter/pull/3234
The operators aren't really a problem.
@savq, I tossed things around again and now I'm pretty sure that the number of operators is at least one of the problems. I missed the tree-sitter generate
step 2 weeks ago, sorry.
Today I removed all but ≈ 50 operators. RAM usage went down from 60 GB to 4 GB with tree-sitter v0.20.8. Combined with your "smallish" branch, parser.c
is 20 MB. That looks much better!
Unfortunately, I guess Zed requires a newer tree-sitter version (I see errors in Zed's log). Trying to build the WASM (tree-sitter build -w
) with the latest tree-sitter v0.22.2 gives me the following error message:
This external scanner uses a symbol that isn't available to wasm parsers.
Missing symbols:
exit
Available symbols:
calloc
free
iswalnum
iswalpha
iswblank
iswdigit
iswlower
iswspace
iswupper
iswxdigit
malloc
memchr
memcmp
memcpy
memmove
memset
realloc
strcmp
strlen
strncpy
towlower
towupper
What does that mean? The only exit
s I can find:
> grep -R exit
./src/scanner.c: if (arr == NULL) exit(1);
./src/scanner.c: if (stack == NULL) exit(1);
./src/scanner.c: if (stack->len >= TREE_SITTER_SERIALIZATION_BUFFER_SIZE) exit(1);
./src/scanner.c: if (stack->len == 0) exit(1);
Exactly what it says on the tin: the exit
function from libc is not available in wasm. #136 fixes this (among other things) by using the built-in abort
instead.
@clason: Thank you, I'd assumed scanner.c was auto-generated. I replaced the four instances of exit(0)
with abort()
and it built.
This is my setup:
- clone the current tree-sitter-julia master
- remove most operators
- edit scanner.c as above
- use tree-sitter v0.22.2
-
tree-sitter generate
consumes 3 GB RAM and takes 2 mins on my aging MacBook -
tree-sitter build -w
consumes 4 GB RAM and takes 2 mins - parser.c is 54 MB and julia.wasm is 5.8 MB
-
tree-sitter test
: most tests pass, 6 fail, possibly due to the removed operators - wrap julia.wasm in a local Zed extension
- comment out quite a few lines in highlights.scm and the other .scm files that cause errors in Zed's log
In Zed, the syntax tree for some of my .jl files looks pretty clean considering all the hacks above. Basic syntax highlighting works.
I'd call it progress but it's certainly not safe for public consumption. I'll wait for #136 getting merged and try again.
I'd assumed scanner.c was auto-generated.
It is not; it needs to be hand-written. And, yes, you should absolutely wait for the mentioned PRs to be merged.