tree-sitter-commonlisp
tree-sitter-commonlisp copied to clipboard
bug: Tree-sitter fails when encountering square brackets
Did you check existing issues?
- [ ] I have read all the tree-sitter docs if it relates to using the parser
- [X] I have searched the existing issues
Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)
tree-sitter 0.21.0 (1c55abb5308fe3891da545662e5df7ba28ade275)
Describe the bug
Just started learning commonlisp, so there may be some subtleties here that I'm simply overlooking. However: Tree-sitter fails when encountering the square brackets in this piece of code:
(ql:quickload :cl-collider)
(in-package :sc-user)
(named-readtables:in-readtable :sc)
(setf *s* (make-external-server "localhost" :port 48800))
(server-boot *s*)
(defvar *synth*)
(setf *synth* (play(sin-osc.ar [320 321] 0 0.2)))
(free *synth*)
(server-quit *s*)
(server-query-all-nodes)
Steps To Reproduce/Bad Parse Tree
(source [0, 0] - [14, 0]
(list_lit [0, 0] - [0, 27]
value: (package_lit [0, 1] - [0, 13]
package: (sym_lit [0, 1] - [0, 3])
symbol: (sym_lit [0, 4] - [0, 13]))
value: (kwd_lit [0, 14] - [0, 26]
(kwd_symbol [0, 15] - [0, 26])))
(list_lit [1, 0] - [1, 21]
value: (sym_lit [1, 1] - [1, 11])
value: (kwd_lit [1, 12] - [1, 20]
(kwd_symbol [1, 13] - [1, 20])))
(list_lit [2, 0] - [2, 35]
value: (package_lit [2, 1] - [2, 30]
package: (sym_lit [2, 1] - [2, 17])
symbol: (sym_lit [2, 18] - [2, 30]))
value: (kwd_lit [2, 31] - [2, 34]
(kwd_symbol [2, 32] - [2, 34])))
(list_lit [4, 0] - [4, 57]
value: (sym_lit [4, 1] - [4, 5])
value: (sym_lit [4, 6] - [4, 9])
value: (list_lit [4, 10] - [4, 56]
value: (sym_lit [4, 11] - [4, 31])
value: (str_lit [4, 32] - [4, 43])
value: (kwd_lit [4, 44] - [4, 49]
(kwd_symbol [4, 45] - [4, 49]))
value: (num_lit [4, 50] - [4, 55])))
(list_lit [5, 0] - [5, 17]
value: (sym_lit [5, 1] - [5, 12])
value: (sym_lit [5, 13] - [5, 16]))
(list_lit [7, 0] - [7, 16]
value: (sym_lit [7, 1] - [7, 7])
value: (sym_lit [7, 8] - [7, 15]))
(ERROR [8, 0] - [14, 0]
value: (sym_lit [8, 1] - [8, 5])
value: (sym_lit [8, 6] - [8, 13])
value: (sym_lit [8, 15] - [8, 19])
value: (sym_lit [8, 20] - [8, 30])))
test.lisp 0.10 ms 2822 bytes/ms (ERROR [8, 0] - [14, 0])
Expected Behavior/Parse Tree
I expect treesitter to be able to parse this normally, as the code compiles fine on sbcl/arch linux.
Repro
No response
OK, getting a bit further. Turns out that the square brackets are cl-collider-specific macros, syntactic sugar for (list x y z). Which kind of makes it hard to imagine implementing it as a grammar for tree-sitter. Unless there would be some way to extract a grammar dynamically from whichever package is currently active... Nice dream, but possibly a gargantuan task.
@kflak Sorry, for the long delay!
Common Lisp is a dynamic language were new syntax can be dynamically defined. Do you thing #35 could be a good assumption for this non-standard syntax? That you would basically have [ <form> <form> <form> ] just like you would have s-exprs with parens?
Hi,
No worries about the time taken :-) I'm still just dabbling around on the edges of the language until I have a bit more time to dedicate to it. Which also means I am not really in a position to evaluate the sanity of the approach. But I would love to try it out and see how it works!
In case you're using neovim you can just change to my commit nvim-treesitter's lockfile.json
I'm not sure whether this is the right approach? Are you allowed to have [ in identifiers by default? Essentially, the best we can do for macros is guessing what it could mean.
I'm a bit doubtful about the approach, tbh... As syntax can be defined on the fly (with the consequence that the same tokens could mean totally different things at two different locations of a file if another package is loaded), I'm suspecting that the only way to get it right is by actually evaluating code to see what it means. Which would obviously be far beyond the scope of treesitter...
EDIT 2 days later: that being said, I did try out the new branch, and for my specific usecase it works beautifully :-) Sooooo... I don't really know what the good approach for this would be...
from my limited testing [ and ] are valid symbol characters, so you can do something like this without errors:
(defvar [a] 1)
(print [a])
and CL will happily print 1, personally I dont think any special syntax should be implemented for square brackets and the above code:
(play(sin-osc.ar [320 321] 0 0.2))
would just be this:
(list_lit
value: (sym_lit) ; "play"
value: (list_lit
value: (sym_lit) ; "sin-osc.ar"
value: (sym_lit) ; "[320"
value: (sym_lit) ; "321]
value: (num_lit) ; 0
value: (num_lit) ; 0.2
))
from what i can tell im pretty sure that you could fix this by just removing [ and ] from the symbol regex, however since parser is based on the clojure parser that might not work