tree-sitter-commonlisp icon indicating copy to clipboard operation
tree-sitter-commonlisp copied to clipboard

bug: Tree-sitter fails when encountering square brackets

Open kflak opened this issue 1 year ago • 7 comments

Did you check existing issues?

  • [ ] I have read all the tree-sitter docs if it relates to using the parser
  • [X] I have searched the existing issues

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

tree-sitter 0.21.0 (1c55abb5308fe3891da545662e5df7ba28ade275)

Describe the bug

Just started learning commonlisp, so there may be some subtleties here that I'm simply overlooking. However: Tree-sitter fails when encountering the square brackets in this piece of code:

(ql:quickload :cl-collider)
(in-package :sc-user)
(named-readtables:in-readtable :sc)

(setf *s* (make-external-server "localhost" :port 48800))
(server-boot *s*)

(defvar *synth*)
(setf *synth* (play(sin-osc.ar [320 321] 0 0.2))) 
(free *synth*)

(server-quit *s*)
(server-query-all-nodes)

Steps To Reproduce/Bad Parse Tree

(source [0, 0] - [14, 0]
  (list_lit [0, 0] - [0, 27]
    value: (package_lit [0, 1] - [0, 13]
      package: (sym_lit [0, 1] - [0, 3])
      symbol: (sym_lit [0, 4] - [0, 13]))
    value: (kwd_lit [0, 14] - [0, 26]
      (kwd_symbol [0, 15] - [0, 26])))
  (list_lit [1, 0] - [1, 21]
    value: (sym_lit [1, 1] - [1, 11])
    value: (kwd_lit [1, 12] - [1, 20]
      (kwd_symbol [1, 13] - [1, 20])))
  (list_lit [2, 0] - [2, 35]
    value: (package_lit [2, 1] - [2, 30]
      package: (sym_lit [2, 1] - [2, 17])
      symbol: (sym_lit [2, 18] - [2, 30]))
    value: (kwd_lit [2, 31] - [2, 34]
      (kwd_symbol [2, 32] - [2, 34])))
  (list_lit [4, 0] - [4, 57]
    value: (sym_lit [4, 1] - [4, 5])
    value: (sym_lit [4, 6] - [4, 9])
    value: (list_lit [4, 10] - [4, 56]
      value: (sym_lit [4, 11] - [4, 31])
      value: (str_lit [4, 32] - [4, 43])
      value: (kwd_lit [4, 44] - [4, 49]
        (kwd_symbol [4, 45] - [4, 49]))
      value: (num_lit [4, 50] - [4, 55])))
  (list_lit [5, 0] - [5, 17]
    value: (sym_lit [5, 1] - [5, 12])
    value: (sym_lit [5, 13] - [5, 16]))
  (list_lit [7, 0] - [7, 16]
    value: (sym_lit [7, 1] - [7, 7])
    value: (sym_lit [7, 8] - [7, 15]))
  (ERROR [8, 0] - [14, 0]
    value: (sym_lit [8, 1] - [8, 5])
    value: (sym_lit [8, 6] - [8, 13])
    value: (sym_lit [8, 15] - [8, 19])
    value: (sym_lit [8, 20] - [8, 30])))
test.lisp	   0.10 ms	  2822 bytes/ms	(ERROR [8, 0] - [14, 0])

Expected Behavior/Parse Tree

I expect treesitter to be able to parse this normally, as the code compiles fine on sbcl/arch linux.

Repro

No response

kflak avatar Mar 07 '24 05:03 kflak

OK, getting a bit further. Turns out that the square brackets are cl-collider-specific macros, syntactic sugar for (list x y z). Which kind of makes it hard to imagine implementing it as a grammar for tree-sitter. Unless there would be some way to extract a grammar dynamically from whichever package is currently active... Nice dream, but possibly a gargantuan task.

kflak avatar Mar 07 '24 17:03 kflak

@kflak Sorry, for the long delay!

Common Lisp is a dynamic language were new syntax can be dynamically defined. Do you thing #35 could be a good assumption for this non-standard syntax? That you would basically have [ <form> <form> <form> ] just like you would have s-exprs with parens?

theHamsta avatar Apr 06 '24 20:04 theHamsta

Hi,

No worries about the time taken :-) I'm still just dabbling around on the edges of the language until I have a bit more time to dedicate to it. Which also means I am not really in a position to evaluate the sanity of the approach. But I would love to try it out and see how it works!

kflak avatar Apr 07 '24 08:04 kflak

In case you're using neovim you can just change to my commit nvim-treesitter's lockfile.json

I'm not sure whether this is the right approach? Are you allowed to have [ in identifiers by default? Essentially, the best we can do for macros is guessing what it could mean.

theHamsta avatar Apr 07 '24 09:04 theHamsta

I'm a bit doubtful about the approach, tbh... As syntax can be defined on the fly (with the consequence that the same tokens could mean totally different things at two different locations of a file if another package is loaded), I'm suspecting that the only way to get it right is by actually evaluating code to see what it means. Which would obviously be far beyond the scope of treesitter...

EDIT 2 days later: that being said, I did try out the new branch, and for my specific usecase it works beautifully :-) Sooooo... I don't really know what the good approach for this would be...

kflak avatar Apr 07 '24 10:04 kflak

from my limited testing [ and ] are valid symbol characters, so you can do something like this without errors:

(defvar [a] 1)
(print [a])

and CL will happily print 1, personally I dont think any special syntax should be implemented for square brackets and the above code:

(play(sin-osc.ar [320 321] 0 0.2))

would just be this:

(list_lit
  value: (sym_lit) ; "play"
  value: (list_lit
    value: (sym_lit) ; "sin-osc.ar"
    value: (sym_lit) ; "[320"
    value: (sym_lit) ; "321]
    value: (num_lit) ; 0
    value: (num_lit) ; 0.2
    ))

johannesrld avatar May 31 '24 17:05 johannesrld

from what i can tell im pretty sure that you could fix this by just removing [ and ] from the symbol regex, however since parser is based on the clojure parser that might not work

johannesrld avatar Jun 13 '24 15:06 johannesrld