Nested heredocs are not parsed correctly
puts <<HERE
hello #{<<HERE}
world
HERE
HERE
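In the following parse tree, the ranges of the heredoc bodies are wrong: the first heredoc_body (the one opened by `puts <<HERE`) is closed by the HERE on row 3, which should terminate the inner heredoc, instead of by the HERE on row 4.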
program [0, 0] - [6, 0]
  method_call [0, 0] - [0, 11]
    method: identifier [0, 0] - [0, 4]
    arguments: argument_list [0, 5] - [0, 11]
      heredoc_beginning [0, 5] - [0, 11]
  heredoc_body [0, 11] - [3, 4]
    interpolation [1, 8] - [1, 17]
      heredoc_beginning [1, 10] - [1, 16]
    heredoc_end [3, 0] - [3, 4]
  heredoc_body [3, 4] - [4, 4]
    heredoc_end [4, 0] - [4, 4]
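The nesting Ruby expects can be illustrated with a small toy model. It assumes only the usual Ruby rule that a heredoc body starts on the line after its opener, and that an opener inside another heredoc's body (e.g. in `#{...}`) consumes its own body and terminator before the outer body continues. `ToyHeredocScanner` and everything in it are hypothetical names made up for this sketch; it is not tree-sitter-ruby's scanner and it deliberately ignores quoting and shift operators. Rows are 0-based, like the `[row, column]` positions in the tree above.

```ruby
# Hypothetical toy model, NOT tree-sitter-ruby's scanner.
# Simplifications: every `<<ID`, `<<-ID` or `<<~ID` counts as an opener
# (even in plain body text or as a shift operator); <<'ID' and <<"ID" are ignored.
HEREDOC_OPENER = /<<([~-]?)([A-Za-z_]\w*)/

Heredoc = Struct.new(:id, :flag, :body_start, :end_row)

class ToyHeredocScanner
  def initialize(source)
    @lines = source.lines.map(&:chomp)
    @heredocs = []
  end

  # Returns heredocs in the order they were opened, with 0-based rows.
  def scan
    row = 0
    row = scan_line(row) while row < @lines.length
    @heredocs
  end

  private

  # Processes one line and returns the row of the next unprocessed line.
  # Bodies of heredocs opened on this line start on the following line,
  # one after another, in the order the openers appear.
  def scan_line(row)
    next_row = row + 1
    @lines[row].scan(HEREDOC_OPENER).each do |flag, id|
      next_row = read_body(id, flag, next_row)
    end
    next_row
  end

  # Reads one body starting at `row` and returns the row after its terminator.
  # A body line that opens another heredoc recursively consumes the inner
  # body first, so an inner terminator can never close the outer heredoc.
  def read_body(id, flag, row)
    heredoc = Heredoc.new(id, flag, row, nil)
    @heredocs << heredoc
    while row < @lines.length
      # `<<ID` needs the terminator at column 0; `<<-ID` and `<<~ID` allow indentation.
      candidate = flag.empty? ? @lines[row] : @lines[row].strip
      if candidate == id
        heredoc.end_row = row
        return row + 1
      end
      row = scan_line(row)
    end
    row # unterminated: the body runs to the end of the input
  end
end

# The first snippet from this issue (single-quoted heredoc, so no interpolation here).
source = <<~'RUBY'
  puts <<HERE
  hello #{<<HERE}
  world
  HERE
  HERE
RUBY

ToyHeredocScanner.new(source).scan.each do |h|
  puts "#{h.id}: body starts on row #{h.body_start}, terminator on row #{h.end_row}"
end
# HERE: body starts on row 1, terminator on row 4
# HERE: body starts on row 2, terminator on row 3
```

In this model the outer heredoc is terminated on row 4 and the inner one on row 3, whereas the tree above closes the first heredoc_body on row 3.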
I ran into this issue when running Semgrep on a Ruby file with nested heredocs (https://github.com/returntocorp/semgrep/issues/3151).
When I paste the following into https://tree-sitter.github.io/tree-sitter/playground:
output =
<<~ABC
Top
#{
<<~DEF
Middle
DEF
}
Bottom
ABC
puts output
I get the following output (note that both `Middle` and the closing `DEF` are parsed as constant nodes):
program [0, 0] - [13, 0]
  assignment [0, 0] - [1, 8]
    left: identifier [0, 0] - [0, 6]
    right: heredoc_beginning [1, 2] - [1, 8]
  heredoc_body [1, 8] - [9, 5]
    heredoc_content [1, 8] - [3, 4]
    interpolation [3, 4] - [7, 5]
      heredoc_beginning [4, 6] - [4, 12]
      constant [5, 8] - [5, 14]
      constant [6, 6] - [6, 9]  <-- This is the closing DEF of the heredoc string
    heredoc_content [7, 5] - [9, 2]
    heredoc_end [9, 2] - [9, 5]
  heredoc_body [9, 5] - [13, 0]
    heredoc_content [9, 5] - [13, 0]
    heredoc_end [13, 0] - [13, 0]
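For comparison, Ruby itself treats DEF as the terminator of the inner heredoc, so `Middle` and `Bottom` both end up in `output`. A quick check, assuming a recent CRuby; the final `p` line is added here for illustration and the original `puts output` is left out:

```ruby
# The snippet from above, followed by a check added for illustration.
output =
<<~ABC
Top
#{
<<~DEF
Middle
DEF
}
Bottom
ABC

# DEF closes the inner heredoc and ABC the outer one, so the three words
# land in the string and neither terminator does.
p output.lines.map(&:strip).reject(&:empty?) # prints ["Top", "Middle", "Bottom"]
```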
I have the same issue with the Bourne shell. Nested heredocs are parsed completely wrong and are therefore highlighted incorrectly, e.g. by Neovim.
Do you have any news on the state of heredocs in tree-sitter?
@dumblob Heredocs are not a tree-sitter core feature. Support for special lexical constructs such as heredocs is implemented in each grammar's external scanner (scanner.cc). I guess the scanner used by tree-sitter-bash has a bug similar to the one in tree-sitter-ruby.