tree-sitter-julia
bug: REPL prompt parsed incorrectly
Did you check existing issues?
- [X] I have read all the tree-sitter docs if it relates to using the parser
- [X] I have searched the existing issues of tree-sitter-PARSER_NAME
Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)
No response
Describe the bug
REPL prompts (e.g. in julia-repl codeblocks in docstrings, etc.) are not recognized as a thing™ and are parsed using the normal rules.
Steps To Reproduce/Bad Parse Tree
julia> a = [1,2]
gives
(source_file [0, 0] - [1, 0]
  (assignment [0, 0] - [0, 16]
    (binary_expression [0, 0] - [0, 8]
      (identifier [0, 0] - [0, 5])
      (operator [0, 5] - [0, 6])
      (identifier [0, 7] - [0, 8]))
    (operator [0, 9] - [0, 10])
    (vector_expression [0, 11] - [0, 16]
      (integer_literal [0, 12] - [0, 13])
      (integer_literal [0, 14] - [0, 15]))))
and even more wrong (as far as the parse tree goes) is with a preceding empty REPL prompt
julia>
julia> a = [1,2]
(source_file [0, 0] - [2, 0]
  (assignment [0, 0] - [1, 16]
    (binary_expression [0, 0] - [1, 8]
      (binary_expression [0, 0] - [1, 5]
        (identifier [0, 0] - [0, 5])
        (operator [0, 5] - [0, 6])
        (identifier [1, 0] - [1, 5]))
      (operator [1, 5] - [1, 6])
      (identifier [1, 7] - [1, 8]))
    (operator [1, 9] - [1, 10])
    (vector_expression [1, 11] - [1, 16]
      (integer_literal [1, 12] - [1, 13])
      (integer_literal [1, 14] - [1, 15]))))
(The Julia code equivalent to that parse tree is julia > julia > a = [1,2], which is invalid syntax and throws an error.)
Expected Behavior/Parse Tree
I would expect something like
(source_file
  (repl_prompt
    (assignment
      (operator
        (vector_expression
          (integer_literal)
          (integer_literal))))))
and
(source_file
  (repl_prompt)
  (repl_prompt
    (assignment
      (operator
        (vector_expression
          (integer_literal)
          (integer_literal))))))
Repro
No response
The repl prompt isn't parsed because it's not part of the language at all.
> e.g. in julia-repl codeblocks in docstrings
What editor/platform are you using that uses tree-sitter for julia-repl code blocks?
To be clear, the problem here is that there's no way of knowing what's code, i.e. input, and what's not code, i.e. the prompt and the output.
A possible solution would be to have a separate grammar that parses the prompts and treats everything between the prompt and a newline as a Julia code injection. Lines without prompt are assumed to be output. The limitation in this case would be that it could not parse multi-line inputs.
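To make the proposal above concrete, here is a minimal sketch of the line classification it implies, written in Python rather than as an actual tree-sitter grammar. The function name `classify_lines` and the sample session are hypothetical, and the sketch assumes the default `julia> ` prompt. Every line is treated on its own, so it has the same limitation mentioned above: a multi-line input would have its continuation lines labelled as output.

```python
PROMPT = "julia> "  # assumption: the default Julia REPL prompt


def classify_lines(text):
    """Label each line as ("input", code) or ("output", text).

    Lines that start with the prompt are input; the prompt is stripped
    and the rest is what a Julia injection would receive. Every other
    line is assumed to be output.
    """
    labelled = []
    for line in text.splitlines():
        if line.startswith(PROMPT):
            labelled.append(("input", line[len(PROMPT):]))
        elif line.rstrip() == PROMPT.rstrip():
            # A bare prompt with nothing typed after it.
            labelled.append(("input", ""))
        else:
            labelled.append(("output", line))
    return labelled


# Hypothetical session: an empty prompt, one input line, then printed output.
session = "julia>\njulia> a = [1,2]\n2-element Vector{Int64}:\n 1\n 2"
result = classify_lines(session)
for kind, content in result:
    print(f"{kind:6} | {content}")
```

In a real setup the "input" spans would be handed to the existing Julia grammar through an injection query, and the "output" spans would be left unparsed.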
> What editor/platform are you using that uses tree-sitter for julia-repl code blocks?
Neovim; I added injection queries to markdown to highlight julia-repl, jldoctest, and Documenter blocks (e.g. @example, etc.).
> The repl prompt isn't parsed because it's not part of the language at all.
Not sure I agree with that; it depends on whether you look at it technically or functionally. The REPL itself correctly handles pasted REPL sessions by stripping the prompts. So ideally, this kind of text would be parsed/highlighted correctly by a Julia parser (though maybe not this one).
I played around a bit yesterday trying some simple rules (e.g. require that repl prompts occur at the beginning of a line using token.immediate, etc), but I haven't worked with tree-sitter grammars before so I didn't make much progress.
I hadn't considered that repl output would need to be explicitly minimally/not parsed, therefore definitely requiring a separate julia-repl grammar. How difficult would it be to adapt the existing rules here for a new grammar that could handle multi-line inputs? (Given that this julia grammar already correctly handles multi-line statements/blocks, etc.)
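One way to think about multi-line inputs, sketched in Python rather than as grammar rules: assume that continuation lines of an input are padded with spaces to the width of the prompt, which is how the REPL displays them. That assumption, the function name `group_inputs`, and the sample session are all illustrative, not part of any existing grammar. The heuristic is also ambiguous whenever output directly after an input happens to start with seven or more spaces.

```python
PROMPT = "julia> "
# Assumption: continuation lines are padded to the prompt width (7 spaces).
CONTINUATION = " " * len(PROMPT)


def group_inputs(text):
    """Split a REPL transcript into ("input", code) and ("output", text) blocks.

    An input starts at a prompt and absorbs the following lines that are
    indented to the prompt width. Anything else is output.
    """
    blocks = []
    in_input = False
    for line in text.splitlines():
        if line.startswith(PROMPT) or line == PROMPT.rstrip():
            blocks.append(["input", line[len(PROMPT):]])
            in_input = True
        elif in_input and line.startswith(CONTINUATION):
            # Continuation of the current input: drop the padding only.
            blocks[-1][1] += "\n" + line[len(CONTINUATION):]
        else:
            if blocks and blocks[-1][0] == "output":
                blocks[-1][1] += "\n" + line
            else:
                blocks.append(["output", line])
            in_input = False
    return [tuple(block) for block in blocks]


# Hypothetical session with a three-line function definition.
session = (
    "julia> function f(x)\n"
    "           x + 1\n"
    "       end\n"
    "f (generic function with 1 method)"
)
blocks = group_inputs(session)
for kind, content in blocks:
    print(kind)
    print(content)
```

Each "input" block, with prompt and padding removed, is ordinary Julia source, so the existing grammar's handling of multi-line statements could in principle be reused on it unchanged.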
> How difficult would it be to adapt the existing rules here for a new grammar that could handle multi-line inputs?
No idea.
I know some repos have multiple grammars to handle multi-language documents, like tree-sitter/tree-sitter-typescript. That might be a good first place to look.