tree-sitter-haskell
Comments following function included in function pattern
For functions with a do block, the comments following the function get included in the function, for example:
f = do a
-- | haddock
g = b
Here the function pattern will include all of f and the doc comment of g. This isn't the case when there is no do block:
f = a
-- | haddock
g = b
In this case it works as I expected, only matching f = a.
I tested this on the latest commit of the master branch, with the following tree-sitter query:
(function rhs: (_) @function.inside) @function.around
(both captures end up including the doc comment)
The same thing happens for class and instance patterns, for example:
instance Class Data where
  f = a
-- | haddock
g = a
class Class where
  f :: Data
-- | haddock
g = a
(class) @class.around
(instance (where)? . _ @class.inside) @class.around
I'm not sure that it's feasible to implement this, since comments are allowed to break indentation:
f = do
  g
-- foo
  pure 1
so in order to decide whether the comment should terminate the do layout, we'd need to parse the indent of the following line, which would require us to either
- jump back to after g if indentation decreases, to terminate the do node
- jump back to after foo if indentation stays the same, so that the leading spaces of the next line won't be included in the comment (and we need them to determine the indent again for the next node)
and this won't work since we can't store two positions at once :frowning_face:
(in case that is unclear: comments and indent are parsed manually in the C extension)
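To make the lookahead problem concrete, here is a minimal sketch in C of the decision itself, run over an array of already-split lines rather than through the scanner's lexer interface. The names indent_of, is_blank_or_comment and comment_ends_layout are made up for illustration and are not taken from the tree-sitter-haskell scanner. The sketch assumes spaces-only indentation and treats any line starting with -- as a comment, which ignores operators such as -->. With random access to the lines the check is easy; the difficulty described above is that the scanner cannot keep two candidate positions at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Column of the first non-space character on a line (hypothetical helper). */
size_t indent_of(const char *line) {
    size_t n = 0;
    while (line[n] == ' ') n++;
    return n;
}

/* True for blank lines and for lines whose first token is a `--` comment.
 * Simplification: operators that begin with `--` are not handled. */
bool is_blank_or_comment(const char *line) {
    size_t i = indent_of(line);
    if (line[i] == '\0') return true;
    return line[i] == '-' && line[i + 1] == '-';
}

/* A comment sits at lines[comment_idx], inside a layout block whose items
 * start at column layout_indent. Returns true if the block should end
 * before the comment, i.e. if the next code line is indented less than
 * the block's items. This needs lookahead past the comment. */
bool comment_ends_layout(const char *const *lines, size_t count,
                         size_t comment_idx, size_t layout_indent) {
    assert(lines != NULL && comment_idx < count);
    for (size_t i = comment_idx + 1; i < count; i++) {
        if (is_blank_or_comment(lines[i])) continue;
        return indent_of(lines[i]) < layout_indent;
    }
    return true; /* end of input closes every open layout */
}
```

For a block whose items start at column 2, a column-0 comment followed by another item at column 2 yields false, so the comment stays inside the block. If the next code line starts at column 0, the result is true and the comment would belong to whatever follows.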
The only way I can imagine now would be to compromise and use -- | as an indicator, but since that isn't Haskell syntax, but Haddock, it could break valid code. Though it's probably unlikely to occur in an invalid position.
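A rough sketch of that compromise, again as standalone C rather than actual scanner code: looks_like_haddock is a hypothetical helper that only checks whether the first token on a line is -- |. How the scanner would act on the result is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical heuristic: true if the line, after leading spaces,
 * begins with the Haddock marker `-- |`. */
bool looks_like_haddock(const char *line) {
    assert(line != NULL);
    while (*line == ' ') line++;
    return strncmp(line, "-- |", 4) == 0;
}
```

Note that an indented -- | comment inside a do block also matches, which is the case where treating the marker as a layout terminator could break valid code.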
I'll think a bit more about this but I'm fairly pessimistic.
@414owen do you have an idea maybe?
I guess I'm unsure why it works without the do block. I would have thought the lexer would only detect the end of f when it sees function g, which would be after the comment.
indeed, that's curious
ok so in the case without do, the function is entirely contained in the range a = b, so tree-sitter is conservative and uses the smallest tree that works, leaving the comment on its own since there's no reason to associate it with any neighboring node more than the others.
for the do case, the layout end is part of the function rhs, so the comment cannot escape that tree.