tree-sitter-haskell Comments following function included in function pattern

For functions with a do block, the comments following the function get included in the function, for example:

f = do a

-- | haddock
g = b

here the function pattern will include all of f and the doc comment of g. This isn't the case when there is no do block:

f = a

-- | haddock
g = b

in this case it works as I expected, only matching f = a.

I tested this using the latest commit on the master branch, with the following tree-sitter query:

(function rhs: (_) @function.inside) @function.around

(both captures end up including the doc comment)

Jun 14 '22 17:06 rynoV

Same thing happens for class and instance patterns, for example:

instance Class Data where
  f = a

-- | haddock
g = a

class Class where
  f :: Data

-- | haddock
g = a

(class) @class.around
(instance (where)? . _ @class.inside) @class.around

Jun 15 '22 00:06 rynoV

I'm not sure that it's feasible to implement this, since comments are allowed to break indentation:

f = do
  g

-- foo
  pure 1

so in order to decide whether the comment should terminate the do layout, we'd need to parse the indent of the following line, which would require us to either

- jump back to after g if indentation decreases, to terminate the do node, or
- jump back to after foo if indentation stays the same, so that the leading spaces of the next line won't be included in the comment (and we need them to determine the indent again for the next node)

and this won't work since we can't store two positions at once :frowning_face:

(in case that is unclear: comments and indent are parsed manually in the C extension)
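To make the required lookahead concrete, here is a minimal Python model of the decision. It is only a sketch, not the actual C scanner: the function names are hypothetical, it only recognizes `--` line comments (so it would misread operators such as `-->` and ignores block comments), and it assumes the whole input is available as a list of lines, which is exactly the luxury the scanner doesn't have.

```python
def indent_of(line: str) -> int:
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def is_comment_or_blank(line: str) -> bool:
    """Simplified check: blank lines and `--` line comments only."""
    stripped = line.strip()
    return stripped == "" or stripped.startswith("--")


def comment_stays_in_layout(lines, comment_index, layout_indent):
    """Decide whether the comment at comment_index belongs to the open layout.

    Skips ahead past comments and blank lines to the next code line. The
    layout continues only if that line is indented at least layout_indent.
    """
    for line in lines[comment_index + 1:]:
        if not is_comment_or_blank(line):
            return indent_of(line) >= layout_indent
    return False  # end of input closes the layout


# The comment breaks indentation, but the do layout continues after it.
inside = ["f = do", "  g", "", "-- foo", "  pure 1"]
# The comment documents the next top-level declaration.
outside = ["f = do a", "", "-- | haddock", "g = b"]

print(comment_stays_in_layout(inside, 3, 2))   # layout indent of `g` is 2
print(comment_stays_in_layout(outside, 2, 7))  # layout indent of `a` is 7
```

The model can answer the question only because it can read ahead and then go back to either position; the scanner would have to remember both the point after the last statement and the point after the comment.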

The only way I can imagine right now would be to compromise and use -- | as an indicator, but since that is Haddock rather than Haskell syntax, it could break valid code, though it's probably unlikely to occur in an invalid position.

I'll think a bit more about this but I'm fairly pessimistic.

Jun 28 '22 22:06 tek

@414owen do you have an idea maybe?

Jun 28 '22 22:06 tek

I guess I'm unsure why it works without the do block. I would have thought the lexer would only detect the end of f when it sees function g, which would be after the comment.

Jun 29 '22 07:06 414owen

indeed, that's curious

Jun 29 '22 11:06 tek

ok so in the case without do the function is entirely contained in the range f = a, so tree-sitter is conservative and uses the smallest tree that works, leaving the comment on its own since there's no reason to associate it with any neighboring node more than the others.

for the do case, the layout end is part of the function rhs, so the comment cannot escape that tree.
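Until the grammar can handle this, a consumer of the query could strip the trailing comments from the captured text itself. The following Python sketch is an assumption about how such a workaround might look, not something the grammar or tree-sitter provides; the function name is hypothetical and, as a simplification, only blank lines and `--` line comments are treated as trailing trivia.

```python
def trim_trailing_comments(captured: str) -> str:
    """Drop blank lines and `--` line comments from the end of a capture."""
    lines = captured.splitlines()
    while lines and (
        lines[-1].strip() == "" or lines[-1].lstrip().startswith("--")
    ):
        lines.pop()
    return "\n".join(lines)


# Text of @function.around for the first example in this issue.
print(trim_trailing_comments("f = do a\n\n-- | haddock"))
```

Comments in the middle of a do block are left alone, since trimming stops at the first line of code seen from the end.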

Jun 29 '22 20:06 tek
