megaparsec Indentation error lost in alternative

Indentation error lost in alternative

Open Lev135 opened this issue 1 year ago • 3 comments

Suppose we have the following parsers:

pInd, pB :: Parsec Void String String
pInd = indentGuard (hidden space) EQ (pos1 <> pos1) *> string "a"
pB   = string "b"                                   *> string "a"

If we try to parse the string "a" with them we'll get the following errors (which seems to be good enough for me):

ghci> parseTest (pInd <* hidden eof) "a"
1:1:
  |
1 | a
  | ^
incorrect indentation (got 1, should be equal to 2)
ghci> parseTest (pB <* hidden eof) "a"  
1:1:
  |
1 | a
  | ^
unexpected 'a'
expecting 'b'

However, if we'll try them as optional, the second one preserve a useful error message:

ghci> parseTest (optional (try pB) <* hidden eof) "a"  
1:1:
  |
1 | a
  | ^
unexpected 'a'
expecting 'b'

while the first loses it at all:

ghci> parseTest (optional (try pInd) <* hidden eof) "a"  
1:1:
  |
1 | a
  | ^
unexpected 'a'

Why megaparsec behaves so? Are there places where preserving indentation error is undesirable or maybe I'm using wrong combinators here?

May 29 '23 20:05 Lev135

optional (try pInd) in the second example succeeds because it is the same as:

(Just <$> pInd) <|> pure Nothing

Since hints are only for non-fancy errors (it is a collection of ErrorItems) there is no way the error about indentation could be persisted in this case.

May 30 '23 15:05 mrkkrp

Since hints are only for non-fancy errors (it is a collection of ErrorItems) there is no way the error about indentation could be persisted in this case.

However, why do we need such behavior? Maybe for really user-defined errors there is no visible way to translate them into hints, but for indentation errors in particular it doesn't seems to be complicated to add special hints. These are just some extra Int values in the parser state, so I hope it shouldn't affect performance in any visible way. Maybe the example with optional doesn't worth to add them, but in case of multiline indented blocks (which is much more common case) the same problem exists.

Maybe it's not so trivial to compose indentation hints. For example, if we have nested indented blocks:

what's the correct indentation of X? Of course, we can't say this confidently, but we could suppose, that it should belong to Cs column. However, with the current behavior (assuming that all Bs and Cs are arbitrary many and we are using some/many to parse them) we get "Unexpected X, expecting end of input" or (if we're checking the next element two be nonIndented) "Incorrect indentation, got 10, should be 1" while I think that much more sensible would be "Incorrect indentation, got 10, should be 9 (or 5, or 1)" or something like this.

May 30 '23 16:05 Lev135

There is no strong conceptual reason why this is not supported. I think it is just not many people use the indentation features much and there was not enough demand for the library to evolve in that direction. There is also the question of balance between features vs complexity of the implementation.

Jun 02 '23 14:06 mrkkrp

megaparsec megaparsec copied to clipboard

Indentation error lost in alternative

megaparsec
megaparsec copied to clipboard