megaparsec
megaparsec copied to clipboard
Indentation error lost in alternative
Suppose we have the following parsers:
pInd, pB :: Parsec Void String String
pInd = indentGuard (hidden space) EQ (pos1 <> pos1) *> string "a"
pB = string "b" *> string "a"
If we try to parse the string "a"
with them we'll get the following errors (which seems to be good enough for me):
ghci> parseTest (pInd <* hidden eof) "a"
1:1:
|
1 | a
| ^
incorrect indentation (got 1, should be equal to 2)
ghci> parseTest (pB <* hidden eof) "a"
1:1:
|
1 | a
| ^
unexpected 'a'
expecting 'b'
However, if we'll try them as optional
, the second one preserve a useful error message:
ghci> parseTest (optional (try pB) <* hidden eof) "a"
1:1:
|
1 | a
| ^
unexpected 'a'
expecting 'b'
while the first loses it at all:
ghci> parseTest (optional (try pInd) <* hidden eof) "a"
1:1:
|
1 | a
| ^
unexpected 'a'
Why megaparsec behaves so? Are there places where preserving indentation error is undesirable or maybe I'm using wrong combinators here?
optional (try pInd)
in the second example succeeds because it is the same as:
(Just <$> pInd) <|> pure Nothing
Since hints are only for non-fancy errors (it is a collection of ErrorItem
s) there is no way the error about indentation could be persisted in this case.
Since hints are only for non-fancy errors (it is a collection of ErrorItems) there is no way the error about indentation could be persisted in this case.
However, why do we need such behavior? Maybe for really user-defined errors there is no visible way to translate them into hints, but for indentation errors in particular it doesn't seems to be complicated to add special hints. These are just some extra Int
values in the parser state, so I hope it shouldn't affect performance in any visible way. Maybe the example with optional
doesn't worth to add them, but in case of multiline indented blocks (which is much more common case) the same problem exists.
Maybe it's not so trivial to compose indentation hints. For example, if we have nested indented blocks:
1A
1B
1C
2C
X
what's the correct indentation of X
? Of course, we can't say this confidently, but we could suppose, that it should belong to C
s column. However, with the current behavior (assuming that all B
s and C
s are arbitrary many and we are using some
/many
to parse them) we get "Unexpected X, expecting end of input" or (if we're checking the next element two be nonIndented
) "Incorrect indentation, got 10, should be 1" while I think that much more sensible would be "Incorrect indentation, got 10, should be 9 (or 5, or 1)" or something like this.
There is no strong conceptual reason why this is not supported. I think it is just not many people use the indentation features much and there was not enough demand for the library to evolve in that direction. There is also the question of balance between features vs complexity of the implementation.