Method for adjusting subsequent parsers' errors

Open Lev135 opened this issue 1 year ago • 2 comments

I propose to add a special method in MonadParsec:

-- | Adjust the next `ParseError` if it occurs __before any tokens are__
-- __consumed__. This can be used to add a custom hint to the error or even
-- to replace it.
adjustNextError :: (ParseError s e -> ParseError s e) -> m ()

This can improve error messages in some cases. For example, we can define a version of lineFold that lets us use the same space consumer everywhere, including after the last symbol:

lineFold ::
  (TraversableStream s, MonadParsec e s m) =>
  -- | Line space consumer (should *not* consume end of lines)
  m () ->
  -- | Line space and eols consumer
  m () ->
  -- | Callback that uses provided space-consumer
  (m () -> m a) ->
  m a
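lineFold sc scn action = do
  lvl <- sc *> indentLevel
  action $ sc *> do
    -- Remember the state before consuming any line breaks
    st <- getParserState
    lvl' <- scn *> indentLevel
    unless (lvl' > lvl) $ do
      o <- getOffset
      -- Not a continuation line: backtrack and register the indentation
      -- error for whichever parser fails next without consuming input
      setParserState st
      -- (assuming `E` is `Data.Set`, imported qualified)
      let e = FancyError o $ E.singleton $ ErrorIndentation GT lvl lvl'
      adjustNextError (const e)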

and then the example from the lineFold documentation becomes more elegant:

sc = L.space (void spaceChar) empty empty

myFold = L.lineFold sc $ \sc' -> do
  L.symbol sc' "foo"
  L.symbol sc' "bar"
  L.symbol sc' "baz" -- we do not need a special consumer here

I expect more use cases will turn up for those who use custom errors (for example, we could attach a hint with custom information here).

As for the implementation in ParsecT, I have the following idea: add a ParseError s e -> ParseError s e endomorphism to the Hints data type. It doesn't seem too expensive to me, but this should be benchmarked, of course.
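To make the idea concrete, here is a minimal sketch of hints that carry a pending adjustment next to the expected items. It is a toy model, not megaparsec's internals: `Err` is a simplified stand-in for `ParseError s e`, and the `Hints` record, its field names, and `applyHints` are all hypothetical. The composition order in the `Semigroup` instance is one arbitrary choice among several.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- Simplified stand-in for megaparsec's 'ParseError s e' (assumption).
data Err = Err
  { errOffset :: Int
  , errExpected :: Set String
  }
  deriving (Eq, Show)

-- Hypothetical extended hints: expected items plus a pending adjustment.
data Hints = Hints
  { hintItems :: Set String
  , hintAdjust :: Err -> Err
  }

-- Merging hints unions the items and chains the adjustments so that the
-- adjustment from the left (earlier) hints runs first. This is a choice.
instance Semigroup Hints where
  Hints a f <> Hints b g = Hints (a <> b) (g . f)

instance Monoid Hints where
  mempty = Hints Set.empty id

-- Apply hints to an error raised without consuming input: first add the
-- expected items, then run the pending adjustment on the result.
applyHints :: Hints -> Err -> Err
applyHints (Hints items adjust) e =
  adjust e {errExpected = errExpected e <> items}
```

In this model `mempty` leaves errors untouched, so parsers that never register an adjustment only pay for an extra `id` field. Whether that holds for the real `ParsecT` is exactly what the benchmarks would have to show.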

I've implemented this in my fork of the repo, and here is the comparison with the main branch. Of course, it's not the final version. Many questions are open:

  • [ ] all changes should be benchmarked, and some may need reworking to preserve efficiency
  • [ ] how should this interact with delayed errors?
  • [ ] should we track the position of an error endo so it can be removed when backtracking (as with other hints)?
  • [ ] should we compose two endos or just take the last/first one? If we compose, then in which order?
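The last question matters because the two orders give different errors. A small sketch using `Endo` from base shows this. `Err`, `addHint` and `replaceWith` are hypothetical names for illustration, with `Err` standing in for `ParseError s e`.

```haskell
import Data.Monoid (Endo (..))

-- Simplified stand-in for 'ParseError s e' (assumption, not the real type).
data Err = Err
  { errOffset :: Int
  , errHints :: [String]
  }
  deriving (Eq, Show)

-- Hypothetical adjustment that prepends a hint to the error.
addHint :: String -> Endo Err
addHint h = Endo $ \e -> e {errHints = h : errHints e}

-- Hypothetical adjustment that replaces the error entirely.
replaceWith :: Err -> Endo Err
replaceWith e = Endo (const e)

original :: Err
original = Err 0 []

-- 'Endo' composes right to left: in (f <> g), g runs first.
-- Here the hint is added first and then thrown away by the replacement.
hintThenReplace :: Err
hintThenReplace =
  appEndo (replaceWith (Err 5 ["indent"]) <> addHint "try foo") original

-- Here the error is replaced first and the hint survives on the new error.
replaceThenHint :: Err
replaceThenHint =
  appEndo (addHint "try foo" <> replaceWith (Err 5 ["indent"])) original
```

With composition, a later `const e` adjustment can silently discard earlier hints, or an earlier one can be decorated by later hints, depending on the order chosen. Taking only the first or last endo avoids the ambiguity at the cost of losing the other adjustments.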

I'm opening this issue to get your opinion on this feature: is it too expensive to implement, are there other drawbacks that I may not see, and so on.

Lev135, Oct 17 '22 09:10