Greedy combinators

Open bristermitten opened this issue 1 year ago • 11 comments

This may be an XY problem, but I'm having an issue with sepEndBy: it isn't greedy enough, and it ends up hiding errors.

For context, I'm parsing a Haskell-like language with the following syntax:

module Main
import Prelude
let main = println "Hello world!"

The code to do this is pretty simple:

module' :: Parser (Module Frontend)
module' = dbg "module'" $ do
    header <- optional . try $ header
    let _name = maybe (ModuleName ("Main" :| [])) fst header
    skipNewlines
    imports <- import' `sepEndBy` many newline
    declarations <- declaration _name `sepEndBy` many newline

    pure $
        Module
            { _moduleName = _name
            , _moduleExposing = maybe ExposingAll snd header
            , _moduleImports = imports
            , _moduleDeclarations = declarations
            }

header :: Parser (ModuleName, Exposing MaybeQualified)
header = do
    -- module Name exposing (..)
    symbol "module"
    moduleName' <- lexeme Parse.moduleName
    exposing' <- exposing
    pure (moduleName', exposing')

I use module' <* eof as the "entrypoint" parser.

The problem is that if declaration fails for some reason, its error doesn't propagate up. Instead, sepEndBy just returns an empty list and the parser succeeds, which results in some very opaque error messages. For example, if I omit the body of the let declaration (let main =), I'd expect an error akin to "Unexpected end of input, expecting expression". Instead, module' succeeds with an empty list of declarations, and the eof then fails with an unintuitive and vague error message:

unexpected 'l'
expecting "import", end of input, or newline

(I'm not actually sure why it says this and not something like "Unexpected let ..., expecting eof", but that's not the main problem here.)

I have dbg'd this, and declaration does throw the expected error; the problem is with sepEndBy.
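For illustration, the symptom can be reproduced in isolation with a minimal sketch. It rests on an assumption the post does not confirm: that declaration fails without consuming input, for example because it is wrapped in try somewhere. In megaparsec, sepEndBy falls back to an empty list only when its item parser fails without consuming anything; a failure after consumption propagates. The names declaration, backtracking and committed below are hypothetical stand-ins, not the project's real parsers.

```haskell
module Main (main) where

import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (letterChar, newline, string)

type Parser = Parsec Void String

-- Hypothetical stand-in for the issue's declaration: let <name> = <word>
declaration :: Parser (String, String)
declaration = do
  _ <- string "let "
  name <- some letterChar
  _ <- string " = "
  body <- some letterChar <?> "expression"
  pure (name, body)

-- Assumption: with try, a failed declaration rewinds to its start, so
-- sepEndBy sees a non-consuming failure and returns [].
backtracking :: Parser [(String, String)]
backtracking = try declaration `sepEndBy` many newline

-- Without try, a failure after "let " has been consumed is committed,
-- so sepEndBy propagates the error instead of returning [].
committed :: Parser [(String, String)]
committed = declaration `sepEndBy` many newline

main :: IO ()
main = do
  let input = "let main = "
  -- The missing body is silently swallowed here.
  print (parse backtracking "" input) -- Right []
  -- Here the failure is reported at the missing body.
  either (putStr . errorBundlePretty) print (parse committed "" input)
  -- Empty modules are still accepted by the committed variant.
  print (parse committed "" "") -- Right []
```

If this matches the real parser, narrowing try so that it covers only the let keyword, rather than the whole declaration, would be one way to keep empty modules valid while still surfacing errors from inside a declaration.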

My question is, is there an alternative to sepEndBy that will "greedily" parse the declarations? I still want to allow empty modules so sepEndBy1 won't do, but if a let is seen it should keep parsing and fail if necessary, rather than backtracking and returning an empty list.

Thanks for any help!

bristermitten, Feb 17 '23 17:02