dhall-haskell icon indicating copy to clipboard operation
dhall-haskell copied to clipboard

Poor parse error message with parenthesis-enclosed function application

Open SiriusStarr opened this issue 5 years ago • 15 comments

If a parse error occurs within parentheses in function application, instead of an error at the point of the parse error, you instead get unexpected '(' at the open parenthesis. For example:

> dhall <<< '(3f)'
Error: Invalid input

(stdin):1:3:
1 | (3f)
  |   ^
unexpected 'f'

vs

> dhall <<< 'a (3f)'
Error: Invalid input

(stdin):1:3:
1 | a (3f)
  |   ^
unexpected '('

This does not itself seem particularly problematic, until the expression within the parentheses is complex and the parse error difficult to see at a glance. For example:

1 | Optional/fold Text optional Text (λ(text : Text) → "\latexcommand{${text}}") ""
  |                                  ^
unexpected '('

You can stare at the above for quite a long time before you notice the real problem, namely that \l is not a valid escape sequence, and it should be "\\latexcommand{${text}}". (Don't ask me how I know this...)

SiriusStarr avatar Dec 06 '19 19:12 SiriusStarr

As far as I can tell, the only way to fix this is to make two changes:

  • Remove support for customizable import parsing (e.g. the Dhall.Parser.exprA utility)

    ... which is a prelude to:

  • Switch to an LALR parser (e.g. Earley) or parser generator (e.g. happy)

    ... which would fix most of the parsing issues that we've been running into

I don't think there are any shortcuts left at this point for fixing this particular issue

Gabriella439 avatar Dec 08 '19 16:12 Gabriella439

This is a particularly aggravating issue when you have a long file, example I had this typo:

screenshot

on line 1238, and it caused this error message:

> dhall --file ./log2.dhall
dhall: 
Error: Invalid input

./log2.dhall:1217:11:
     |
1217 |           [ tweag
     |           ^
unexpected '['
expecting ',', ->, :, ], operator, or whitespace

That’s two screen heights above the actual error at the beginning of the nested list!

Profpatsch avatar Apr 01 '20 01:04 Profpatsch

For the record: https://github.com/dhall-lang/dhall-haskell/pull/1740 was the first step to implement the plan in https://github.com/dhall-lang/dhall-haskell/issues/1592#issuecomment-562965613.

sjakobi avatar May 05 '20 11:05 sjakobi

I recently had a run-in with this issue. I think I would have torn my hair out, if it was long enough to do so! ;)

@Gabriel439 I see that you've mentioned generating the parser with happy above. I've done small edits on GHC's happy code, but I don't have a good understanding of it. Would we generate the parser from dhall.abnf, or a modified version of it?

sjakobi avatar May 26 '20 09:05 sjakobi

Would we generate the parser from dhall.abnf, or a modified version of it?

Happy doesn't directly work with that specific format so we need to port it to happy syntax. The only problem that we might encounter by porting dhall.abnf to happy syntax is that happy works much better on a right-recursive grammar and dhall.abnf spec is mostly left-recursive (indirectly).

german1608 avatar May 26 '20 12:05 german1608

@sjakobi: Here's an example of what an alex lexer and happy parser might look like, from an older project of mine:

  • https://github.com/Gabriel439/Haskell-Morte-Library/blob/ed317e1169c1ab3691648768eb7553dbda6fbd1d/src/Morte/Lexer.x
  • https://github.com/Gabriel439/Haskell-Morte-Library/blob/ed317e1169c1ab3691648768eb7553dbda6fbd1d/src/Morte/Parser.y

However, I later switched to Earley for the reasons outlined in this commit description:

https://github.com/Gabriel439/Haskell-Morte-Library/commit/f6820953765e5d518469fbd47834215394626068

Gabriella439 avatar May 26 '20 15:05 Gabriella439

@Gabriel439 Thanks! With Earley we'd still have a hand-written parser, right? It can't be generated from dhall.abnf as some other implementations seem to do?

sjakobi avatar May 26 '20 15:05 sjakobi

@sjakobi: Yeah, it would still be hand-written, but it would no longer require backtracking (which is the main reason for poor error messages) or tricks to avoid pathological performance. The main downside of Earley is that it might have slower constant factors because it doesn't support fast bulk operations like megaparsec does

Gabriella439 avatar May 26 '20 16:05 Gabriella439

Morte … I just now realized that the Planescape naming scheme is older than Dhall :)

Profpatsch avatar May 28 '20 15:05 Profpatsch

@Profpatsch: Yep! That's correct :slightly_smiling_face:

Gabriella439 avatar May 28 '20 15:05 Gabriella439

@ocharles just shared his Earley-based parser in https://hub.darcs.net/ocharles/dhalli/browse/lib/Dhalli/Syntax.hs with me. Might be a good resource when we finally make the switch! :)

sjakobi avatar Mar 16 '21 00:03 sjakobi

BTW, Happy let's you export multiple parsers these days, and it can report expected tokens (output may need some tidying, but it is doable).

TheMatten avatar Mar 16 '21 16:03 TheMatten

By chance I've come across headed-megaparsec, which seems to have the purpose of providing better error messages (see https://hackage.haskell.org/package/headed-megaparsec-0.2.0.2/docs/HeadedMegaparsec.html#t:HeadedParsec).

I'm wondering whether it might be useful for resolving this issue without abandoning the current parser code entirely. Currently I don't properly understand what it does and how it works though.

sjakobi avatar Dec 15 '21 13:12 sjakobi

It seems the parser is backtracking too much (I’m bitten by this a lot when writing any “serious” dhall expressions).

Can the earley parser handle that correctly? If so, what’s hindering us from using @ocharles version?

Profpatsch avatar Jun 18 '22 11:06 Profpatsch

It's probably possible to switch to an Earley parser. It just requires a lot of work. We can't use @ocharles's version exactly as written because there are too many subtle differences, but we could consult it for writing the equivalent Dhall parser.

Gabriella439 avatar Jul 02 '22 18:07 Gabriella439