pest Better error reporting.

I played with the online editor and wasn't very happy with the error messages.

It seems to report only the name of the root rule: expected bar, while I would expect to see a rather more descriptive message:

foo = { "["~digit+~"]" }
bar = { "list"~"="~foo }

last~(a) <== bar expected "list"
list~(a) <== bar expected "="
list=(a) <== foo expected "["
list=[a] <== digit expected '0'..'9'

Jul 18 '20 20:07 Kesanov

I'm experiencing the same issue with error messages. The message always seems to report the top level rule, even if part of the rule is matched correctly and the actual error is at some other place. For example, I was using the following grammar for testing:

WHITESPACE = _{ " " | "\t" | "\n" }

identifier = { (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_")* }

condition = { "true" | "false" }

rule = {  "rule" ~ identifier ~ "{" ~ "condition:" ~ condition ~ "}" }

And then tried the following input:

rule test { foo: true }

The problem with that input is that the grammar is expecting condition instead of foo, so I would expect an error like:

 --> 1:13
  |
1 | rule test { foo: true }
  |             ^---
  |
  = expected condition

But instead I got:

 --> 1:1
  |
1 | rule test { foo: true }
  | ^---
  |
  = expected rule

This error is very misleading, the first few tokens of the rule are actually correct, and the failure comes when the parser finds foo instead of condition, and that's what the error should say.

Follow-up: Interestingly enough, I get the expected error if I change my grammar to:

WHITESPACE = _{ " " | "\t" | "\n" }

identifier = {
  (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_")* 
}

condition = { "condition" }
expression = { "true" | "false" }

rule = { 
	"rule" ~ identifier ~ "{" ~ 
       condition ~ ":" ~
         expression ~
    "}" 
}

So, it looks like the errors are always expecting <something>, where <something> is the name of the production rule that failed to match, but it doesn't report errors for unexpected tokens within a production rule.

May 20 '22 15:05 plusvic

Hi! Developing sbroad (library for distributed SQL queries execution) we've faced the same problem. We use pest for parsing SQL, but sometimes we get error messages that are hard to link with the real syntax problem.

Some examples:

I leave the link to our simplified SQL grammar, but it seems to be redundant for examples understanding.

Query: create table t_name(col_name int null). Note: in our grammar we suppose to see the primary key next to the columns definition that must be separeted with comma. Error message:
```
 --> 1:27
  |
1 | create table bla(name int null)
  |                           ^---
  |
  = expected NotFlag
```
Explanation: after succesfully parsing "int" pest enters optional ColumnDefIsNull rule, tracks optional NotFlag as farthest parsed with error rule and stores it in pos_attempts (from which we later build the error message).
Query: select 1 from a group b. Note: we expect to see error message about incomplete "by" token. Error message:
```
 --> 1:17
  |
1 | select 1 from a group b
  |                 ^---
  |
  = expected EOI, Join, GroupByBlock, or HavingBlock
```
Explanation: after successful parsing of "group" token, pest doesn't track it anywhere and counts GroupByBlock (or HavingBlock) rule starting at "group" keyword to be the farthest rule that we've tried to parse but errored.

It seems to me that the main problem with errors is that we don't track tokens that our grammar expects to see at an farthest errored position and that library user can't interact with the sequence of rules that were applied to parse the input.

My proposal is to tinker logic in the parser_state.rs to track the following info inside ParserState:

Vector of expected tokens
Stacks of rule calls

Later we can can pass these info into ParsingError and use on the user side in order to form custom error messages based on expected rules.

Please see proposed changes at MR.

Jan 09 '24 10:01 EmirVildanov