
Returning an error in an sdt action produces the "wrong" ErrorToken

Open kfsone opened this issue 4 years ago • 3 comments

Note: I'm not using any of my branches for this.

Scenario: an SDT action that yields an error produces an error token corresponding to the end of the match. This is frequently undesirable:

newline : '\n';
ident : 'a'-'z' { 'a'-'z' };

<<import "fmt">>

Rule: P1 newline P2 newline P3;

P1 : ident << func()interface{} {fmt.Printf("P1 %#+v\n", $0); return $0}(), nil >>;
P2 : ident << $0, func() error { fmt.Printf("P2 %#+v\n", $0); return nil }() >>;
P3 : ident << nil, fmt.Errorf("should be line 3 col 1") >>;

And then a simple parser wrapper:

func main() {
        l := lexer.NewLexer([]byte("a\nb\nc"))
        p := parser.NewParser()
        _, err := p.Parse(l)
        fmt.Printf("%+v\n", err)
}

The output you get is:

> go run .
P1 &token.Token{Type:3, Lit:[]uint8{0x61}, Pos:token.Pos{Offset:0, Line:1, Column:1}}
P2 &token.Token{Type:3, Lit:[]uint8{0x62}, Pos:token.Pos{Offset:2, Line:2, Column:1}}
Error in S7: $(1,), Pos(offset=5, line=3, column=2): should be line 3 col 1

I'm not sure whether this is by design or a bug; I can sort of see how choosing a token when there are zero or many might also be "wrong".

Perhaps a solution would be to allow the user to return a token with the error and have that be the ErrorToken?

P3: "*" identifier "*" << $1, errors.New("can't use identifier between asterisks, that's just rude") >>;

and then the parser would use the first return value if it passes a *token.Token type switch?

  switch t := attr.(type) {
  case *token.Token:
    e.ErrorToken = t
  default:
    /* untouched */
  }

kfsone, Mar 20 '21 19:03

I think this is a long-standing issue, and there are some theoretical reasons for it whose details I can't remember.

This is another one I will have to pass to @goccmack

awalterschulze, Mar 21 '21 08:03

Sorry this took so long -- I have been very busy.

Gocc's current error reporting is unfortunately designed for syntactic errors, with semantic errors as an add-on. The error recovery mechanism is also designed to recover only from syntactic errors, and the parser stops at the first semantic error. A semantic error is caught in the Parse function at the following point:

switch act := action.(type) {
case Accept:
    res = this.stack.PopN(1)[0]
    acc = true
case Shift:
    this.stack.Push(State(act), this.nextToken)
    this.nextToken, this.pos = scanner.Scan()
case Reduce:
    prod := this.prodTab[int(act)]
    attrib, err := prod.ReduceFunc(this.stack.PopN(prod.NumSymbols))

    // >>> the semantic error is caught here
    if err != nil {
        return nil, this.newError(err)
    // <<<
    } else {
        this.stack.Push(this.gotoTab[this.stack.Top()][prod.Head], attrib)
    }
default:
    panic("unknown action: " + action.String())
}

A simple solution could be another boolean parameter, semanticError, in newError(), causing newError to produce a sensible semantic error without suggested tokens.

What do you think?

goccmack, Mar 31 '21 07:03

That sounds good!

kfsone, Mar 31 '21 17:03