chevrotain

Recovery at end of input not working?

Open · tlrobinson opened this issue 4 years ago · 2 comments

I'm trying to get error recovery to insert a missing token at the end of the input, but it doesn't appear to be working. Here's a very simple grammar that parses function calls like foo(). I want it to also parse foo( by using token insertion. Am I doing something wrong?

(function expressionExample() {
  // ----------------- Lexer -----------------
  const createToken = chevrotain.createToken;
  const Lexer = chevrotain.Lexer;

  const Identifier = createToken({name: "Identifier", pattern: /[a-zA-Z]+/});
  const LCurly = createToken({name: "LCurly", pattern: /\(/});
  const RCurly = createToken({name: "RCurly", pattern: /\)/});

  const expressionTokens = [Identifier, LCurly, RCurly];

  const ExpressionLexer = new Lexer(expressionTokens, {
    positionTracking: "onlyStart"
  });

  // Labels only affect error messages and Diagrams.
  LCurly.LABEL = "'('";
  RCurly.LABEL = "')'";

  // ----------------- parser -----------------
  const Parser = chevrotain.Parser;

  class ExpressionParser extends Parser {
    constructor() {
      super(expressionTokens, {
        recoveryEnabled: true
      });

      const $ = this;

      $.RULE("expression", () => {
        $.CONSUME(Identifier);
        $.CONSUME(LCurly);
        $.CONSUME(RCurly);
      });

      // very important to call this after all the rules have been setup.
      // otherwise the parser may not work correctly as it will lack information
      // derived from the self analysis.
      this.performSelfAnalysis();
    }
  }

  // for the playground to work the returned object must contain these fields
  return {
    lexer: ExpressionLexer,
    parser: ExpressionParser,
    defaultRule: "expression"
  };
}())

tlrobinson · Feb 14 '20 21:02

Hi @tlrobinson

The simple example certainly helps. 👍

I don't think the recovery logic handles the edge case of the end of input (EOI). Here is the relevant check:

  canRecoverWithSingleTokenInsertion(
    this: MixedInParser,
    expectedTokType: TokenType,
    follows: TokenType[]
  ): boolean {
    if (!this.canTokenTypeBeInsertedInRecovery(expectedTokType)) {
      return false
    }

    // must know the possible following tokens to perform single token insertion
    if (isEmpty(follows)) {
      return false
    }

    let mismatchedTok = this.LA(1)
    let isMisMatchedTokInFollows =
      find(follows, (possibleFollowsTokType: TokenType) => {
        return this.tokenMatcher(mismatchedTok, possibleFollowsTokType)
      }) !== undefined

    return isMisMatchedTokInFollows
  }

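So, to perform single-token insertion, the token actually encountered must match one of the possible NEXT tokens (the "follows" of the expected token). Conceptually, this condition is met in your scenario:

  • foo ( EOF (actual input)
  • foo ( ) EOF (expected input)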

However, I do not believe EOF is counted among the possible next tokens, because the EOF is implicit: the grammar never mentions it.

I tried explicitly adding a CONSUME(chevrotain.EOF) at the end of the rule, but without luck. I need to debug this in more depth; I'll update when I find out more.
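To make the check above concrete, here is a simplified, self-contained model of it in plain JavaScript. It is not Chevrotain's code: token types are plain strings, the canTokenTypeBeInsertedInRecovery test is omitted (the expected token is assumed to be insertable), and it assumes, as the bullets suggest, that at the end of input the lookahead token is EOF. It shows why an empty follows set blocks the insertion, and how having EOF among the follows would allow it.

```javascript
// Simplified model of the single-token-insertion check quoted above.
// Assumption: token types are plain strings; this is not Chevrotain's API.
const EOF = "EOF";

// expectedTokType: the token the rule wanted (here "RCurly"); unused in
//   this simplified model, kept to mirror the quoted signature.
// follows: token types that may come right after it inside the rule.
// nextTokType: the token actually found, i.e. what LA(1) returns.
function canRecoverWithSingleTokenInsertion(expectedTokType, follows, nextTokType) {
  // Without knowing what may follow the missing token, insertion is refused.
  if (follows.length === 0) {
    return false;
  }
  // Insert the expected token only if the token actually found
  // could legally appear right after it.
  return follows.includes(nextTokType);
}

// Input "foo(": the parser expects RCurly but finds the end of input.
// Implicit EOF: nothing follows RCurly inside the rule, so follows is empty.
const implicit = canRecoverWithSingleTokenInsertion("RCurly", [], EOF);
// Explicit CONSUME(EOF): EOF now follows RCurly inside the rule.
const explicit = canRecoverWithSingleTokenInsertion("RCurly", [EOF], EOF);

console.log(implicit, explicit); // false true
```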

bd82 · Feb 14 '20 23:02

All right, I've debugged this again, this time using a full development environment instead of the playground.

Adding an EOF token explicitly seems to resolve the problem.

    $.RULE("expression", () => {
      $.CONSUME(Identifier)
      $.CONSUME(LCurly)
      $.CONSUME(RCurly)
      $.CONSUME(chevrotain.EOF)
    })
  • Note: EOF should be consumed in the top-level rule (the entry point) of your grammar.
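The workaround can be modeled end to end with a toy parser. This is a hypothetical sketch, not Chevrotain: lex and parseSequence are made-up helpers, a rule is just a flat array of token types, and "follows" is simplified to the next element of that array. Running it on foo( shows that, in this model, the missing RCurly is only inserted when EOF is an explicit part of the rule.

```javascript
// Hypothetical toy model of the workaround; lex and parseSequence are
// illustrative helpers, not Chevrotain APIs.
const EOF = "EOF";

// Tiny lexer for the example grammar. Characters other than letters
// and parentheses are silently skipped.
function lex(text) {
  const tokens = [];
  for (const match of text.matchAll(/[a-zA-Z]+|[()]/g)) {
    const image = match[0];
    if (image === "(") tokens.push("LCurly");
    else if (image === ")") tokens.push("RCurly");
    else tokens.push("Identifier");
  }
  return tokens;
}

// Parses a rule that is a flat sequence of token types, with
// single-token insertion as the only recovery strategy.
function parseSequence(rule, tokens) {
  const errors = [];
  let pos = 0;
  // Like LA(1): past the last real token, the lookahead is EOF.
  const la1 = () => (pos < tokens.length ? tokens[pos] : EOF);

  for (let i = 0; i < rule.length; i++) {
    const expected = rule[i];
    if (la1() === expected) {
      if (expected !== EOF) pos++; // EOF is never advanced past
      continue;
    }
    // Simplified follows: only the next element of this rule.
    const follows = rule.slice(i + 1, i + 2);
    if (follows.length > 0 && follows.includes(la1())) {
      // Pretend the expected token was present; do not advance pos.
      errors.push(`inserted missing ${expected}`);
      continue;
    }
    errors.push(`expected ${expected} but found ${la1()}`);
    return { completed: false, errors };
  }
  // Implicit EOF check: leftover tokens mean not all input was parsed.
  if (pos < tokens.length) {
    errors.push(`unexpected ${tokens[pos]} after end of rule`);
    return { completed: false, errors };
  }
  return { completed: true, errors };
}

const input = lex("foo(");
const withoutEof = parseSequence(["Identifier", "LCurly", "RCurly"], input);
const withEof = parseSequence(["Identifier", "LCurly", "RCurly", EOF], input);

console.log(withoutEof.completed); // false
console.log(withoutEof.errors.join("; ")); // expected RCurly but found EOF
console.log(withEof.completed); // true
console.log(withEof.errors.join("; ")); // inserted missing RCurly
```

With the explicit EOF, the toy parse completes and records one recovered error, roughly like a Chevrotain parser that finishes the rule but still lists the recovery in parser.errors.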

It would be possible to patch the parser to infer EOF as a "possible next token" in such cases. However, because EOF is implicit, this is somewhat complicated, and it may not be warranted or a high priority when a simple workaround is available.
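The patch idea can be sketched as follows. Everything here is hypothetical: followsOf is not a Chevrotain function, and rules are again flat arrays of token types. The sketch infers EOF as a follow only after the last element of the entry rule, since a sub-rule can be invoked from places where other tokens come after it.

```javascript
// Hypothetical sketch of inferring an implicit EOF follow; not Chevrotain code.
const EOF = "EOF";

// Returns the token types allowed right after rule[index]. Simplified to
// the next element of a flat rule. isEntryRule marks the top-level rule.
function followsOf(rule, index, isEntryRule) {
  const next = rule.slice(index + 1, index + 2);
  // Patch idea: nothing can follow the entry rule except the end of input,
  // so EOF is inferred after its last element.
  if (next.length === 0 && isEntryRule) {
    return [EOF];
  }
  return next;
}

// The original grammar, without an explicit CONSUME(EOF).
const expressionRule = ["Identifier", "LCurly", "RCurly"];

console.log(JSON.stringify(followsOf(expressionRule, 2, true)));  // ["EOF"]
console.log(JSON.stringify(followsOf(expressionRule, 2, false))); // []
console.log(JSON.stringify(followsOf(expressionRule, 1, true)));  // ["RCurly"]
```

With ["EOF"] as the follows of RCurly, the insertion check quoted earlier would succeed for foo( without any change to the grammar.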

bd82 · Feb 22 '20 11:02