lark Lalr parser raises UnexpectedToken('$END', ...) rather than UnexpectedEOF

Describe the bug

When the input is exhausted before the parse is complete, the Earley parser raises lark.exceptions.UnexpectedEOF(...), while the LALR parser raises lark.exceptions.UnexpectedToken('$END', ...).

For consistency's sake, when the LALR parser hits an unexpected token and that token is '$END', the error should be re-raised as UnexpectedEOF.

Some extra context

I am building an application that parses a stream, and I switched to the (much faster) LALR parser. Because a valid record may have to be assembled from several 'chunks' of the stream, I had been catching UnexpectedEOF from the Earley parser to know when to wait for more data. With LALR I now have to catch UnexpectedToken and drill into the error to check the token:

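try:
    tree = parser.parse(buffer)  # parser and buffer stand in for the application's own objects
except lark.exceptions.UnexpectedToken as err:
    if err.token.type == "$END":  # LALR's end-of-input marker
        logger.debug("Parser expected more data, waiting for another chunk")
    else:
        raise  # re-raise with the original traceback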
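
The chunk-assembling loop described above might look roughly like the sketch below. It is an illustration, not code from the application in question: `ChunkAssembler`, `feed`, and `ran_out_of_input` are hypothetical names, and the parser is passed in as a plain callable. The check assumes only the two behaviours described in this report (Earley raises `UnexpectedEOF`; LALR raises `UnexpectedToken` with a token of type `$END`) and is duck-typed, so it may need adjusting for other lark versions.

```python
def ran_out_of_input(err):
    """Return True if a parse error looks like 'the input ended too early'.

    Assumption: mirrors the two behaviours described in this report.
    Duck-typed on purpose, so it does not import lark.
    """
    # Earley: the exception class itself is UnexpectedEOF.
    if type(err).__name__ == "UnexpectedEOF":
        return True
    # LALR: an UnexpectedToken carrying the end-of-input marker token.
    token = getattr(err, "token", None)
    return getattr(token, "type", None) == "$END"


class ChunkAssembler:
    """Buffers text chunks until the wrapped parse callable accepts them."""

    def __init__(self, parse):
        self._parse = parse  # e.g. the bound parse method of a lark.Lark instance
        self._buffer = ""

    def feed(self, chunk):
        """Add a chunk; return the parse result, or None if more data is needed."""
        self._buffer += chunk
        try:
            result = self._parse(self._buffer)
        except Exception as err:
            if ran_out_of_input(err):
                return None  # keep the buffer and wait for the next chunk
            raise  # genuine syntax error: leave handling to the caller
        self._buffer = ""  # record complete, start a fresh buffer
        return result
```

With lark, the callable would be something like `lark.Lark(grammar, parser="lalr").parse`. For a grammar that needs four A's, feeding `"AA"` once should return `None`, and feeding `"AA"` again should return the parse tree.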

To Reproduce

import sys, lark 
print(f"python: {sys.version_info}\nlark: {lark.__version__}\n\n")

grammar = 'start: "A" ~ 4'  # 4 sequential A's

try:
    lark.Lark(grammar, parser="earley").parse("AA")
except Exception as err:
    print("Earley err:", type(err), *err.args)

try:
    lark.Lark(grammar, parser="lalr").parse("AA")
except Exception as err:
    print("Lalr err:", type(err), *err.args)

Output

python: sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0)
lark: 0.11.1


Earley err: <class 'lark.exceptions.UnexpectedEOF'> Unexpected end-of-input. Expected one of: 
        * A

Lalr err: <class 'lark.exceptions.UnexpectedToken'> Unexpected token Token('$END', '') at line 1, column 2.
Expected one of: 
        * A

Dec 06 '20 04:12 zevisert

Note that this is something that might break compatibility. It is something we have in mind, and I think we also agree that it would be better for both parsers to throw the same exception. (Note that this includes the possibility of making the Earley parser throw UnexpectedToken. But you are making a decent case for keeping UnexpectedEOF.)

While this is certainly a good change, it might only happen in 1.0. (Or we temporarily make UnexpectedEOF behave like an UnexpectedToken, but that seems a bit hacky.)

Dec 06 '20 10:12 MegaIng

Yeah, this is definitely a breaking change either way, as the different exception types can change the control flow of a program. You've seen my use case, so I would prefer both parsers to raise UnexpectedEOF. That said, there are easy workarounds here until 1.0 lands.
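
One possible workaround, as a rough sketch: wrap the parse call so that both parsers surface the same exception. `IncompleteInput` and `unify_eof` are hypothetical names that are not part of lark, and the detection assumes only the two behaviours reported in this issue.

```python
import functools


class IncompleteInput(Exception):
    """Hypothetical unified 'need more input' error (not part of lark)."""


def unify_eof(parse):
    """Wrap a parse callable so end-of-input errors share one exception type."""

    @functools.wraps(parse)
    def wrapper(text, *args, **kwargs):
        try:
            return parse(text, *args, **kwargs)
        except Exception as err:
            # LALR, as reported here: UnexpectedToken whose token type is "$END".
            token = getattr(err, "token", None)
            is_lalr_eof = getattr(token, "type", None) == "$END"
            # Earley, as reported here: the class is UnexpectedEOF.
            is_earley_eof = type(err).__name__ == "UnexpectedEOF"
            if is_lalr_eof or is_earley_eof:
                # Chain the original error so its details stay available.
                raise IncompleteInput(str(err)) from err
            raise  # any other error passes through unchanged

    return wrapper
```

Calling code can then catch `IncompleteInput` regardless of which parser is configured, and the original lark error remains reachable through `__cause__`.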

Thanks for the great library!

Dec 06 '20 17:12 zevisert

Yes, consistency would make error-catching much easier.

Dec 07 '20 02:12 ThatXliner
