lark icon indicating copy to clipboard operation
lark copied to clipboard

Add EOF symbol to match end of input

Open erezsh opened this issue 7 years ago • 17 comments

Can be signified with $

Must be optional (grammars don't have to contain it to be correct)

Need to make sure it works for both LALR and Earley, and with indentation.

erezsh avatar Sep 18 '18 15:09 erezsh

Any update?

isidentical avatar Mar 17 '19 06:03 isidentical

I have a version that should be working for LALR, in branch end_symbol

If anyone wants to test it and let me know how it goes, that would be great!

(that's https://github.com/lark-parser/lark/tree/end_symbol)

@isidentical @jruales @coreyt

erezsh avatar Jul 04 '19 07:07 erezsh

I did a rebase of this onto the master branch. There was a conflict in lalr_parser.py that I'm not sure I resolved correctly.

https://github.com/jisaacstone/lark/blob/end_symbol/tests/test_parser.py#L1650-L1660

I but all the tests seem to pass. To be sure I added another test in and it failed

https://github.com/jisaacstone/lark/blob/end_symbol/tests/test_parser.py#L1650-L1660

======================================================================
ERROR: test_end_symbol2 (tests.test_parser._make_parser_test.<locals>._TestParser)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/larpy/lark/tests/test_parser.py", line 1659, in test_end_symbol2
    self.assertEqual(parser.parse('axa'), Tree('start', [Tree('a', []),Tree('a', [])]))
  File "/Users/larpy/lark/lark/lark.py", line 293, in parse
    return self.parser.parse(text, start=start)
  File "/Users/larpy/lark/lark/parser_frontends.py", line 88, in parse
    return self._parse(token_stream, start)
  File "/Users/larpy/lark/lark/parser_frontends.py", line 54, in _parse
    return self.parser.parse(input, start, *args)
  File "/Users/larpy/lark/lark/parsers/lalr_parser.py", line 36, in parse
    return self.parser.parse(*args)
  File "/Users/larpy/lark/lark/parsers/lalr_parser.py", line 110, in parse
    reduce(arg)
  File "/Users/larpy/lark/lark/parsers/lalr_parser.py", line 77, in reduce
    value = self.callbacks[rule](s)
  File "/Users/larpy/lark/lark/parse_tree_builder.py", line 28, in __call__
    res = self.node_builder(children)
  File "/Users/larpy/lark/lark/parse_tree_builder.py", line 121, in __call__
    filtered.append(children[i])
IndexError: list index out of range

----------------------------------------------------------------------

I need to dive deep to get a better understanding of what is going on here exactly, but meanwhile does anything look off about the code I linked or the test I wrote?

jisaacstone avatar Jan 18 '20 02:01 jisaacstone

The test seems fine. I think the exception happens because some of the new preprocessing code isn't aware that $ isn't a real terminal (in the sense that it doesn't match any text)

I'll look into it soon, unless you'll manage to work it out sooner.

erezsh avatar Jan 18 '20 07:01 erezsh

I've done a bit more digging

2 things that may or may not be relevant

1st: above test fails with the same error on the original branch (before rebase)

2nd: Altering the test slightly produces a different error:

        @unittest.skipIf(PARSER!='lalr', "Using the end symbol currently works for LALR only")
        def test_end_symbol2(self):
            grammar = """
                start: (a|b)+
                a: "a" (E|ND)
                b: "b"
                E: $
                ND: "x"
            """
            parser = _Lark(grammar)

            self.assertEqual(parser.parse('axa'), Tree('start', [Tree('a', []),Tree('a', [])]))
            self.assertRaises(UnexpectedInput, parser.parse, 'ab')
======================================================================
FAIL: test_end_symbol2 (tests.test_parser._make_parser_test.<locals>._TestParser)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/larpy/lark/tests/test_parser.py", line 1659, in test_end_symbol2
    parser = _Lark(grammar)
  File "/Users/larpy/lark/tests/test_parser.py", line 535, in _Lark
    return Lark(grammar, lexer=lexer_class_or_name, parser=PARSER, propagate_positions=True, **kwargs)
  File "/Users/larpy/lark/lark/lark.py", line 206, in __init__
    self.terminals, self.rules, self.ignore_tokens = self.grammar.compile(self.options.start)
  File "/Users/larpy/lark/lark/load_grammar.py", line 494, in compile
    for name, (term_tree, priority) in term_defs if term_tree]
  File "/Users/larpy/lark/lark/load_grammar.py", line 494, in <listcomp>
    for name, (term_tree, priority) in term_defs if term_tree]
  File "/Users/larpy/lark/lark/lexer.py", line 69, in __init__
    assert isinstance(pattern, Pattern), pattern
AssertionError: None

----------------------------------------------------------------------

In the rebase case this fails in a different way, because an assert has been added

https://github.com/lark-parser/lark/blob/master/lark/load_grammar.py#L637

removing the assert and it fails in the same way.

Hope these findings are helpful. About to go vacation so probably won't get back to looking at this for a couple weeks at least

jisaacstone avatar Jan 23 '20 07:01 jisaacstone

Hi @jisaacstone,

I just created a new branch, end_symbol3, which merges end_symbol with the most recent master.

I don't get the same exceptions as you. In fact, everything works perfectly for me.

I also added the two tests you outline here (renamed the 2nd version to test_end_symbol3, with a few corrections), and they all seem to pass.

Let me know if you think I missed something. Otherwise, all you have left to do, is to get it to work for Earley.

Enjoy your vacation!

erezsh avatar Jan 23 '20 10:01 erezsh

OK Looks like my original thought was correct and this section of code i did not resolve the conflicts correctly

https://github.com/jisaacstone/lark/blob/end_symbol/tests/test_parser.py#L1650-L1660

Should ought to have gone back to that when I hit trouble. Sorry for the misdirection

jisaacstone avatar Jan 25 '20 03:01 jisaacstone

+1 for $ to represent end of string. I just ran into a need for this.

jnwatson avatar Jun 11 '20 21:06 jnwatson

Just pushed this into master. You can now use $ in the LALR parser. (No Earley support yet)

Here's an example from the tests that shows what it does:

def test_end_symbol2(self):
    grammar = """
        start: (a|b)+
        a: "a" ("x"|$)
        b: "b"
    """
    parser = _Lark(grammar)

   self.assertEqual(parser.parse('axa'), Tree('start', [Tree('a', []),Tree('a', [])]))
   self.assertRaises(UnexpectedInput, parser.parse, 'ab')

erezsh avatar Jun 12 '20 06:06 erezsh

I've just checked master and I can't find test_end_symbol2 and the example grammar listed above doesn't import.

E           lark.exceptions.GrammarError: Unexpected input at line 2 column 13 in calculator/grammar.lark:
E
E           a: "a" ("x"|$)
E                       ^

lark/lark/load_grammar.py:890: GrammarError

I don't claim to be even remotely be an expert on parsing but not being able to match $ seems to make even the simplest of grammars unparseable.

For instance (simplified):

start: expr+

expr: expr OPER expr
      | SIGNED_NUMBER

OPER: "+" | "-" | "*" | "/"
parse("1-1") # Tree('start', [Tree('statement', [1.0]), Tree('statement', [-1.0])])

whereas start: (expr ("\n"|$))+ would actually pick expr oper expr

kneufeld avatar Nov 01 '20 15:11 kneufeld

Looks like you reverted in 3bee21051e8440e506ea13f45b224e2f6d668662 but you don't say why.

kneufeld avatar Nov 01 '20 16:11 kneufeld

If I remember correctly, there were some problems with unexpected side effects.

I don't claim to be even remotely be an expert on parsing but not being able to match $ seems to make even the simplest of grammars unparseable.

Maybe, but your example is wrong:

start: expr ("\n" expr?)*

works.

MegaIng avatar Nov 01 '20 16:11 MegaIng

So it does... again, not an expert, but that's very non intuitive.

Thanks for the help!

kneufeld avatar Nov 01 '20 16:11 kneufeld

@kneufeld I reverted it because it was buggy.

I agree it would be a nice feature, but I don't think it's a necessity.

erezsh avatar Nov 01 '20 16:11 erezsh

Is there any update? I kind of need this. Or is there a workaround?

ThatXliner avatar Apr 17 '21 22:04 ThatXliner

I'll see what I can do

erezsh avatar Apr 17 '21 23:04 erezsh

@ThatXliner See PR #880

erezsh avatar Apr 18 '21 03:04 erezsh