fuzzingbook
fuzzingbook copied to clipboard
EarleyParser crash for grammars starting with an alternative
Describe the bug
When using the parsers from the book on a grammar with a start symbol having alternatives, the parser simply crashes.
This is the case with both the EarleyParser and the derived LeoParser, with the fuzzingbook package
0.8.1
.
To Reproduce
from fuzzingbook.Parser import EarleyParser
GRAMMAR = { '<start>': [ 'a', 'b' ] }
next(EarleyParser(GRAMMAR).parse('a'))
This gives the following output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/fuzzingbook/Parser.py", line 1159, in parse
cursor, states = self.parse_prefix(text)
File "/usr/local/lib/python3.6/dist-packages/fuzzingbook/Parser.py", line 1142, in parse_prefix
self.table = self.chart_parse(text, self.start_symbol())
File "/usr/local/lib/python3.6/dist-packages/fuzzingbook/Parser.py", line 965, in chart_parse
alt = tuple(*self.cgrammar[start])
TypeError: tuple() takes at most 1 argument (2 given)
Code Fix
It seems that the EarleyParser's chart_parse
function is only expecting the start rule to have a single alternative. I think the following replacement registers all the alternatives:
def chart_parse(self, words, start):
chart = [Column(i, tok) for i, tok in enumerate([None, *words])]
for alt in self.cgrammar[start]:
chart[0].add(State(start, tuple(alt), 0, chart[0]))
return self.fill_chart(chart)
I would submit a pull request, but I think someone should also fix the narrative of the chapter to match this change if necessary.
Note the second paragraph here: https://www.fuzzingbook.org/html/Parser.html#The-Earley-Parser "Note that one restriction of our implementation is that the start symbol can have only one alternative in its alternative expressions."