funcparserlib
funcparserlib copied to clipboard
Backtracking
It looks like funcparserlib
has no way to allow backtracking, in the manner of Parsec's try
. #43 provides one example of where you might want this. I found myself wanting it in changing Hy to parse dfor
forms like this (https://github.com/hylang/hy/issues/2324):
(dfor x (range 10) y (range 3) #(x y) (* x y))
It would be the equivalent of this Python:
{(x, y): x*y for x in range(10) for y in range(3)}
dfor
and friends in Hy have a relatively complicated parser, so here's a simplified example:
from collections import namedtuple
from funcparserlib.parser import tok, many, finished
from funcparserlib.lexer import Token
FORM = tok('FORM')
loopers = many(FORM + FORM)
lfor = loopers + FORM + finished
dfor = loopers + FORM + FORM + finished
def f(parser, ts):
return parser.parse([Token(t, 1) for t in ts.split()])
print(f(lfor, 'FORM FORM FORM FORM FORM'))
print(f(dfor, 'FORM FORM FORM FORM FORM FORM'))
Here the example with lfor
returns ([(1, 1), (1, 1)], 1, None)
, as desired, but for dfor
, I get funcparserlib.parser.NoParseError: got unexpected end of input, expected: FORM
. The problem is that loopers
consumes all the FORM FORM
pairs and doesn't try moving back one iteration of many
to give the final FORM + FORM + finished
in dfor
a chance to match.
See also the docs on try
in Parsec.
Note: This Markdown reply is executable via python3 -m doctest reply-text.md
.
Imports:
>>> from funcparserlib.parser import a, many, finished, forward_decl, Parser, pure
The alternative <|>
combinator in Parsec always uses the look ahead of 1. Therefore you have to use try
there if you need an arbitrary look ahead. In contrast to that, the alternative |
combinator in funcparserlib always uses an arbitrary look ahead. Here is an example with the look ahead of 2, but it's really arbitrary:
>>> x = a("x")
>>> y = a("y")
>>> z = a("z")
>>> xxy_or_xxz = x + x + y | x + x + z
>>> xxy_or_xxz.parse("xxy")
('x', 'x', 'y')
>>> xxy_or_xxz.parse("xxz")
('x', 'x', 'z')
@Kodiologist Could you please share how exacly you would have fixed the problem in your example by using Parsec's try
? Then you and I could think of either a workaround or a feature request for funcparserlib.
many()
is greedy, like *
in regexps. It consumes all matching tokens and always succeeds:
>>> x = a("x")
>>> many_x = many(x)
>>> many_x.parse("xxx")
['x', 'x', 'x']
>>> many_x.parse("y")
[]
The problem with the greedy *
in x*xx
implemented via many()
is that it consumes all the tokens and the +
combinator doesn't do any backtracking:
>>> x = a("x")
>>> many_x_then_x = many(x) + x + x
>>> many_x_then_x.parse("xxx")
Traceback (most recent call last):
...
funcparserlib.parser.NoParseError: got unexpected end of input, expected: 'x'
I guess that you, in fact, want a non-greedy version of many()
. A sort of the *?
operator in regexps in addition to *
, right? A regexp equivalent of this would be x*?xx
. Then you could have written something like this:
>>> x = a("x")
>>> def non_greedy_many(p):
... # Fake implementation for a single case
... return a("x") >> list
>>> many_x_then_x = non_greedy_many(x) + x + x
>>> many_x_then_x.parse("xxx")
(['x'], 'x', 'x')
The problem is that I currenlty have no idea how to implement it easily, since +
in funcparserlib doesn't support any backtracking.
Workaround 1. Extract the last x
, x
at the tree transformation level:
>>> x = a("x")
>>> def to_results(args):
... x1, x2, xs = args
... combined = [x1] + [x2] + xs
... *rest, last1, last2 = combined
... return rest, last1, last2
>>> many_x_then_x = x + x + many(x) >> to_results
>>> many_x_then_x.parse("xxx")
(['x'], 'x', 'x')
>>> many_x_then_x.parse("xx")
([], 'x', 'x')
>>> many_x_then_x.parse("x")
Traceback (most recent call last):
...
funcparserlib.parser.NoParseError: got unexpected end of input, expected: 'x'
>>> many_x_then_x.parse("")
Traceback (most recent call last):
...
funcparserlib.parser.NoParseError: got unexpected end of input, expected: 'x'
It's been a long time (over a decade) since I touched Parsec and so perhaps I was wrong in thinking it can be done with try
. But yes, a non-greedy repetition operator fits the bill here.