parsimonious
Can't get full parse tree without consolidations
I'd like to parse with a flag (or something of that nature) that results in a FULL parse tree with no consolidations.
By consolidation I mean what occurs in this example:
default_rule = foo
foo = bar
bar = "fizz"
If you parse the string "fizz" with a grammar formed from this PEG, your node tree will not contain a single foo or default_rule node, as far as I can tell.
The output is this (if I got it right):
<Node matching "fizz">
    <Node called "bar" matching "fizz">
Other nodes may be getting dropped as well. I'm less desperate to access those, but I think a flag should cover them too, either a separate one or the same flag mentioned above.
foo could have important semantic meaning that is lost, or there could be a visit_foo method on a visitor, which would then never get called. That is the case in my program, where I want to highlight every foo in a certain colour, but bar only indirectly, when it appears inside a foo.
I attempted to find where the code does this consolidation, but the closest I could find was NodeVisitor.lift_child. Overriding that seemed to have no effect, and I couldn't see it being used anywhere.
A workaround is this:
default_rule = foo ""
foo = bar ""
bar = "fizz"
Parsing "fizz", we get:
<Node matching "fizz">
<Node called "default_rule" matching "fizz">
<Node called "foo" matching "fizz">
<Node called "bar" matching "fizz">
<Node matching "">
<Node matching "">
<Node matching "">
I get the nodes I want, but unfortunately get some useless ones as well.
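One way to live with the workaround is to filter the empty anonymous nodes out after parsing. Below is a minimal sketch, assuming each node exposes expr_name, text, and children the way parsimonious Node objects do. iter_named and make are hypothetical helpers, not part of the library, and the SimpleNamespace stand-ins (with an assumed tree shape) are only there so the snippet runs without a grammar:

```python
from types import SimpleNamespace


def iter_named(node):
    """Yield every node that has a rule name, depth-first, skipping anonymous ones."""
    if node.expr_name:
        yield node
    for child in node.children:
        yield from iter_named(child)


def make(name, text, *children):
    # Hypothetical stand-in for a parsimonious Node (same attribute names).
    return SimpleNamespace(expr_name=name, text=text, children=list(children))


# Assumed shape of a tree the workaround grammar could produce (illustration only).
tree = make("default_rule", "fizz",
            make("foo", "fizz", make("bar", "fizz"), make("", "")),
            make("", ""))

names = [n.expr_name for n in iter_named(tree)]
print(names)
```

Passing a real parse result instead of the stand-in tree should work the same way, as long as the anonymous "" nodes report an empty expr_name.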
The docstring of Grammar mentions:
https://github.com/erikrose/parsimonious/blob/3da7e804c07d4e495873be208701b5c955247c58/parsimonious/grammar.py#L44-L46
I can't spot the exact place where that optimization takes place either. As noted in the docstring,
https://github.com/erikrose/parsimonious/blob/3da7e804c07d4e495873be208701b5c955247c58/parsimonious/grammar.py#L38-L40
which means you can write your own parser and solve this issue. However, there's a little hack to get exactly what you want with no work:
>>> g = Grammar(
... r"""
... foo = bar / tag_this
... bar = "fizz"
... tag_this = !"" "" # Never matches, useful for ensuring rule shows up in tree
... """
... )
>>> print(g.parse("fizz"))
<Node called "foo" matching "fizz">
    <Node called "bar" matching "fizz">
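With foo present in the tree, the highlighting use case from the question comes down to collecting the offsets of the foo nodes. This is a rough sketch under the assumption that nodes carry expr_name, start, end, and children, as parsimonious Node objects do. spans_for is a hypothetical helper, and the stand-in objects mirror the tree printed above so the snippet runs on its own:

```python
from types import SimpleNamespace


def spans_for(node, rule_name):
    """Return (start, end) offsets of every node produced by rule_name."""
    found = []
    stack = [node]
    while stack:
        current = stack.pop()
        if current.expr_name == rule_name:
            found.append((current.start, current.end))
        # Push children reversed so they are visited left to right.
        stack.extend(reversed(current.children))
    return found


# Hypothetical stand-ins mirroring the printed tree: foo wrapping bar over "fizz".
bar = SimpleNamespace(expr_name="bar", start=0, end=4, children=[])
foo = SimpleNamespace(expr_name="foo", start=0, end=4, children=[bar])

foo_spans = spans_for(foo, "foo")
bar_spans = spans_for(foo, "bar")
print(foo_spans, bar_spans)
```

A highlighter could then colour only the ranges in foo_spans and leave standalone bar matches alone.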