lark
Can you have a Transformer or lexer_callback that replaces the output of one rule with multiple values?
I have a rule that basically does:
number_over_number: NUMBER FORWARD_SLASH NUMBER
Then I need to validate that this is a proper fraction. If it is "1/2", I've been able to use a Transformer successfully to replace the node in the Tree.
If, however, my number_over_number node is "5/3", I would like to return more than a single node, i.e. "restore" the parent structure so that my output lists:
STUFF BEFORE
NUMBER
FORWARD_SLASH
NUMBER
STUFF AFTER
I hope this is clear. In short: is there a way to perform some validation on a rule, and have the rule match fail if that validation fails?
Like... preprocessing? You can replace FORWARD_SLASH with "/" if you don't want it.
How could I validate that the first number is less than the second number? I've been parsing and then post-processing...
I think you can first have a generic fraction rule. Then, when transforming, determine whether to change it to an improper or proper rule. Personally, I don't think you should do this if you want to promote nested expressions.
OK, here is my complete example:
import logging
from lark import Lark, Transformer, Token, Tree, logger
from lark.visitors import Transformer_InPlace
identifier_grammar = '''
start: token+
token: NUMBER -> number
| FORWARD_SLASH -> forward_slash
| _number_over_number -> number_over_number
| UNASSIGNED -> unassigned
FORWARD_SLASH: "/"
_number_over_number: NUMBER FORWARD_SLASH NUMBER // TODO: consider whitespace?
UNASSIGNED.-2: /\S+/
%import common.NUMBER
%import common.WS
%ignore WS
'''
p = Lark(identifier_grammar, parser='lalr', debug=True)
print('case 1 -----------------')
print(p.parse("1 and 2 and /").pretty())
print('case 2 -----------------')
print(p.parse("1/2").pretty())
print('case 3 -----------------')
print(p.parse("5/2").pretty())
And the output is:
case 1 -----------------
start
  number 1
  unassigned and
  number 2
  unassigned and
  forward_slash /
case 2 -----------------
start
  number_over_number
    1
    /
    2
case 3 -----------------
start
  number_over_number
    5
    /
    2
For "case 3" ideally I would like to see this instead:
start
  number 5
  forward_slash /
  number 2
But when I use a Transformer on number_over_number, I can only replace that single node; I can't re-insert 3 nodes into the parent. Does that make sense?
Basically, my number_over_number is my potential fraction that I'm validating on the back end. If it doesn't turn out to be a fraction, then I'd like to restore the tree (as if the rule did not match).
return node, maybe?
If I understand your question, then the answer is: Not really.
Would it help if you could do something like this?
# pseudo-code, doesn't work yet
from lark import Inline
class MyTransformer(Transformer):
    ...
    def my_rule(self, children):
        return Inline(children)  # inlines my_rule, by replacing it with its children
Yes! That would be exactly what I need. It would make my code very nice to read.
I implemented it in the inline branch (see linked commit).
Let me know if that works for you the way you expect.
Why can't one just inline it in the grammar?
@ThatXliner Because you don't always know at the grammar level. Sometimes it depends on the values (i.e. context-sensitive)
First of all, let me say THANK YOU for the fast response! It took me a bit to clone from GitHub in a way that cooperated with my conda environment, but I managed it. This is my code; the Inline object is getting created, but something still isn't right:
import logging
from pprint import pprint
from lark import Lark, Transformer, Token, Tree, logger
# Inline only exists in the experimental "inline" branch
from lark.visitors import Transformer_InPlace, Transformer_InPlaceRecursive, Inline
identifier_grammar = '''
start: token+
token: NUMBER -> number
| FORWARD_SLASH -> forward_slash
| _number_over_number -> number_over_number
| UNASSIGNED -> unassigned
FORWARD_SLASH: "/"
_number_over_number: NUMBER FORWARD_SLASH NUMBER // TODO: consider whitespace?
UNASSIGNED.-2: /\S+/
%import common.NUMBER
%import common.WS
%ignore WS
'''
class T1(Transformer_InPlace):
    def number_over_number(self, tok):
        pprint('==TRANSFORMER==')
        pprint(tok.children)
        return Inline(tok.children)
p1 = Lark(identifier_grammar, parser='lalr', debug=True, transformer=T1())
print('Transformer_InPlace -----------------')
print(p1.parse("1/2").pretty())
The output is:
Transformer_InPlace -----------------
'==TRANSFORMER=='
[Token('NUMBER', '1'), Token('FORWARD_SLASH', '/'), Token('NUMBER', '2')]
start <lark.visitors.Inline object at 0x7f06fc1a7ac0>
in a way that cooperated with my conda environment
I wonder if pyenv could help you get out of that mess 😉
I think that just means the Inline object has no repr.
It isn't yet implemented for the transformer parameter.
The new code seems to work fine, but there is one thing that really threw me for a loop when using a Transformer. If you specify a transformer in the Lark constructor:
p1 = Lark(identifier_grammar, parser='lalr', debug=True, transformer=T1(visit_tokens=False))
it will get a Tree object passed to the defined functions.
If you construct Lark without your Transformer (same transformer code) and then use it later:
T1(visit_tokens=False).transform(xxx)
Then the transformer will be passed the children list.
Not sure if that is expected behavior.
it will get a Tree object passed to the defined functions
That's weird. You can see in the json_parser.py example that it's not the case.
Are you sure you are testing it correctly?
Here is my complete test on version 0.10 of lark-parser:
import logging
from pprint import pprint
from lark import Lark, Transformer, Token, Tree, logger
from lark.visitors import Transformer_InPlace
identifier_grammar = '''
start: token+
token: NUMBER -> number
| FORWARD_SLASH -> forward_slash
| _number_over_number -> number_over_number
| UNASSIGNED -> unassigned
FORWARD_SLASH: "/"
_number_over_number: NUMBER FORWARD_SLASH NUMBER // TODO: consider whitespace?
UNASSIGNED.-2: /\S+/
%import common.NUMBER
%import common.WS
%ignore WS
'''
class T1(Transformer_InPlace):
    def number_over_number(self, tok):
        if isinstance(tok, Tree):
            print('Got the tree')
        elif isinstance(tok, list):
            print('Got the list')
        return tok
p = Lark(identifier_grammar, parser='lalr', debug=True)
p1 = Lark(identifier_grammar, parser='lalr', debug=True, transformer=T1(visit_tokens=False))
Then:
print(p.parse('try 1/2 thing').pretty())
start
  unassigned try
  number_over_number
    1
    /
    2
  unassigned thing
print(p1.parse("first 1/2 and").pretty())
Got the tree
start
  unassigned first
  number_over_number
    1
    /
    2
  unassigned and
print(T1(visit_tokens=False).transform(p.parse("first 1/2 and")).pretty())
Got the list
start
  unassigned first
  [Token('NUMBER', '1'), Token('FORWARD_SLASH', '/'), Token('NUMBER', '2')]
  unassigned and
I actually prefer to get the Tree, since it gives me more flexibility. And look at the output: an "empty" Transformer actually changes the structure of the tree! That seems like a bug for sure.
If you want to get the Tree, use @v_args(tree=True). I think it is a bug that you are getting the Tree in the first case. And no, it isn't an empty Transformer. For that, your function would have to behave like __default__, which it doesn't.
Also, when used as an embedded Transformer, there isn't a difference between In-place and the normal Transformer (except for this bug).
@MegaIng I see your point. But I suppose I would prefer that returning the input would yield "no change". I suppose you would need to pass in all of the same parameters (__default__ gets more info than the user functions, right?). Maybe it could be documented more clearly that, if you handle a rule, the Transformer replaces its Tree node with whatever your method returns. You can use my code above to help document it if you want. Thanks for the help.
All that __default__ does is create a Tree object. You can do that yourself:
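from lark import Transformer, Tree
class T1(Transformer):
    def number_over_number(self, children):
        # rebuild the same node that __default__ would have created
        return Tree('number_over_number', children)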
The only thing that __default__ does beyond that is to also pass the meta parameter, so that the position information isn't lost. You can do this yourself with @v_args(meta=True).
But I suppose I would prefer that returning the input would yield "no change".
Why would you define a Transformer method then? If you just want to observe, use Visitor. And if you really want to do that with the Transformer, use @v_args(tree=True). Then you can simply return the argument and it will represent zero change.
I see your points, thank you. I would say that 1) sometimes you want the option not to change anything (like in my example), and 2) creating and destroying objects every time is computationally unnecessary and not the most performant.