Can you have a Transformer or lexer_callback that replaces the output of one rule with multiple values?

Open rfox12-edu opened this issue 4 years ago • 24 comments

What is your question?

Try to be accurate and concise.

If you're having trouble with your code or grammar

Provide a small script that encapsulates your issue.

Explain what you're trying to do, and what is obstructing your progress.

rfox12-edu avatar Nov 17 '20 20:11 rfox12-edu

I have a rule that basically does: number_over_number: NUMBER FORWARD_SLASH NUMBER

Then I need to validate that this number is a fraction. If it is "1/2" I've been able to use a Transformer with success to replace the node in the Tree.

If, however, my number_over_number node is "5/3" I would like to return more than a single node... i.e. "restore" the parent structure such that my output lists STUFF BEFORE NUMBER FORWARD_SLASH NUMBER STUFF AFTER

I hope this is clear. In short: is there a way to run some validation on a rule and, if that validation fails, make the rule match fail?

rfox12-edu avatar Nov 17 '20 20:11 rfox12-edu

Like.. preprocessing? You can replace FORWARD_SLASH with "/" if you don't want it

ThatXliner avatar Nov 17 '20 20:11 ThatXliner

How could I validate that the first number is less than the second number? I've been parsing and then post-processing...

rfox12-edu avatar Nov 17 '20 20:11 rfox12-edu

I think you can first have a generic fraction rule. Then, when transforming, determine whether to change it to an improper or a proper rule. Personally, I don't think you should do this if you want to support nested expressions.

ThatXliner avatar Nov 17 '20 20:11 ThatXliner

Ok here is my complete example:

import logging
from lark import Lark, Transformer, Token, Tree, logger
from lark.visitors import Transformer_InPlace

identifier_grammar = '''
    start: token+
    token: NUMBER -> number
        | FORWARD_SLASH -> forward_slash
        | _number_over_number -> number_over_number
        | UNASSIGNED -> unassigned
    
    FORWARD_SLASH: "/"
    _number_over_number: NUMBER FORWARD_SLASH NUMBER    // TODO: consider whitespace?
    UNASSIGNED.-2: /\S+/
    
    %import common.NUMBER
    %import common.WS
    %ignore WS
'''

p = Lark(identifier_grammar, parser='lalr', debug=True)
print('case 1 -----------------')
print(p.parse("1 and 2 and /").pretty())
print('case 2 -----------------')
print(p.parse("1/2").pretty())
print('case 3 -----------------')
print(p.parse("5/2").pretty())

And the output is:

case 1 -----------------
start
  number	1
  unassigned	and
  number	2
  unassigned	and
  forward_slash	/

case 2 -----------------
start
  number_over_number
    1
    /
    2

case 3 -----------------
start
  number_over_number
    5
    /
    2

For "case 3" ideally I would like to see this instead:

start
  number	5
  forward_slash	/
  number	2

But when I use a Transformer on number_over_number I can only replace that single node... not re-insert 3 nodes to the parent.. does that make sense?

rfox12-edu avatar Nov 17 '20 20:11 rfox12-edu

Basically my number_over_number is my potential fraction that I'm validating on the back end. If it doesn't turn out to be a fraction then I'd like to restore the tree (as if the rule did not match)

rfox12-edu avatar Nov 17 '20 20:11 rfox12-edu

return node, maybe

ThatXliner avatar Nov 17 '20 20:11 ThatXliner

If I understand your question, then the answer is: Not really.

Would it help if you could do something like this?

# pseudo-code, doesn't work yet
from lark import Inline

class MyTransformer(Transformer):
   ...
   def my_rule(self, children):
       return Inline(children)	# inlines my_rule, by replacing it with its children

erezsh avatar Nov 17 '20 21:11 erezsh

Yes! That would be exactly what I need. It would make my code very nice to read.

rfox12-edu avatar Nov 17 '20 21:11 rfox12-edu

I implemented it in the inline branch (see linked commit).

Let me know if that works for you the way you expect.

erezsh avatar Nov 17 '20 21:11 erezsh

Why can't one just inline it in the grammar?

ThatXliner avatar Nov 17 '20 22:11 ThatXliner

@ThatXliner Because you don't always know at the grammar level. Sometimes it depends on the values (i.e. context-sensitive)

erezsh avatar Nov 17 '20 22:11 erezsh

First of all let me say THANK YOU for the fast response! It took me a bit to clone from github in a way that cooperated with my conda environment but I managed it. This is my code and the inline object is getting created but something still isn't right:

import logging
from pprint import pprint
from lark import Lark, Transformer, Token, Tree, logger
# Inline only exists on the inline branch mentioned above
from lark.visitors import Transformer_InPlace, Transformer_InPlaceRecursive, Inline

identifier_grammar = '''
    start: token+
    token: NUMBER -> number
        | FORWARD_SLASH -> forward_slash
        | _number_over_number -> number_over_number
        | UNASSIGNED -> unassigned
    
    FORWARD_SLASH: "/"
    _number_over_number: NUMBER FORWARD_SLASH NUMBER    // TODO: consider whitespace?
    UNASSIGNED.-2: /\S+/
    
    %import common.NUMBER
    %import common.WS
    %ignore WS
'''

class T1(Transformer_InPlace):
    def number_over_number(self, tok):
        pprint('==TRANSFORMER==')
        pprint(tok.children)
        return Inline(tok.children)
    

p1 = Lark(identifier_grammar, parser='lalr', debug=True, transformer=T1())


print('Transformer_InPlace -----------------')
print(p1.parse("1/2").pretty())

output is:

Transformer_InPlace -----------------
'==TRANSFORMER=='
[Token('NUMBER', '1'), Token('FORWARD_SLASH', '/'), Token('NUMBER', '2')]
start	<lark.visitors.Inline object at 0x7f06fc1a7ac0>

rfox12-edu avatar Nov 17 '20 23:11 rfox12-edu

"in a way that cooperated with my conda environment"

I wonder if pyenv could help you get out of that mess 😉

ThatXliner avatar Nov 17 '20 23:11 ThatXliner

I think that just means the Inline object has no repr

ThatXliner avatar Nov 17 '20 23:11 ThatXliner

It isn't yet implemented for the transformer parameter.

MegaIng avatar Nov 17 '20 23:11 MegaIng

The new code seems to work fine, but one thing really threw me for a loop when using a Transformer. If you specify the transformer in the Lark constructor:

p1 = Lark(identifier_grammar, parser='lalr', debug=True, transformer=T1(visit_tokens=False))

the methods you define are passed a Tree object. If you construct Lark without the transformer (same transformer code) and apply it afterwards:

T1(visit_tokens=False).transform(xxx)

then the methods are passed the list of children. I'm not sure whether that is expected behavior.

rfox12-edu avatar Nov 18 '20 04:11 rfox12-edu

"it will get a Tree object passed to the defined functions"

That's weird. You can see in the json_parser.py example that it's not the case.

Are you sure you are testing it correctly?

erezsh avatar Nov 18 '20 06:11 erezsh

Here is my complete test on version 10 of lark-parser:

import logging
from pprint import pprint
from lark import Lark, Transformer, Token, Tree, logger
from lark.visitors import Transformer_InPlace

identifier_grammar = '''
    start: token+
    token: NUMBER -> number
        | FORWARD_SLASH -> forward_slash
        | _number_over_number -> number_over_number
        | UNASSIGNED -> unassigned
    
    FORWARD_SLASH: "/"
    _number_over_number: NUMBER FORWARD_SLASH NUMBER    // TODO: consider whitespace?
    UNASSIGNED.-2: /\S+/
    
    %import common.NUMBER
    %import common.WS
    %ignore WS
'''

class T1(Transformer_InPlace):
    def number_over_number(self, tok):
        if isinstance(tok, Tree):
            print('Got the tree')
        elif isinstance(tok, list):
            print('Got the list')
        return tok
    
p = Lark(identifier_grammar, parser='lalr', debug=True)
p1 = Lark(identifier_grammar, parser='lalr', debug=True, transformer=T1(visit_tokens=False))

then: print(p.parse('try 1/2 thing').pretty())

start
  unassigned	try
  number_over_number
    1
    /
    2
  unassigned	thing

print(p1.parse("first 1/2 and").pretty())

Got the tree
start
  unassigned	first
  number_over_number
    1
    /
    2
  unassigned	and

print(T1(visit_tokens=False).transform(p.parse("first 1/2 and")).pretty())

Got the list
start
  unassigned	first
  [Token('NUMBER', '1'), Token('FORWARD_SLASH', '/'), Token('NUMBER', '2')]
  unassigned	and

I actually prefer to get the Tree--gives me more flexibility. And look at the output.. an "empty" Transformer actually changes the structure of the tree! That seems like a bug for sure.

rfox12-edu avatar Nov 18 '20 12:11 rfox12-edu

If you want to get the Tree, use @v_args(tree=True). I think it is a bug that you are getting the Tree in the first case. And no, it isn't an empty Transformer. For that, your function would have to behave like __default__, which it doesn't.

MegaIng avatar Nov 18 '20 12:11 MegaIng

Also, when used as an embedded Transformer, there isn't a difference between the in-place and the normal Transformer (except for this bug).

MegaIng avatar Nov 18 '20 13:11 MegaIng

@MegaIng I see your point. But I suppose I would prefer that returning the input would yield "no change". I suppose you would need to pass in all of the same parameters (__default__ gets more info than the user functions, right?). Maybe it could be documented more clearly that the transformer discards the Tree object if you handle the rule. You can use my code above to help document it if you want. Thanks for the help.

rfox12-edu avatar Nov 18 '20 13:11 rfox12-edu

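All that __default__ does is create a Tree object. You can do that yourself:

from lark import Transformer, Tree

class T1(Transformer):
    def number_over_number(self, children):
        # Rebuild the node from its children, as __default__ would
        return Tree('number_over_number', children)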

The only thing __default__ does beyond that is pass along the meta parameter, so that position information is not lost. You can do this yourself with @v_args(meta=True).

"But I suppose I would prefer that returning the input would yield "no change"."

Why would you define a Transformer method then? If you just want to observe, use Visitor. And if you really want to do that with the Transformer, use @v_args(tree=True). Then you can simply return the argument and it will represent zero change.

MegaIng avatar Nov 18 '20 15:11 MegaIng

I see your points, thank you. I would say 1) sometimes you want the option not to change anything (like in my example) and 2) creating and destroying objects every time is computationally unnecessary and not the most performant approach.

rfox12-edu avatar Nov 18 '20 17:11 rfox12-edu