lark icon indicating copy to clipboard operation
lark copied to clipboard

Terminals cannot be transformed in tree-less LALR?

Open qwertystop opened this issue 4 years ago • 5 comments

Describe the bug

I've written a transformer with the line DECIMAL = float (from %import common.DECIMAL in the grammar) and it seemed to work fine post-transforming an Earley-parsed tree – that is, the relevant locations in the transformed tree would be 48.0 rather than Token('DECIMAL', '48.0') and similar.

I've written another transformer, this one for a grammar that I was able to make run in LALR. Therefore, I tried a tree-less parse. Now I get ValueError: Callbacks must return a token (returned 48.0). To make the grammar work for a tree-less parse, I must wrap all my terminals in rules just to be able to transform them.

This seems to me unnecessarily tedious, and expands the size of the grammar unhelpfully. Also, transformer callbacks for rules are more long-winded to write – you can't do decimal = float, at minimum you have to do decimal = lambda self, n: return float(n[0])

To Reproduce

from lark import Lark, Transformer
grammar = """
    start: NUM
    NUM: /\d+/
"""

class T(Transformer):
    NUM = int

T().transform(Lark(grammar, parser="lalr").parse("32"))
# returns Tree('start', [32])
Lark(grammar, parser="lalr", transformer=T()).parse("32")
# raises ValueError("Callbacks must return a token (returned 32)")

qwertystop avatar Jan 25 '21 21:01 qwertystop

@erezsh The problem is that the TERMINAL callbacks get added to lexer_callbacks, which of course happens before the parser see them. This also means the Transformer can actually change how to parser deals with the terminals, since it can change their type. This also means the this will be a backwards incompatible fix. I am not even sure if there is a clean way of fixing this.

MegaIng avatar Jan 25 '21 22:01 MegaIng

I don't think there's a way to fix this that makes sense.

We can process the terminals after the parser, but the user can do that manually as well, and it's better to be transparent about this.

erezsh avatar Jan 26 '21 00:01 erezsh

I suppose another approach would be to replace the return values with a proxy object, that also maintains the token type. But that sounds a little too "magical".

erezsh avatar Jan 26 '21 00:01 erezsh

After looking at the code, I think it might make sense to handle the Terminal Transformer like we do the rules transformer: Here similar to what happens a few line later. We would have to extend the callback generation to also add the Terminals, but that shouldn't be to hard.

MegaIng avatar Jan 26 '21 01:01 MegaIng

@MegaIng You could be right, that might work.

erezsh avatar Jan 26 '21 02:01 erezsh