
Combining Lexer Match Actions and Token Remapping

Open zvr opened this issue 4 years ago • 2 comments

From the examples, one can attach an action that runs when a lexical rule matches:

    @_(r'\d+')
    def NUMBER(self, t):
        t.value = int(t.value)   # Convert to a numeric value
        return t

One can also remap tokens:

    ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
    ID['if'] = IF
    ID['else'] = ELSE

These cannot be combined, since if you define a function to perform an action, the next remap attempt raises an error:

    TypeError: 'function' object does not support item assignment
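The error can be reproduced in plain Python without SLY, which suggests where it comes from: once a `def ID(...)` has run, the name `ID` refers to an ordinary function, and functions do not support item assignment. This is a minimal sketch; the stand-in function and the helper `try_remap` are hypothetical and only illustrate the failure.

```python
def ID(t):
    """Stand-in for a lexer action; the name ID is now bound to a function."""
    return t


def try_remap():
    """Attempt the remap-style assignment and return the error text, if any."""
    try:
        ID['if'] = 'IF'   # same shape as the remap line in the lexer
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == '__main__':
    print(try_remap())
```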

What is the recommended way to use both of these techniques in a lexical token?

I assume the function could examine the matched value (say, the string in ID) with something like if t.value == 'if', but how would it return a different token?

zvr • May 31 '21 11:05

The two techniques can't be combined. In fact, the whole token remapping feature was meant to replace the need for writing a function like this (which was commonplace):

    keywords = { 'if', 'else', 'while' }

    @_(r'[a-zA-Z_][a-zA-Z0-9_]*')
    def ID(self, t):
        if t.value in keywords:
            t.type = t.value.upper()
        return t

As shown in the function, the token type can be changed by assigning a different value to t.type.

dabeaz • May 31 '21 16:05

Great, thanks!

zvr • May 31 '21 16:05