lark icon indicating copy to clipboard operation
lark copied to clipboard

Token is inconsistent in py3.10 pattern matching

Open orcharddweller opened this issue 1 year ago • 12 comments

Describe the bug

Token is not working correctly with structural pattern matching.

While you can create tokens as:

t = Token(type_="TEST", value="test")

you have to pattern match on:

match t:
    case Token(type="TEST"):
        pass

As pattern matching tries to access the attributes by names provided to the pattern.

I find this solution fairly confusing and ugly, as well as hard to work with type checkers and triggers a nasty bug in pylance (https://github.com/microsoft/pylance-release/issues/3197).

To Reproduce

t = Token(type_="TEST", value="test")

match t:
    case Token(type_="TEST"):
        print("with underscore") # expected match
    case Token(type="TEST"):
        print("without underscore") # actual match

Potential solutions The simplest solution would be to rename type/type_ to be consistent between the __init__ argument and the attribute, but this might be a fairly big change to the API. There might be a possibility to add custom pattern matching logic so that matching works on Token(type_=...), but I'm not very familiar with the pattern matching API yet.

orcharddweller avatar Aug 15 '22 11:08 orcharddweller

I don't think it should be too hard to change type_ to type, as long as we add an optional type_ parameter, that overrides type when given.

I'm not very familiar with the match syntax. Would it not work to use positional arguments?

erezsh avatar Aug 15 '22 11:08 erezsh

Potentially changes that would be useful here:

  • add a property .type_ that just transpearntly referes to .type
  • change the parameter name in the __init__ (with backwards compatibitly). This will also cause linter errors since we are shadowing a builtin name
  • Add __match_args__ = ('type', 'value') to the class definition to allow positional pattern matching (if this isn't there already).
    • This might break compaitilby if you try to treat Token as a str object, haven't tested that.

MegaIng avatar Aug 15 '22 11:08 MegaIng

I'm not very familiar with the match syntax. Would it not work to use positional arguments?

It would require adding __match_args__.

  • Add __match_args__ = ('type', 'value') to the class definition to allow positional pattern matching (if this isn't there already).
    • This might break compaitilby if you try to treat Token as a str object, haven't tested that.

Based on some testing, I think it would be inconsistent, due to Token having type_ as it's first arg and str having value as it's first arg.

from typing import Any


class Token(str):
    __match_args__ = ("type", "value")

    def __new__(cls, type_, value):
        inst = super(Token, cls).__new__(cls, value)
        inst.type = type_
        inst.value = value
        return inst

t = Token("TEST", "test")

match t:
    case str("test"):
        print("a") # would match
    case _:
        pass

match t:
    case Token("test"):
        print("b") # would not match
    case _:
        pass

match t:
    case Token("TEST"):
        print("c") # would match
    case _:
        pass

After changing the order (__match_args__ = ("value", "type")), it would work relatively consistently as far as I understand. But to make it fully consistent, it would require changing arg order to Token.__new__ as well.

orcharddweller avatar Aug 15 '22 12:08 orcharddweller

No, IMO the behavior that is achived with type, value is good: You can ignore the type if you are matching as str, which is the expected behavior. I feared that case a would not behave correctly, but if it does I think we should keep it with that.

MegaIng avatar Aug 15 '22 12:08 MegaIng

Oh, that makes sense, I'll try to add a PR with those changes this week.

orcharddweller avatar Aug 15 '22 12:08 orcharddweller

Although it still might make sense to discuss which of these solutions is better:

  • add a property .type_ that just transpearntly referes to .type
  • change the parameter name in the init (with backwards compatibitly). This will also cause linter errors since we are shadowing a builtin name

orcharddweller avatar Aug 15 '22 12:08 orcharddweller

I would say that both should be done.

MegaIng avatar Aug 15 '22 12:08 MegaIng

What about the linter errors?

erezsh avatar Aug 15 '22 13:08 erezsh

What about the linter errors?

I am not sure what you mean.

MegaIng avatar Aug 15 '22 13:08 MegaIng

@MegaIng You wrote:

change the parameter name in the init (with backwards compatibitly). This will also cause linter errors since we are shadowing a builtin name

erezsh avatar Aug 15 '22 15:08 erezsh

Aha yeah. We can either tell linters to ignore that or do some trickery using **kwargs, or we can just ignore the linters.

MegaIng avatar Aug 15 '22 15:08 MegaIng

Alright. I'll wait for a PR, and we'll see.

erezsh avatar Aug 16 '22 09:08 erezsh