Token is inconsistent in py3.10 pattern matching
Describe the bug
`Token` is not working correctly with structural pattern matching.
While you can create tokens as:

```python
t = Token(type_="TEST", value="test")
```

you have to pattern match on:

```python
match t:
    case Token(type="TEST"):
        pass
```

as pattern matching tries to access the attributes by the names provided in the pattern.
I find this solution fairly confusing and ugly; it is also hard to reconcile with type checkers and triggers a nasty bug in pylance (https://github.com/microsoft/pylance-release/issues/3197).
To Reproduce
```python
t = Token(type_="TEST", value="test")

match t:
    case Token(type_="TEST"):
        print("with underscore")  # expected match
    case Token(type="TEST"):
        print("without underscore")  # actual match
```
Potential solutions
The simplest solution would be to rename `type`/`type_` to be consistent between the `__init__` argument and the attribute, but this might be a fairly big change to the API. There might be a possibility to add custom pattern matching logic so that matching works on `Token(type_=...)`, but I'm not very familiar with the pattern matching API yet.
I don't think it should be too hard to change `type_` to `type`, as long as we add an optional `type_` parameter that overrides `type` when given.
I'm not very familiar with the match syntax. Would it not work to use positional arguments?
Potential changes that would be useful here:
- add a property `.type_` that just transparently refers to `.type`
- change the parameter name in the `__init__` (with backwards compatibility). This will also cause linter errors, since we are shadowing a builtin name
- add `__match_args__ = ('type', 'value')` to the class definition to allow positional pattern matching (if this isn't there already)
  - This might break compatibility if you try to treat `Token` as a `str` object; haven't tested that.
> I'm not very familiar with the match syntax. Would it not work to use positional arguments?
It would require adding `__match_args__`.
> - Add `__match_args__ = ('type', 'value')` to the class definition to allow positional pattern matching (if this isn't there already).
>   - This might break compatibility if you try to treat `Token` as a `str` object; haven't tested that.
Based on some testing, I think it would be inconsistent, due to `Token` having `type_` as its first arg and `str` having `value` as its first arg.
```python
class Token(str):
    __match_args__ = ("type", "value")

    def __new__(cls, type_, value):
        inst = super(Token, cls).__new__(cls, value)
        inst.type = type_
        inst.value = value
        return inst


t = Token("TEST", "test")

match t:
    case str("test"):
        print("a")  # would match
    case _:
        pass

match t:
    case Token("test"):
        print("b")  # would not match
    case _:
        pass

match t:
    case Token("TEST"):
        print("c")  # would match
    case _:
        pass
```
After changing the order (`__match_args__ = ("value", "type")`), it would work relatively consistently as far as I understand. But to make it fully consistent, it would require changing the arg order of `Token.__new__` as well.
No, IMO the behavior achieved with `("type", "value")` is good: you can ignore the type if you are matching as `str`, which is the expected behavior. I feared that case `a` would not behave correctly, but since it does, I think we should keep it that way.
Oh, that makes sense, I'll try to add a PR with those changes this week.
Although it still might make sense to discuss which of these solutions is better:
- add a property `.type_` that just transparently refers to `.type`
- change the parameter name in the `__init__` (with backwards compatibility). This will also cause linter errors, since we are shadowing a builtin name
I would say that both should be done.
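A rough sketch of doing both at once (hypothetical code, not the eventual PR):

```python
class Token(str):
    __match_args__ = ("type", "value")

    def __new__(cls, type, value):  # parameter renamed to match the attribute
        inst = super().__new__(cls, value)
        inst.type = type
        inst.value = value
        return inst

    @property
    def type_(self):
        # transparent alias for the old attribute spelling
        return self.type


t = Token("TEST", "test")
```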
What about the linter errors?
> What about the linter errors?
I am not sure what you mean.
@MegaIng You wrote:

> change the parameter name in the `__init__` (with backwards compatibility). This will also cause linter errors since we are shadowing a builtin name
Aha, yeah. We can either tell linters to ignore that, do some trickery using `**kwargs`, or just ignore the linters.
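For completeness, one shape the `**kwargs` trickery could take (a sketch under my own assumptions, keeping the builtin name out of the signature while accepting both spellings):

```python
class Token(str):
    def __new__(cls, *args, **kwargs):
        # unpack manually so `type` never appears as a parameter name
        if args:
            type_, value = args
        else:
            type_ = kwargs.get("type", kwargs.get("type_"))
            value = kwargs["value"]
        inst = super().__new__(cls, value)
        inst.type = type_
        inst.value = value
        return inst
```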
Alright. I'll wait for a PR, and we'll see.