lark
lark copied to clipboard
fix: Token now pattern matches correctly
I've added fixes as per https://github.com/lark-parser/lark/issues/1180#issuecomment-1214921614. I don't think that the backward compatibility is possible in this case, as the original argument is positional, e.g.
Token(type_="TEST", value="test") # this will break no matter what, as positional arguments are not set properly
Token("TEST", "test") # this doesn't require any backwards compatibility changes
Please correct me if I'm missing something.
I've also added a bunch of tests (which only are run if the version is 3.10 or higher). I wasn't sure how to use assert statements in structural pattern matching, but it does the job.
closes #1180
Also, the update method should also be updated with the changed parameter name.
Fixed, also added setter for .type_
.
It is possible to provide backwards compatibility by not making type and value be required positional arguments, but instead use *args and **kwargs and have the actual supported signature(s) be only present in @overload declarations. But I am not sure if this should be done, you are correct that this is more work than I originally thought.
Please let me know if you think this should be added, I'm happy to figure it out.
I was thinking we'll just add a new param, type_=None
, and change inst.type = type
to inst.type = type_ if type_ is not None else type
The downside is we'll have to set type and value to None too. But it doesn't break anything.
Btw, why not just do __match_args__ = __slots__ = (...)
?
P.S. we'll also need a deprecation warning for any use of type_
.
I was thinking we'll just add a new param, type_=None, and change inst.type = type to inst.type = type_ if type_ is not None else type
That would make the typing inconsistent between new
and the parameters.
The args thing seems to be ok.
Btw, why not just do match_args = slots = (...) ?
Does order of __slots__
matter? Because I think that __match_args__
should be in the same order as args to new
and update
P.S. we'll also need a deprecation warning for any use of type_
On it.
Hey,
I've figured out that the easiest way to use *args
and **kwargs
with @overload
, was to create methods for the old and new signatures and call them from the overloaded one.
Otherwise, I would have to manually write all the parsing of *args
and **kwargs
including errors etc. which would've been a huge mess.
Also, this way will allow you to remove the deprecated code in the future easily.
One caveat is that the errors will contain one of the wrapped functions names (still should be easily readable), e.g.:
TypeError: Token._deprecated_new() got multiple values for argument 'type_'
Also DeprecationWarning
s are supressed by default. I'm not sure if you want to change this, so I left it as it is.
Also, I'm pretty sure it would be nice to add proper typings to Token.__new__
and to fix the typings for Token
(most of the stuff there should be Optional
), but it's not really in scope of the PR.
Why did you write
def __new__(cls, *args, **kwargs):
if "type_" in kwargs:
return cls._deprecated_new(*args, **kwargs)
return cls._future_new(*args, **kwargs)
Instead of just - ?
def __new__(cls, *args, **kwargs):
if "type_" in kwargs:
kwargs["type"] = kwargs["type_"]
return cls.same_new(*args, **kwargs)
Also DeprecationWarnings are supressed by default.
What do you mean?
most of the stuff there should be Optional
That's true! We should fix that.
P.S.
Earlier you wrote about adding a type_
parameters.
That would make the typing inconsistent between new and the parameters.
Can you please explain that a little bit more?
Why did you write
def __new__(cls, *args, **kwargs): if "type_" in kwargs: return cls._deprecated_new(*args, **kwargs) return cls._future_new(*args, **kwargs)
Instead of just - ?
def __new__(cls, *args, **kwargs): if "type_" in kwargs: kwargs["type"] = kwargs["type_"] return cls.same_new(*args, **kwargs)
Both should be fine, but in the first one, errors will be more readable:
TypeError: Token._deprecated_new() got multiple values for argument 'type_'
makes more sense if you put in something like Token("a", "A", type_="a")
, as opposed to (which would be the error in your version):
TypeError: Token._same_new() got multiple values for argument 'type'
It's supposed to be removed, so I wouldn't worry too much about code duplication.
Also DeprecationWarnings are supressed by default.
What do you mean?
The warnings don't really show until we change the configs: https://stackoverflow.com/a/20960427/7416999
At least they didn't show to me, when I tested this.
Earlier you wrote about adding a type_ parameters.
That would make the typing inconsistent between new and the parameters.
Can you please explain that a little bit more?
Unless we want to allow Token.type
and Token.value
to be optional, which would be a relatively big, and unnecessary change (as the @overload
solution works), having type
and value
as optional in __new__
would lead to undefined behavior or require us to add runtime errors like TypeError("at least one of type and type_ has not to be None")
. Having types properly statically checked might save some errors from happening on runtime.
That's true! We should fix that.
Fixed types and added typings to Token.__new__
.
errors will be more readable:
That's a bit of a nitpick. I can easily add this, and get an even better error -
def __new__(cls, *args, **kwargs):
if "type_" in kwargs:
if "type" in kwargs:
raise TypeError("Error: using both 'type' and the deprecated 'type_' as arguments.")
kwargs["type"] = kwargs.pop("type_")
return cls.same_new(*args, **kwargs)
The warnings don't really show until we change the configs:
That link is from 2014. But when I test it in Python 3.10 -
>>> import warnings
>>> warnings.warn('test', DeprecationWarning)
<stdin>:1: DeprecationWarning: test
errors will be more readable:
That's a bit of a nitpick. I can easily add this, and get an even better error -
def __new__(cls, *args, **kwargs): if "type_" in kwargs: if "type" in kwargs: raise TypeError("Error: using both 'type' and the deprecated 'type_' as arguments.") kwargs["type"] = kwargs.pop("type_") return cls.same_new(*args, **kwargs)
I don't know if this is necessarily a better error, as it does nothing to address my use case (which while valid, I admit is really minor). It's just a different error.
Nevertheless, I've added it, as I believe it makes a lot of sense here.
As I've said, both versions are fine to me, so I've changed to the one you prefer.
That link is from 2014. But when I test it in Python 3.10 -
>>> import warnings >>> warnings.warn('test', DeprecationWarning) <stdin>:1: DeprecationWarning: test
That link is 100% valid still. Please look at python docs regarding warnings. Also, you can test it by importing from a module that warns DeprecationWarning. As far as I understand, warning filters are different in the interactive shell.
The issue can be solved by adding:
warnings.simplefilter('default', DeprecationWarning)
to the top of this file, but it breaks the standalone version, which I really don't understand. So, I'm leaving this the way it is.
Thanks!
I see, you're right, the behavior changes for imported modules. I'll look into adding it.
In regards to Deprecation Warnings: They should be silenced by default, if someone cares they need to add the call to simplefilter
themselves. We shouldn't do that in the library.
@marcinplatek Thank you for your contribution!
Sorry it took so long to review and merge,