lark
lark copied to clipboard
equality of Tokens and strings is not transitive
Describe the bug
The __eq__ implementation for Token gives non-transitive results: ie one can construct an example where a == b, and b == c, but a != c.
To Reproduce
from lark import Token
a = Token("noun", "flies")
b = "flies"
c = Token("verb", "flies")
print(f"a == b ? {a == b}")
print(f"b == c ? {b == c}")
print(f"a == c ? {a == c}")
output
a == b ? True
b == c ? True
a == c ? False
This report is prompted by https://github.com/python-poetry/poetry-core/pull/286, in which one lark grammar is reused by another. So depending on which was used you could get the "same" thing with type either MARKER_NAME or markers__MARKER_NAME - which caused some confusion.
In an ideal world I think I'd argue that Token should not be a subclass of str.
I suspect this is too deeply embedded / relied upon, and that you're not going to want to do anything about this. That's fine by me, please just close this out. But I thought I'd open the issue anyway so that you knew this of this oddity and could take that decision deliberately.
Thanks for floating this. I can see why this behavior would be an issue. Tokens behave so much like strings that it makes sense to assume they'll act like strings in every way.
In an ideal world I think I'd argue that Token should not be a subclass of str.
Maybe you're right. Or maybe I should have just made __eq__ compare only the value and ignore the type.
Changing the subclassing isn't really an option. But, maybe changing __eq__ is, if it makes sense to do.
Funny enough, the Lark-Cython implementation's only break in compatibility is that the Token class doesn't inherit from str.
Anyway, I appreciate that you took the time to let me know. Maybe nothing will come of it, but we can keep it open and see what other people think.
P.S. just realized that even if I fix __eq__ there's still room for "oddities" like token == str but repr(token) != repr(str). But maybe that's more acceptable.
I see two ways to restore transitivity of equality:
- remove the overriding
__eq__implementation, so that all ofa,b, andcare considered equal - update it so that no
Tokenis ever equal to any string, it can only be equal to anotherTokenwith the same type and value. Then none ofa,bandcare considered equal.
I don't have a good sense of how breaking either of these would be for lark and its users and I don't actually care about this very much myself. So I'm going to drop out and leave you to do with this what you will, thanks!
Thanks again for your input!
My intuition says that a subclass that divides the value space shouldn't be equal to its superclass. It's something I added out of convenience and didn't think much about. It might be a little too late to change it, at least for v1.x, but it's definitely something worth thinking more about.