attrs
attrs copied to clipboard
Inheriting from a frozen class causes instantiated child hashes to collide
This works as expected:
class A:
pass
class B(A):
pass
class C(A):
pass
assert hash(B) != hash(C)
assert hash(B()) != hash(C())
=========================== 1 passed in 3.39 seconds ===========================
This errors:
import attr
@attr.s(frozen=True)
class A:
pass
class B(A):
pass
class C(A):
pass
assert hash(B) != hash(C)
assert hash(B()) != hash(C())
> assert hash(B()) != hash(C())
E AssertionError: assert 7618298384354620882 != 7618298384354620882
E + where 7618298384354620882 = hash(B())
E + where B() = <class 'test_attr_bug.<locals>.B'>()
E + and 7618298384354620882 = hash(C())
E + where C() = <class 'test_attr_bug.<locals>.C'>()
So these two instantiated classes hash to the same value despite a) being instances of different classes and b) being of classes which, when uninstantiated, have differing hashes.
This seems very wrong to me!
@EdwardJB : The behavior seems more general than this.
import attr
@attr.s(hash=True)
class A:
pass
@attr.s(hash=True)
class B:
pass
hash(A) == hash(B)
Out[5]: False
hash(A()) == hash(B())
Out[6]: True
If you look at how the hash is computed, the type appears not to be used anywhere (despite the name type_hash being used).
While the behavior might seem a little odd, it isn't wrong (since there are no hard guarantees about hash code collisions even for objects of the same type) and it is difficult for me to imagine a case where it has significant performance implications.
(to compare to a similar library, sampling some generated code from Java's Immutables indicates they do not incorporate the class into the hash code either)
It seems good to at least rename type_hash to a less misleading name in the code, though.
And the footnote on http://www.attrs.org/en/stable/hashing.html#id1 which says "The hash is computed by hashing a tuple that consists of an unique id for the class plus all attribute values" doesn't match the actual behavior.
And the footnote on http://www.attrs.org/en/stable/hashing.html#id1 which says "The hash is computed by hashing a tuple that consists of an unique id for the class plus all attribute values" doesn't match the actual behavior.
That's what I was relying on. Can you guys add an include_type_in_hash argument?
@EdwardJB : well, that's up to @hynek , etc. I'm just a random person. :-) . But I'm a little curious what your use case is where this matters? The only impact would be increasing the collision rate for collections which contain lots of objects which have identical content except for their type. I'm having a hard time imagining a circumstance which would require this?
Multiple things:
- It is indeed OK for hashes to collide, see my blog post on that topic for details. That makes this kind of a feature request, not a bug. But I'm not gonna discuss semantics here. :)
- I was thinking on removing the method caching anyways because it seems a fringe problem (saves some ram and setuptime when you create lots of identical but different classes) that introduces unnecessary complexity. This might be another instance where that might come handy.
- If someone can find a way to remove the method caching and add a cached type hash instead that is mixed into the hash call, I would merge it (barring any unforseen problems).