attrs icon indicating copy to clipboard operation
attrs copied to clipboard

Inheriting from a frozen class causes instantiated child hashes to collide

Open ejohb opened this issue 6 years ago • 6 comments

This works as expected:

class A:
    pass

class B(A):
    pass

class C(A):
    pass

assert hash(B) != hash(C)
assert hash(B()) != hash(C())

=========================== 1 passed in 3.39 seconds ===========================

This errors:

import attr

@attr.s(frozen=True)
class A:
    pass

class B(A):
    pass

class C(A):
    pass

assert hash(B) != hash(C)
assert hash(B()) != hash(C())

>       assert hash(B()) != hash(C())
E       AssertionError: assert 7618298384354620882 != 7618298384354620882
E        +  where 7618298384354620882 = hash(B())
E        +    where B() = <class 'test_attr_bug.<locals>.B'>()
E        +  and   7618298384354620882 = hash(C())
E        +    where C() = <class 'test_attr_bug.<locals>.C'>()

So these two instantiated classes hash to the same value despite a) being instances of different classes and b) being of classes which, when uninstantiated, have differing hashes.

This seems very wrong to me!

ejohb avatar May 03 '19 13:05 ejohb

@EdwardJB : The behavior seems more general than this.

import attr
@attr.s(hash=True)
class A:
    pass

@attr.s(hash=True)
class B:
    pass

hash(A) == hash(B)
Out[5]: False

hash(A()) == hash(B())
Out[6]: True

If you look at how the hash is computed, the type appears not to be used anywhere (despite the name type_hash being used).

While the behavior might seem a little odd, it isn't wrong (since there are no hard guarantees about hash code collisions even for objects of the same type) and it is difficult for me to imagine a case where it has significant performance implications.

(to compare to a similar library, sampling some generated code from Java's Immutables indicates they do not incorporate the class into the hash code either)

gabbard avatar May 07 '19 15:05 gabbard

It seems good to at least rename type_hash to a less misleading name in the code, though.

gabbard avatar May 07 '19 15:05 gabbard

And the footnote on http://www.attrs.org/en/stable/hashing.html#id1 which says "The hash is computed by hashing a tuple that consists of an unique id for the class plus all attribute values" doesn't match the actual behavior.

gabbard avatar May 07 '19 15:05 gabbard

And the footnote on http://www.attrs.org/en/stable/hashing.html#id1 which says "The hash is computed by hashing a tuple that consists of an unique id for the class plus all attribute values" doesn't match the actual behavior.

That's what I was relying on. Can you guys add an include_type_in_hash argument?

ejohb avatar May 07 '19 16:05 ejohb

@EdwardJB : well, that's up to @hynek , etc. I'm just a random person. :-) . But I'm a little curious what your use case is where this matters? The only impact would be increasing the collision rate for collections which contain lots of objects which have identical content except for their type. I'm having a hard time imagining a circumstance which would require this?

gabbard avatar May 07 '19 16:05 gabbard

Multiple things:

  • It is indeed OK for hashes to collide, see my blog post on that topic for details. That makes this kind of a feature request, not a bug. But I'm not gonna discuss semantics here. :)
  • I was thinking on removing the method caching anyways because it seems a fringe problem (saves some ram and setuptime when you create lots of identical but different classes) that introduces unnecessary complexity. This might be another instance where that might come handy.
  • If someone can find a way to remove the method caching and add a cached type hash instead that is mixed into the hash call, I would merge it (barring any unforseen problems).

hynek avatar May 08 '19 18:05 hynek