pyrsistent icon indicating copy to clipboard operation
pyrsistent copied to clipboard

PMap.__getitem__ gets the wrong value

Open brunonicko opened this issue 5 years ago • 5 comments
trafficstars

PMap.__getitem__ will return the wrong value if the object used for the key defines a unique __hash__ and a custom __eq__ method.

Repro (tested with python 3.8 and pyrsistent 0.17.3):

from pyrsistent import pmap

class CustomObject(object):
    __slots__ = ()

    def __hash__(self):
        return object.__hash__(self)

    def __eq__(self, other):  # Having __eq__ declared causes the failure
        return True  # Changing this to False will fail with a different error

keys = []
py_dict = {}
for i in range(10000):
    key = CustomObject()
    keys.append(key)
    py_dict[key] = i

p_map = pmap(py_dict)
assert len(py_dict) == len(p_map)
assert set(py_dict.keys()) == set(p_map.keys())

for i, key in enumerate(keys):
    assert py_dict[key] == i
    assert p_map[key] == i  # Fails here when `i` is a big number, __getitem__ gets the wrong value

brunonicko avatar Sep 29 '20 21:09 brunonicko

Sorry, I didn't introduce myself. My name is Bruno and I work with python development for visual effects pipelines. I really like pyrsistent and I think it's an awesome tool for developing modern Python applications. I came across this bug recently, and I am trying to figure out why it's behaving that way. I also noticed this is not a problem when running Python 2.7 and pyrsistent 0.16.1, so maybe it's because of some behaviour introduced in newer python versions that I'm not aware of. I'll keep investigating to see if I can find more information about what's going on. Cheers.

brunonicko avatar Sep 30 '20 04:09 brunonicko

Thinking more about it, I guess my example breaks the rules of how hashing and equality should work in good python code in the first place. So even though pmap seems to behave differently than a regular dict in this case, maybe it doesn't matter since this particular example is bad and should not be implemented that way anyway. Feel free to just reject the request, and sorry for the confusion :)

brunonicko avatar Sep 30 '20 06:09 brunonicko

Hi Bruno!

OK, no worries. Having two object that compare equal but result in different hashes seems odd to me. I'm a bit surprised about the difference between Python 2 and 3 though. I need to do some investigation. Hope to find some time in the next days.

tobgu avatar Sep 30 '20 06:09 tobgu

This difference comes from an implementation detail in the python dict c-code. Infact, the code in the example for the ordinary dict only passes because the keys are enumerated in the order they were added to the dict. If you include the lines import random; random.shuffle(keys) before the for loop, the assert will (usually) fail.

As this behaviour is not in the spec for dict, it may change in the future.

alchayward avatar Jan 05 '21 14:01 alchayward

just a suggestion, but eq returning True with hashcode returning nonequal values could throw an exception at runtime? to me that would feel the right thing.

pdivos avatar Feb 21 '21 17:02 pdivos