deepdiff
Cannot use `np.dtype='bool'` at all?
Describe the bug
python -c "
from deepdiff import DeepHash
import numpy as np
d = {'p': np.array([True], dtype='bool')}
print(DeepHash(d)[d])
"
Gives me:
8.0.0
Traceback (most recent call last):
File "<string>", line 6, in <module>
File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 211, in __init__
self._hash(obj, parent=parent, parents_ids=frozenset({get_id(obj)}))
File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 537, in _hash
result, counts = self._prep_dict(obj=obj, parent=parent, parents_ids=parents_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 401, in _prep_dict
hashed, count = self._hash(item, parent=key_in_report, parents_ids=parents_ids_added)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 556, in _hash
result, counts = self._prep_iterable(obj=obj, parent=parent, parents_ids=parents_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 434, in _prep_iterable
hashed, count = self._hash(item, parent=new_parent, parents_ids=parents_ids_added)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 561, in _hash
result, counts = self._prep_obj(obj=obj, parent=parent, parents_ids=parents_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 355, in _prep_obj
result, counts = self._prep_dict(obj, parent=parent, parents_ids=parents_ids,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 401, in _prep_dict
hashed, count = self._hash(item, parent=key_in_report, parents_ids=parents_ids_added)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 503, in _hash
result, counts = self.hashes[obj]
~~~~~~~~~~~^^^^^
ValueError: memoryview: hashing is restricted to formats 'B', 'b' or 'c'
Why? Isn't a boolean datatype supposed to be the simplest dtype there is?
To Reproduce
Above.
Expected behavior
No error.
OS, DeepDiff version and Python version (please complete the following information):
- OS: NixOS
- Version: nixos-unstable
- Python Version: 3.11 & 3.12
- DeepDiff Version 8.0.0
@doronbehar Thanks for reporting the bug. It is not supported because nobody had run into this issue and reported it until now, which suggests the boolean dtype is not very popular even if it is the simplest. Do you think you may have time to make a PR for it? PRs are always very welcome!
OK, I see. I thought deepdiff itself had decided, for some unclear reason, to restrict hashing to formats 'B', 'b' and 'c' :), which is why I phrased my question like that.
And yes, I don't mind putting some effort into this. However, I have no idea where that memoryview comes from. I can create a PR that simply skips memoryview objects, but I'm not sure whether that is the correct thing to do. Here's what I did in the meantime:
diff --git i/deepdiff/deephash.py w/deepdiff/deephash.py
index 32fee9c..1258713 100644
--- i/deepdiff/deephash.py
+++ w/deepdiff/deephash.py
@@ -500,6 +500,8 @@ class DeepHash(Base):
else:
result = not_hashed
try:
+ print("obj is", obj)
+ print("hashes are", self.hashes)
result, counts = self.hashes[obj]
except (TypeError, KeyError):
pass
And ran the same reproducing snippet, and got:
obj is {'p': array([ True])}
hashes are {<object object at 0x7ff75fb7a890>: []}
obj is p
hashes are {<object object at 0x7ff75fb7a890>: []}
obj is [ True]
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1)}
obj is True
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1)}
obj is T
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1)}
obj is base
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1), 'T': ('1fc05e26a2f596e4108cb887c23b73551ce8faba3d7f6fd07b468d0df826b8f3', 1)}
obj is None
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1), 'T': ('1fc05e26a2f596e4108cb887c23b73551ce8faba3d7f6fd07b468d0df826b8f3', 1), 'base': ('d4752d47041b2df8f562b307b48709b90fcfc8ee56dd5a1df2a8d2fe2427f27e', 1)}
obj is data
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1), 'T': ('1fc05e26a2f596e4108cb887c23b73551ce8faba3d7f6fd07b468d0df826b8f3', 1), 'base': ('d4752d47041b2df8f562b307b48709b90fcfc8ee56dd5a1df2a8d2fe2427f27e', 1), None: ('bbd393a60007e5f9621b8fde442dbcf493227ef7ced9708aa743b46a88e1b49e', 1)}
obj is <memory at 0x7ff71c284540>
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1), 'T': ('1fc05e26a2f596e4108cb887c23b73551ce8faba3d7f6fd07b468d0df826b8f3', 1), 'base': ('d4752d47041b2df8f562b307b48709b90fcfc8ee56dd5a1df2a8d2fe2427f27e', 1), None: ('bbd393a60007e5f9621b8fde442dbcf493227ef7ced9708aa743b46a88e1b49e', 1), 'data': ('2f8c213f30eab7fcc3c2f9c88010ebad400be515f7c9f746ca13efcb1fb7ed75', 1)}
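The error message itself can be reproduced with the standard library alone. This is a minimal sketch, assuming the `<memory at ...>` object on the last traced line is a read-only memoryview whose format is the boolean code `'?'`. The trace does not print the format, so that part is an inference from the error message. `try_hash` is a hypothetical helper used only for illustration.

```python
# Build a read-only memoryview with boolean format '?', without NumPy.
# (Assumption: this mimics the buffer that deepdiff ends up hashing.)
mv = memoryview(b"\x01").cast("?")


def try_hash(view):
    """Return hash(view), or the ValueError message if hashing is refused."""
    try:
        return hash(view)
    except ValueError as exc:
        return str(exc)


hash_result = try_hash(mv)
print(mv.format, mv.readonly)
print(hash_result)

# A byte-format view of the same kind of data hashes fine.
byte_hash = try_hash(memoryview(b"\x01"))
```

If that assumption holds, the restriction comes from Python's own `memoryview.__hash__`, not from a decision made inside deepdiff.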
@doronbehar Does this look relevant? https://stackoverflow.com/a/38837737/1497443
It seems that to hash the boolean dtype, we should do hash(bytes(image1)).
It turned out to be simpler than I thought :) The solution is in https://github.com/seperman/deepdiff/pull/496
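A minimal sketch of that idea, not the change that went into deepdiff: copy the buffer into `bytes`, which is always hashable, before using it as a dictionary key. `hashable_key` is a hypothetical name, and the boolean memoryview is built with the standard library rather than NumPy to keep the example self-contained.

```python
def hashable_key(obj):
    """Return obj unchanged, unless it is a memoryview, which is copied to bytes."""
    if isinstance(obj, memoryview):
        return bytes(obj)
    return obj


# A read-only buffer with boolean format '?', which hash() would reject.
bool_view = memoryview(b"\x01\x00").cast("?")

cache = {}
cache[hashable_key(bool_view)] = "cached result"
print(cache)
```

One trade-off of this approach: `bytes` keeps only the raw contents, so two buffers with different formats or shapes but identical bytes would map to the same key.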
Awesome work!