
Cannot use `np.dtype='bool'` at all?

Open doronbehar opened this issue 1 year ago • 4 comments

Describe the bug

python -c "
from deepdiff import DeepHash 
import numpy as np
d = {'p': np.array([True], dtype='bool')}
print(DeepHash(d)[d])
"

Gives me:

8.0.0
Traceback (most recent call last):
  File "<string>", line 6, in <module>
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 211, in __init__
    self._hash(obj, parent=parent, parents_ids=frozenset({get_id(obj)}))
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 537, in _hash
    result, counts = self._prep_dict(obj=obj, parent=parent, parents_ids=parents_ids)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 401, in _prep_dict
    hashed, count = self._hash(item, parent=key_in_report, parents_ids=parents_ids_added)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 556, in _hash
    result, counts = self._prep_iterable(obj=obj, parent=parent, parents_ids=parents_ids)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 434, in _prep_iterable
    hashed, count = self._hash(item, parent=new_parent, parents_ids=parents_ids_added)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 561, in _hash
    result, counts = self._prep_obj(obj=obj, parent=parent, parents_ids=parents_ids)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 355, in _prep_obj
    result, counts = self._prep_dict(obj, parent=parent, parents_ids=parents_ids,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 401, in _prep_dict
    hashed, count = self._hash(item, parent=key_in_report, parents_ids=parents_ids_added)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 503, in _hash
    result, counts = self.hashes[obj]
                     ~~~~~~~~~~~^^^^^
ValueError: memoryview: hashing is restricted to formats 'B', 'b' or 'c'

Why? Isn't a boolean datatype supposed to be the simplest dtype there is?

To Reproduce

Above.

Expected behavior

No error.

OS, DeepDiff version and Python version (please complete the following information):

  • OS: NixOS
  • Version: nixos-unstable
  • Python version: 3.11 & 3.12
  • DeepDiff version: 8.0.0

doronbehar commented on Oct 14 '24

@doronbehar Thanks for reporting the bug. It isn't supported simply because nobody has run into this issue and reported it until now, which suggests the boolean dtype is not very popular, even if it is the simplest. Do you think you may have time to make a PR for it? PRs are always very welcome!

seperman commented on Oct 14 '24

OK, I see. I thought deepdiff had deliberately restricted hashing to formats 'B', 'b' and 'c' for some unclear reason :), which is why I phrased my question like that.
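For what it's worth, the restriction does not appear to come from deepdiff at all: the message is raised by CPython's own `memoryview.__hash__`, which only hashes read-only views whose format is one of the byte formats 'B', 'b' or 'c'. A NumPy boolean exposes its buffer with the struct format '?' (our understanding, not confirmed in this thread), so hashing that view fails. The sketch below reproduces the error with the standard library only, using `memoryview.cast` as a stand-in for the NumPy buffer; `try_hash` is a hypothetical helper.

```python
# Read-only source buffer; writable views cannot be hashed at all.
raw = bytes([1, 0, 1])


def try_hash(fmt):
    """Hash a read-only memoryview cast to `fmt`; return "ok" or the error text (hypothetical helper)."""
    view = memoryview(raw).cast(fmt)
    try:
        hash(view)
        return "ok"
    except ValueError as exc:
        return str(exc)


# 'B', 'b' and 'c' hash fine; '?' (bool) hits the same ValueError as in the traceback.
for fmt in ("B", "b", "c", "?"):
    print(fmt, "->", try_hash(fmt))
```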

And yes, I don't mind putting some effort into this. However, I have no idea where that memoryview comes from. I could create a PR that simply skips memoryview objects, but I'm not sure whether that is the correct thing to do. Here's what I did in the meantime:

diff --git i/deepdiff/deephash.py w/deepdiff/deephash.py
index 32fee9c..1258713 100644
--- i/deepdiff/deephash.py
+++ w/deepdiff/deephash.py
@@ -500,6 +500,8 @@ class DeepHash(Base):
         else:
             result = not_hashed
         try:
+            print("obj is", obj)
+            print("hashes are", self.hashes)
             result, counts = self.hashes[obj]
         except (TypeError, KeyError):
             pass

Then I ran the same reproducing snippet and got:

obj is {'p': array([ True])}
hashes are {<object object at 0x7ff75fb7a890>: []}
obj is p
hashes are {<object object at 0x7ff75fb7a890>: []}
obj is [ True]
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1)}
obj is True
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1)}
obj is T
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1)}
obj is base
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1), 'T': ('1fc05e26a2f596e4108cb887c23b73551ce8faba3d7f6fd07b468d0df826b8f3', 1)}
obj is None
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1), 'T': ('1fc05e26a2f596e4108cb887c23b73551ce8faba3d7f6fd07b468d0df826b8f3', 1), 'base': ('d4752d47041b2df8f562b307b48709b90fcfc8ee56dd5a1df2a8d2fe2427f27e', 1)}
obj is data
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1), 'T': ('1fc05e26a2f596e4108cb887c23b73551ce8faba3d7f6fd07b468d0df826b8f3', 1), 'base': ('d4752d47041b2df8f562b307b48709b90fcfc8ee56dd5a1df2a8d2fe2427f27e', 1), None: ('bbd393a60007e5f9621b8fde442dbcf493227ef7ced9708aa743b46a88e1b49e', 1)}
obj is <memory at 0x7ff71c284540>
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1), 'T': ('1fc05e26a2f596e4108cb887c23b73551ce8faba3d7f6fd07b468d0df826b8f3', 1), 'base': ('d4752d47041b2df8f562b307b48709b90fcfc8ee56dd5a1df2a8d2fe2427f27e', 1), None: ('bbd393a60007e5f9621b8fde442dbcf493227ef7ced9708aa743b46a88e1b49e', 1), 'data': ('2f8c213f30eab7fcc3c2f9c88010ebad400be515f7c9f746ca13efcb1fb7ed75', 1)}
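The debug output suggests deepdiff walks from the array into the attributes of its boolean element (`T`, `base`, `data`), and `data` is a memoryview. The diff above also shows that the cache lookup `self.hashes[obj]` only guards against `TypeError` and `KeyError`, while the memoryview raises `ValueError`, so the error escapes instead of being treated as "not cached". A minimal sketch of that interaction, assuming a plain dict as the cache; `cached_lookup` and its `catch` parameter are hypothetical and not deepdiff's API, and widening the guard is only one possible fix, not necessarily the one adopted upstream.

```python
# Stand-in for DeepHash's cache: a plain dict (hypothetical contents).
hashes = {"p": ("placeholder-hash", 1)}


def cached_lookup(cache, obj, catch=(TypeError, KeyError)):
    """Mimic `result, counts = self.hashes[obj]` inside try/except (hypothetical helper)."""
    try:
        return cache[obj]
    except catch:
        # Unhashable or missing keys are treated as "not cached yet".
        return None


# Read-only memoryview with format '?', standing in for the bool's `data` attribute.
view = memoryview(b"\x01").cast("?")

print(cached_lookup(hashes, "p"))        # cached entry is returned
print(cached_lookup(hashes, [1, 2]))     # list -> TypeError is caught -> None

try:
    cached_lookup(hashes, view)          # memoryview -> ValueError is NOT caught
except ValueError as exc:
    print("escaped:", exc)

# Widening the guard is one possible fix (not necessarily what deepdiff chose).
print(cached_lookup(hashes, view, catch=(TypeError, KeyError, ValueError)))
```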

doronbehar commented on Oct 15 '24

@doronbehar Does this look relevant? https://stackoverflow.com/a/38837737/1497443 It seems that for hashing the boolean dtype, we should do `hash(bytes(image1))`.
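Following the Stack Overflow suggestion, one can hash the underlying bytes of the view instead of the view itself, since `bytes(view)` and `view.tobytes()` copy the raw buffer regardless of its format. Note that Python's built-in `hash()` on bytes is randomized per process (see `PYTHONHASHSEED`), so a stable digest such as SHA-256 is a better fit for content hashes; the 64-character hex strings in the debug output look like SHA-256 digests, but that is an assumption. `memoryview_digest` is a hypothetical helper, not part of deepdiff.

```python
import hashlib


def memoryview_digest(view: memoryview) -> str:
    """Hash a memoryview of any format via its raw bytes (hypothetical helper).

    view.tobytes() copies the underlying buffer, so the '?' (bool)
    format no longer matters for hashing.
    """
    return hashlib.sha256(view.tobytes()).hexdigest()


true_view = memoryview(b"\x01").cast("?")
same_view = memoryview(bytes([1])).cast("?")
false_view = memoryview(b"\x00").cast("?")

print(hash(bytes(true_view)))  # works, unlike hash(true_view); value varies per process
print(memoryview_digest(true_view) == memoryview_digest(same_view))   # True
print(memoryview_digest(true_view) == memoryview_digest(false_view))  # False
```

One design caveat: hashing only the raw bytes ignores the format, so a '?' view and a 'B' view over the same bytes produce the same digest. Mixing `view.format` and `view.shape` into the hash would keep them distinct if that matters.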

seperman commented on Oct 15 '24

It turned out to be simpler than I thought :) The solution is in https://github.com/seperman/deepdiff/pull/496

doronbehar commented on Oct 19 '24

Awesome work!

McTonderski commented on Nov 13 '24