
Add an ignore_order to DeepHash

Open • Eric-Vignola opened this issue 3 years ago • 2 comments

It appears that DeepHash ignores the order of the given object by default when computing a combined hash.

```python
from deepdiff import DeepHash

# 3 example objects
x = {'a': 0, 'b': [1, 2, 3]}  # a baseline example object
y = {'b': [1, 2, 3], 'a': 0}  # key order swapped
z = {'a': 0, 'b': [2, 1, 3]}  # positions swapped in the list under key 'b'

# in all examples, the combined hash is the same
print(DeepHash(x)[x])  # '343d77f8a45dac16bc49a7be37c1ee73250ac4311e316862393f3c2552ff5b64'
print(DeepHash(y)[y])  # '343d77f8a45dac16bc49a7be37c1ee73250ac4311e316862393f3c2552ff5b64'
print(DeepHash(z)[z])  # '343d77f8a45dac16bc49a7be37c1ee73250ac4311e316862393f3c2552ff5b64'
```
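One common way to get this kind of order-insensitive result, and plausibly what is happening here, is to hash each item separately and sort the item digests before combining them. The sketch below is a hypothetical plain-Python illustration of that idea, not DeepHash's actual implementation: `combined_hash` and its `ignore_order` flag are made-up names, and `repr` stands in for real leaf hashing.

```python
import hashlib


def digest(text):
    """SHA-256 hex digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def combined_hash(items, ignore_order=True):
    """Hypothetical sketch: hash each item, then hash the joined item digests.

    Sorting the item digests before joining them discards position,
    so any permutation of the same items yields the same combined hash.
    """
    item_digests = [digest(repr(item)) for item in items]
    if ignore_order:
        item_digests.sort()
    return digest(",".join(item_digests))


print(combined_hash([1, 2, 3]) == combined_hash([2, 1, 3]))  # True
print(combined_hash([1, 2, 3], ignore_order=False)
      == combined_hash([2, 1, 3], ignore_order=False))  # False
```

Skipping the sort is all it takes to make position count, which is the behavior the request below asks for.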

It would be incredibly useful to respect order when computing hash signatures of complex data structures, with something like `DeepHash(x, ignore_order=False)[x] == DeepHash(z, ignore_order=False)[z]` returning `False`.

Allowing dict keys as an exception would also be great for flexibility. Then `DeepHash(x, ignore_order=False, sort_dict=True)[x] == DeepHash(y, ignore_order=False, sort_dict=True)[y]` would return `True`, while `DeepHash(x, ignore_order=False, sort_dict=True)[x] == DeepHash(z, ignore_order=False, sort_dict=True)[z]` would return `False`.
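For JSON-serializable data, both proposed modes can be approximated outside DeepHash by hashing a JSON serialization: `json.dumps` always preserves list order, and its `sort_keys` argument plays the role of the proposed `sort_dict`. This is a hypothetical sketch (`ordered_hash` is a made-up name), not part of deepdiff's API.

```python
import hashlib
import json


def ordered_hash(obj, sort_dict=False):
    """Hypothetical order-respecting hash for JSON-serializable data.

    List order always matters. Dict key order matters too, unless
    sort_dict=True, in which case keys are sorted before hashing.
    """
    payload = json.dumps(obj, sort_keys=sort_dict, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


x = {'a': 0, 'b': [1, 2, 3]}
y = {'b': [1, 2, 3], 'a': 0}
z = {'a': 0, 'b': [2, 1, 3]}

print(ordered_hash(x) == ordered_hash(y))  # False: key order differs
print(ordered_hash(x, sort_dict=True) == ordered_hash(y, sort_dict=True))  # True
print(ordered_hash(x, sort_dict=True) == ordered_hash(z, sort_dict=True))  # False: list order differs
```

The trade-off is that everything must be JSON-serializable, and non-string dict keys are coerced to strings, so `{1: 0}` and `{'1': 0}` hash the same.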

Eric-Vignola • Aug 10 '22 17:08

@Eric-Vignola interesting idea. Currently DeepDiff uses DeepHash to identify identical objects before it starts digging into the ones that are not identical. Only then, inside DeepDiff, do we start ignoring order among those non-identical objects.

What you are asking for would also require rewriting how we serialize objects. A non-trivial amount of work would be needed to make that happen.
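The two-step flow described here (set aside identical objects by hash first, then compare only the leftovers) can be sketched with the standard library. This is a hypothetical illustration, not DeepDiff's code: `item_hash` stands in for DeepHash and assumes JSON-serializable items, and `partition_by_hash` is a made-up name.

```python
import hashlib
import json


def item_hash(item):
    """Hypothetical stand-in for DeepHash: SHA-256 of a canonical JSON dump.

    Assumes items are JSON-serializable.
    """
    payload = json.dumps(item, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def partition_by_hash(t1, t2):
    """Split two iterables into (identical, only_in_t1, only_in_t2), ignoring order.

    Phase 1 of a two-phase diff: items whose hashes match are treated as
    identical and set aside; only the leftovers need a deeper comparison.
    """
    # Bucket t2's items by hash so duplicates are matched one-for-one.
    pool = {}
    for item in t2:
        pool.setdefault(item_hash(item), []).append(item)

    identical, only_in_t1 = [], []
    for item in t1:
        bucket = pool.get(item_hash(item))
        if bucket:
            identical.append(bucket.pop())
        else:
            only_in_t1.append(item)

    # Whatever is still in the buckets had no hash match in t1.
    only_in_t2 = [item for bucket in pool.values() for item in bucket]
    return identical, only_in_t1, only_in_t2


identical, only_in_t1, only_in_t2 = partition_by_hash(
    [{'a': 1}, [1, 2], 'x'],
    ['x', {'a': 2}, [1, 2]],
)
print(identical)   # [[1, 2], 'x']
print(only_in_t1)  # [{'a': 1}]
print(only_in_t2)  # [{'a': 2}]
```

A fuller diff would then compare `only_in_t1` against `only_in_t2` to pair up near-matches; that second phase is where order-ignoring logic applies.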

seperman • Aug 14 '22 02:08

https://github.com/seperman/deepdiff/issues/373

This is my issue pointing out the same thing. I closed it, thinking it was a silly question 😄

Okroshiashvili • Feb 08 '23 08:02