deepdiff
Add an ignore_order option to DeepHash
It appears that, by default, DeepHash ignores the order of elements in the given object when computing the combined hash.
from deepdiff import DeepHash

# 3 example objects
x = {'a': 0, 'b': [1, 2, 3]}  # a baseline example object
y = {'b': [1, 2, 3], 'a': 0}  # key order swapped
z = {'a': 0, 'b': [2, 1, 3]}  # swapped positions in the list for key 'b'
# in all examples, the combined hash is the same
print(DeepHash(x)[x])  # '343d77f8a45dac16bc49a7be37c1ee73250ac4311e316862393f3c2552ff5b64'
print(DeepHash(y)[y])  # '343d77f8a45dac16bc49a7be37c1ee73250ac4311e316862393f3c2552ff5b64'
print(DeepHash(z)[z])  # '343d77f8a45dac16bc49a7be37c1ee73250ac4311e316862393f3c2552ff5b64'
It would be incredibly useful to respect order when computing hash signatures of complex data structures, something like:
DeepHash(x, ignore_order=False)[x] == DeepHash(z, ignore_order=False)[z]  # would return False
An exception for dict keys (still ignoring key order while respecting list order) would add even more flexibility:
DeepHash(x, ignore_order=False, sort_dict=True)[x] == DeepHash(y, ignore_order=False, sort_dict=True)[y]  # would return True
DeepHash(x, ignore_order=False, sort_dict=True)[x] == DeepHash(z, ignore_order=False, sort_dict=True)[z]  # would return False
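For illustration, here is a minimal sketch of the proposed semantics in plain Python, independent of deepdiff. `ordered_hash` is a hypothetical helper, not a DeepHash function or parameter; it serializes nested dicts, lists, and tuples recursively and combines SHA-256 digests of the children. With `ignore_order=True` the child digests are sorted before joining (ignoring both list and dict order), which mimics the order-insensitive behavior observed above; with `ignore_order=False` list positions matter, and `sort_dict=True` still sorts dict entries so key order is ignored.

```python
import hashlib
from typing import Any


def _sha(text: str) -> str:
    """SHA-256 hex digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ordered_hash(obj: Any, ignore_order: bool = False, sort_dict: bool = True) -> str:
    """Hypothetical helper (not deepdiff's API): hash nested dicts/lists/tuples/scalars."""
    if isinstance(obj, dict):
        # One entry per key/value pair: digest of the key plus the value's hash.
        items = [
            _sha(repr(key)) + ":" + ordered_hash(value, ignore_order, sort_dict)
            for key, value in obj.items()
        ]
        if sort_dict or ignore_order:
            items.sort()  # dict key order becomes insignificant
        return _sha("dict[" + ",".join(items) + "]")
    if isinstance(obj, (list, tuple)):
        items = [ordered_hash(item, ignore_order, sort_dict) for item in obj]
        if ignore_order:
            items.sort()  # element positions become insignificant
        # Include the container type so [1] and (1,) hash differently.
        return _sha(type(obj).__name__ + "[" + ",".join(items) + "]")
    # Scalars: include the type name so 1 and "1" hash differently.
    return _sha(type(obj).__name__ + ":" + repr(obj))


x = {'a': 0, 'b': [1, 2, 3]}
y = {'b': [1, 2, 3], 'a': 0}
z = {'a': 0, 'b': [2, 1, 3]}
print(ordered_hash(x) == ordered_hash(y))  # True: key order ignored (sort_dict=True)
print(ordered_hash(x) == ordered_hash(z))  # False: list order respected
print(ordered_hash(x, ignore_order=True) == ordered_hash(z, ignore_order=True))  # True
```

Sorting child digests before joining is one straightforward way to make a combined hash order-insensitive; whether DeepHash would expose or implement such flags this way is an assumption.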
@Eric-Vignola Interesting idea. Currently, DeepDiff uses DeepHash to identify identical objects before it starts digging into the ones that are not identical. Only then, inside DeepDiff, do we start ignoring order between these non-identical objects.
What you are asking for would also require rewriting how we serialize objects. A non-trivial amount of work is needed for that to happen.
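The two-phase approach described in this reply can be sketched roughly as follows. This is a simplified illustration, not DeepDiff's actual code: `item_hash` stands in for DeepHash by hashing a canonical JSON dump (so it only handles JSON-compatible data), and `split_by_hash` is a hypothetical helper. Items whose hashes match on both sides are treated as identical and set aside; only the leftovers would go on to an order-ignoring comparison.

```python
import hashlib
import json
from collections import Counter
from typing import Any, List, Tuple


def item_hash(item: Any) -> str:
    """Stand-in for DeepHash: SHA-256 of a canonical JSON dump (JSON-compatible data only)."""
    return hashlib.sha256(json.dumps(item, sort_keys=True).encode("utf-8")).hexdigest()


def split_by_hash(t1: List[Any], t2: List[Any]) -> Tuple[List[Any], List[Any], List[Any]]:
    """Hypothetical helper: separate identical items from those that still need diffing.

    Returns (identical, only_in_t1, only_in_t2). Positions are ignored; duplicates are counted.
    """
    remaining = Counter(item_hash(item) for item in t2)
    identical, only_in_t1 = [], []
    for item in t1:
        h = item_hash(item)
        if remaining[h] > 0:
            remaining[h] -= 1  # an identical item exists in t2: no deeper diff needed
            identical.append(item)
        else:
            only_in_t1.append(item)
    # Whatever is still counted in `remaining` had no identical partner in t1.
    only_in_t2 = []
    for item in t2:
        h = item_hash(item)
        if remaining[h] > 0:
            remaining[h] -= 1
            only_in_t2.append(item)
    return identical, only_in_t1, only_in_t2


t1 = [{'a': 1}, [1, 2], 3]
t2 = [3, {'a': 1}, [2, 1]]
print(split_by_hash(t1, t2))  # ([{'a': 1}, 3], [[1, 2]], [[2, 1]])
```

Here `[1, 2]` and `[2, 1]` hash differently, so they are the only pair left for the deeper, order-ignoring comparison.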
https://github.com/seperman/deepdiff/issues/373
This is my issue pointing out the same thing. I closed it thinking it was a silly question 😄