deepdiff icon indicating copy to clipboard operation
deepdiff copied to clipboard

`DeepHash`: Different dataframes get the same hash

Open amakelov opened this issue 1 year ago • 1 comments

Describe the bug Hash collision seems to happen whenever two dataframes have the same column names, regardless of the rows.

To Reproduce

from deepdiff import DeepHash
x = pd.DataFrame({'a': [1, 2, 3]})
y = pd.DataFrame({'a': [1, 2, 3, 4]})
a = DeepHash(x)[x]
b = DeepHash(y)[y]
assert a == b

Expected behavior Collisions should be harder to find than this (unless this was designed into the library?)

OS, DeepDiff version and Python version (please complete the following information):

  • OS: Ubuntu 22.04.2 LTS
  • Python Version: 3.10.8
  • DeepDiff Version: 6.3.0

amakelov avatar Apr 29 '23 22:04 amakelov

Hi @amakelov While we have been supporting Numpy for many years, Pandas data-frames have never been covered. If you have time to add the Pandas support to DeepDiff, that would be great. Otherwise I will look into it when I have a chance.

seperman avatar May 01 '23 05:05 seperman