deepdiff
`DeepHash`: Different dataframes get the same hash
Describe the bug
A hash collision seems to happen whenever two dataframes have the same column names, regardless of the rows.
To Reproduce
import pandas as pd
from deepdiff import DeepHash

x = pd.DataFrame({'a': [1, 2, 3]})
y = pd.DataFrame({'a': [1, 2, 3, 4]})
a = DeepHash(x)[x]
b = DeepHash(y)[y]
# Passes on DeepDiff 6.3.0 even though x and y have different rows
assert a == b
Expected behavior
Collisions should be harder to find than this (unless this was designed into the library?)
OS, DeepDiff version and Python version (please complete the following information):
- OS: Ubuntu 22.04.2 LTS
- Python Version: 3.10.8
- DeepDiff Version: 6.3.0
Hi @amakelov. While we have supported NumPy for many years, pandas DataFrames have never been covered. If you have time to add pandas support to DeepDiff, that would be great. Otherwise I will look into it when I have a chance.
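Until DataFrames are supported, one possible stopgap is to hash the frame's contents directly with pandas instead of passing it to `DeepHash`. The sketch below is only an illustration, not part of DeepDiff: `dataframe_digest` is a hypothetical helper name, and it assumes every cell is hashable by `pd.util.hash_pandas_object` (so no lists or dicts inside cells).

```python
import hashlib

import pandas as pd


def dataframe_digest(df: pd.DataFrame) -> str:
    """Hypothetical helper: content-based SHA-256 digest of a DataFrame."""
    h = hashlib.sha256()
    # Mix in column labels and dtypes so frames that differ only in schema
    # still produce different digests.
    h.update(repr(list(df.columns)).encode("utf-8"))
    h.update(repr([str(t) for t in df.dtypes]).encode("utf-8"))
    # One uint64 hash per row, covering the index and every cell in that row.
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    h.update(row_hashes.to_numpy().tobytes())
    return h.hexdigest()


x = pd.DataFrame({'a': [1, 2, 3]})
y = pd.DataFrame({'a': [1, 2, 3, 4]})

# Same columns, different rows: the digests differ.
assert dataframe_digest(x) != dataframe_digest(y)
# An equal copy gives the same digest.
assert dataframe_digest(x) == dataframe_digest(x.copy())
```

Because the digest depends on row order and on the index, two frames with the same rows in a different order will hash differently; sort or reset the index first if that is not what you want.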