datacompy

Pandas and Spark DataFrame comparison for humans and more!

Results: 30 datacompy issues

Summary report shows there are records in the base and compare datasets that did not have corresponding matches:
> ****** Row Summary ******
> Number of rows in common: 2971140
> ...

Labels: bug, question

Vulnerable Library - numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
NumPy is the fundamental package for array computing with Python.
Library home page: https://files.pythonhosted.org/packages/6d/ad/ff3b21ebfe79a4d25b4a4f8e5cf9fd44a204adb6b33c09010f566f51027a/numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Path to dependency file: /tmp/ws-scm/datacompy
Path to vulnerable library: /tmp/ws-scm/datacompy
Vulnerabilities...

Labels: security vulnerability

@dan-coates @theianrobertson I've taken a stab at this long-outstanding issue, #13. I decided to follow the advice from the original discussion and make an `ABC`-based...

If there are nulls in the join columns of the dataframes, those rows do not pass the join condition, so they are not counted as common rows and are instead counted as unique...

Labels: enhancement
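To make the null-key behavior concrete, here is a minimal pure-Python sketch (not datacompy's implementation) contrasting SQL/Spark-style equality, where a null key never matches anything, with null-safe equality in the spirit of Spark's `<=>` operator. The helpers `sql_equal`, `null_safe_equal`, and `classify_rows` are hypothetical, and `None` stands in for a null join key.

```python
from typing import Callable, List, Optional, Tuple

Key = Optional[int]


def sql_equal(a: Key, b: Key) -> bool:
    # SQL/Spark-style equality: a NULL key never equals anything, not even NULL
    return a is not None and b is not None and a == b


def null_safe_equal(a: Key, b: Key) -> bool:
    # Null-safe equality (like Spark's `<=>`): NULL matches NULL
    if a is None or b is None:
        return a is None and b is None
    return a == b


def classify_rows(
    base: List[Key], compare: List[Key], eq: Callable[[Key, Key], bool]
) -> Tuple[List[Key], List[Key], List[Key]]:
    """Split join keys into (common, base_only, compare_only) under `eq`."""
    common = [k for k in base if any(eq(k, c) for c in compare)]
    base_only = [k for k in base if not any(eq(k, c) for c in compare)]
    compare_only = [c for c in compare if not any(eq(k, c) for k in base)]
    return common, base_only, compare_only


base_keys = [1, 2, None]
compare_keys = [1, 2, None]
print(classify_rows(base_keys, compare_keys, sql_equal))
# ([1, 2], [None], [None])
print(classify_rows(base_keys, compare_keys, null_safe_equal))
# ([1, 2, None], [], [])
```

With plain equality the null-keyed row shows up once on each side as "unique", which matches the symptom described in the issue.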

While using datacompy.compare, a string/object was misinterpreted as a float (because the strings contain only digits). The strings all have length 35 and differ only in the last digit....

Labels: bug, enhancement
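The precision loss behind this report can be reproduced without datacompy. The sketch below assumes the digit-only strings end up parsed as 64-bit floats, as the issue describes; a double carries only about 15 to 17 significant decimal digits, so two 35-digit values that differ only in the last digit round to the same float. The helper name `parsed_equal` is hypothetical.

```python
# Two 35-digit, digits-only strings that differ only in the last digit
a = "1" + "0" * 33 + "1"
b = "1" + "0" * 33 + "2"


def parsed_equal(x: str, y: str) -> bool:
    # Hypothetical: mimics comparing the values after they were inferred as float64
    return float(x) == float(y)


print(len(a), len(b))       # 35 35
print(a == b)               # False: as strings they differ
print(parsed_equal(a, b))   # True: both round to the same double
print(int(a) == int(b))     # False: exact integers keep every digit
```

Keeping such columns as strings (or exact integers) avoids the false match.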

Comparing two empty `numpy` arrays currently returns `False`, which results in diffs being shown where there shouldn't be any. This is due to the way `numpy` compares empty arrays. Running `bool(np.array([])...

Labels: bug, help wanted, question
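A minimal sketch of one possible fix, assuming cells hold array-like values: element-wise `==` on two empty arrays yields another empty array, and NumPy has deprecated taking the truth value of an empty array, so it is not a reliable "equal" signal. `np.array_equal` checks shapes first and then all elements, so two empty arrays compare equal. The helper name `cells_match` is hypothetical, not part of datacompy.

```python
import numpy as np


def cells_match(a, b) -> bool:
    """Hypothetical helper: compare two array-valued cells safely."""
    # np.array_equal returns False on shape mismatch, otherwise checks that
    # every element is equal; for two empty arrays that is vacuously True.
    return bool(np.array_equal(np.asarray(a), np.asarray(b)))


empty = np.array([])
print((empty == empty).size)       # 0: the element-wise result is itself empty
print(cells_match(empty, empty))   # True
print(cells_match([1, 2], [1, 2])) # True
print(cells_match([1, 2], [1, 3])) # False
```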

Not sure if it makes sense to go all the way to subclassing or ABCs, but the API calls of `Compare` and `SparkCompare` are quite different. I think they could...

Labels: enhancement, help wanted, spark
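A minimal sketch of the `ABC`-based direction discussed in these issues, using hypothetical names: `BaseCompare` declares a shared contract (`matches()` and `report()` here, chosen for illustration) and `ListCompare` is a toy backend standing in for a pandas or Spark implementation. Python refuses to instantiate the abstract base, or any subclass that leaves an abstract method unimplemented, which is what would keep the two APIs aligned.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class BaseCompare(ABC):
    """Hypothetical shared interface for pandas- and Spark-backed comparisons."""

    @abstractmethod
    def matches(self) -> bool:
        """Return True if the two datasets are considered equal."""

    @abstractmethod
    def report(self) -> str:
        """Return a human-readable summary of the comparison."""


class ListCompare(BaseCompare):
    """Toy backend comparing two Python lists, for illustration only."""

    def __init__(self, base: List[Any], compare: List[Any]) -> None:
        self.base = list(base)
        self.compare = list(compare)

    def matches(self) -> bool:
        return self.base == self.compare

    def report(self) -> str:
        status = "match" if self.matches() else "differ"
        return f"{len(self.base)} vs {len(self.compare)} rows: {status}"


print(ListCompare([1, 2], [1, 2]).report())  # 2 vs 2 rows: match
```

Whether a full ABC is worth it, versus simply agreeing on method names, is the open question in the issue; the sketch only shows what the ABC route enforces.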

This is in the API reference (`/api/core.html#datacompy.core.Compare.report`) but should be made more explicit in the usage docs too.

Labels: enhancement, help wanted, docs

Currently, the CSV written out on a failed comparison includes all of the columns, which makes it harder to read for larger datasets. This is more of...

Labels: enhancement, help wanted, good first issue
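One way to keep that output readable is to write only the rows and columns that actually differ. The sketch below swaps in pandas' own `DataFrame.compare` (pandas 1.1 or later) rather than anything datacompy provides; `write_mismatches_csv` is a hypothetical helper, and it assumes both frames contain the same keys and columns, which `DataFrame.compare` requires.

```python
import io

import pandas as pd


def write_mismatches_csv(
    base: pd.DataFrame, compare: pd.DataFrame, key: str, buf
) -> pd.DataFrame:
    """Hypothetical helper: write only mismatched rows/columns to CSV."""
    # Align both frames on the join key so DataFrame.compare can line them up
    left = base.set_index(key).sort_index()
    right = compare.set_index(key).sort_index()
    # DataFrame.compare drops rows and columns where all values are equal
    diff = left.compare(right)
    diff.to_csv(buf)
    return diff


base = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"], "amount": [1.0, 2.0, 3.0]})
other = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"], "amount": [1.0, 2.5, 3.0]})

buf = io.StringIO()
diff = write_mismatches_csv(base, other, "id", buf)
print(buf.getvalue())  # only the "amount" column for id 2 is written
```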