deepdiff

Diff, Hash and Search cause 100% CPU lockup for objects that contain IPv4Interface, IPv6Interface, IPv4Network or IPv6Network

MarcelBastiaans opened this issue.

Performing a DeepDiff, DeepHash or DeepSearch on a Python object that contains an IPv4Interface, IPv6Interface, IPv4Network or IPv6Network from the ipaddress module uses 100% CPU until the entire IP range has been iterated. This can take a VERY long time for IPv6 ranges.
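The cost comes from the ipaddress network types themselves, which can be checked with the standard library alone. The following is a minimal sketch, independent of DeepDiff, showing that an IPv6Network passes a generic iterable check, yields one address object per iteration step, and, for a /32 network like the one in this report, holds 2**96 addresses, while its string form stays short:

```python
import collections.abc
import ipaddress
import itertools

net = ipaddress.IPv6Network("2001:db8::/32")

# Network objects define __iter__, so a generic "is it iterable?" check matches them.
print(isinstance(net, collections.abc.Iterable))  # True

# Iteration yields one IPv6Address per address in the network; peek at the first three.
print(list(itertools.islice(net, 3)))

# A /32 IPv6 network holds 2**96 addresses, far too many to walk one by one.
print(net.num_addresses == 2 ** 96)  # True

# The string form is short and is what a human would compare.
print(str(net))  # 2001:db8::/32
```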

Describe the bug

The library does not explicitly support the ipaddress data types IPv4Interface, IPv6Interface, IPv4Network and IPv6Network. As a result, the code falls through to its final check of whether the field is iterable and then processes the field as an iterable. All of these types are iterable, but for comparison purposes they should be treated as strings. The code below demonstrates the problem.

To Reproduce

===================== BEGIN CODE =======================
"""Program to demonstrate deepdiff infinite iterate over IPv6Interface"""
import ipaddress
from typing import Union

from deepdiff import DeepDiff, DeepHash

faulty_types = Union[ipaddress.IPv4Network, ipaddress.IPv6Network, ipaddress.IPv4Interface, ipaddress.IPv6Interface]


class Class1:
    """Class containing single data member to demonstrate deepdiff infinite iterate over IPv6Interface"""

    def __init__(self, addr: str):
        self.field: faulty_types = ipaddress.IPv6Network(addr)


def main():
    """Test function to demonstrate deepdiff infinite iterate over IPv6Interface"""
    obj1 = Class1("2002:db8::/30")
    print(f'OBJ1:{obj1}\n')
    obj1_hash = DeepHash(obj1)
    print(f'OBJ1_HASH: {obj1_hash}\n')
    obj2 = Class1("2001:db8::/32")
    print(f'OBJ2:{obj2}\n')
    obj2_hash = DeepHash(obj2)
    print(f'OBJ2_HASH: {obj2_hash}\n')
    diff = DeepDiff(obj1, obj2)
    print(f'DIFF: {diff}\n')


if __name__ == "__main__":
    main()
====================== END CODE =======================

Additional context

I have tested the following changes (attached as a patch against 8.0.1) and verified that they resolve the issue for me. I did not try to make the diff results serializable, but my testing showed that objects containing a datetime field in the diff are not serializable either.

0001-ipranges.patch

MarcelBastiaans, Sep 18 '24

P.S. I found that adding the following entry to the end of JSON_CONVERTER in serialization.py (line 598) makes to_json() work:

iprange: lambda x: str(x)
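The same idea, rendering the ipaddress objects with str(), can be tried outside DeepDiff with the standard json module's default hook. This is a minimal sketch, not DeepDiff's serialization code; IP_TYPES and ip_default are hypothetical names, and the set of types is taken from the issue title:

```python
import ipaddress
import json

# Types named in this issue; rendering them with str() mirrors the
# `lambda x: str(x)` converter suggested above.
IP_TYPES = (ipaddress.IPv4Network, ipaddress.IPv6Network,
            ipaddress.IPv4Interface, ipaddress.IPv6Interface)


def ip_default(obj):
    """Fallback for json.dumps: only called for objects json cannot encode itself."""
    if isinstance(obj, IP_TYPES):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


payload = {"field": ipaddress.IPv6Network("2001:db8::/32"),
           "iface": ipaddress.IPv4Interface("192.0.2.5/24")}
print(json.dumps(payload, default=ip_default))
# {"field": "2001:db8::/32", "iface": "192.0.2.5/24"}
```

Any other unsupported type still raises TypeError, so a datetime would need its own branch.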

MarcelBastiaans, Sep 18 '24

In the case of IP networks and interfaces, maybe DeepDiff should not treat them as collections at all, since they don't even provide the __len__ method. A similar problem occurs with other iterables such as range: for long ranges, DeepDiff can report huge lists of differences. So I suggest converting ranges to str as well.
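The str conversion suggested here can also be applied by the caller before diffing. A minimal sketch, assuming the caller only needs to walk dicts, lists and tuples; stringify_special and STRINGIFY_TYPES are hypothetical names, not part of DeepDiff:

```python
import ipaddress
from typing import Any

# Hypothetical choice of iterable types to compare as text rather than element by element.
STRINGIFY_TYPES = (range, ipaddress.IPv4Network, ipaddress.IPv6Network,
                   ipaddress.IPv4Interface, ipaddress.IPv6Interface)


def stringify_special(obj: Any) -> Any:
    """Return a copy of obj with STRINGIFY_TYPES values replaced by str(value).

    Only dicts, lists and tuples are walked; other objects are returned unchanged.
    """
    if isinstance(obj, STRINGIFY_TYPES):
        return str(obj)
    if isinstance(obj, dict):
        return {key: stringify_special(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [stringify_special(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(stringify_special(item) for item in obj)
    return obj


before = {"net": ipaddress.IPv6Network("2001:db8::/32"), "ids": [range(0, 1000)]}
print(stringify_special(before))
# {'net': '2001:db8::/32', 'ids': ['range(0, 1000)']}
```

The normalized structures could then be handed to a diff tool, which should see each such field as a single string. One trade-off: Python's == treats range(0) and range(5, 5) as equal because both are empty, while their strings differ.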

Another option would be to delegate to the user the decision on how to treat iterables. It would be nice to have a DeepDiff argument that accepts a callback receiving the t1 and t2 objects, so that the callback can return False for pairs that the user's code needs to process itself. Example:

def custom(o1, o2):
    if type(o1) is range and type(o2) is range:
        # ...custom treatment...
        return False
    return True

dd = DeepDiff([range(1000)], [range(10)], custom=custom)  # "custom" is the proposed argument

dd would report no difference because custom returned False for this pair. Something similar could be built with the exclude_obj_callback_strict parameter, but it would be more involved because the callback has no direct access to both ranges for its custom processing.

dpinol, Feb 23 '25

@MarcelBastiaans Thanks for reporting the issue. I applied your patch and will release it as part of DeepDiff 8.4.2.

@dpinol Can you elaborate on why what DeepDiff currently does is not right? Here is an example test:

def test_range1(self):
    range1 = range(0, 10)
    range2 = range(0, 8)
    diff = DeepDiff(range1, range2)
    assert {'iterable_item_removed': {'root[8]': 8, 'root[9]': 9}} == diff

seperman, Mar 17 '25

It's not that it's wrong; the problem is that the output is too verbose compared with a plain assert r1 == r2 when the ranges are large.

assert range(0, 1000) == range(0,1)
range(0, 1) != range(0, 1000)

Expected :range(0, 1000)
Actual   :range(0, 1)

I understand that when the comparison output is consumed by an algorithm, DeepDiff still does "the best work", but the output is too verbose, e.g. when DeepDiff is used in asserts in unit tests. That's why I suggested adding an argument to allow custom comparison of selected types. It would also give developers a backdoor for handling other problematic types, such as pathlib.Path.

dpinol, Mar 18 '25

@dpinol Have you looked at the custom operators? I'm closing this ticket since the original issue raised in this ticket is addressed.

seperman, Mar 18 '25

that's what I was looking for, thanks!

dpinol, Mar 18 '25