deepdiff
`iterable_compare_func` does not seem to work on nested lists
Describe the bug I cannot get the compare function to work properly on lists that are not 1st-level.
To Reproduce Define the two following objects. They are identical except that:
- the `version` attribute went `0.0.0` -> `0.0.1`
- the second object has a number field removed from the `numberfield_set` list, and a new one added.
UUID4s are used to track which one should be correlated with which, much like `id`s in the example in the docs.
self_json= b'{"stringfield_set":[],"numberfield_set":[{"uuid":"fa0c87e8-01f5-43f5-8e63-24886f72ffd0","name":"field_1","address":1,"documentation":"","first_bit_offset":2,"size_in_bits":3,"is_signed":false,"is_lsb_left":false,"value_multiply_by":1.0,"value_divide_by":1.0,"value_increase_by":0.0,"value_unit":""},{"uuid":"332e5cfe-886d-41cc-b4d9-0fc1296b3ea0","name":"field_2","address":5,"documentation":"","first_bit_offset":5,"size_in_bits":6,"is_signed":false,"is_lsb_left":false,"value_multiply_by":1.0,"value_divide_by":1.0,"value_increase_by":0.0,"value_unit":""}],"enumfield_set":[],"version":"0.0.0","version_date":null,"version_commit":null,"ros_link":null,"documentation":"","definition":42,"information":[],"users":[]}'
other_json=b'{"stringfield_set":[],"numberfield_set":[{"uuid":"fa0c87e8-01f5-43f5-8e63-24886f72ffd0","name":"field_1","address":1,"documentation":"","first_bit_offset":2,"size_in_bits":3,"is_signed":false,"is_lsb_left":false,"value_multiply_by":1.0,"value_divide_by":1.0,"value_increase_by":0.0,"value_unit":""},{"uuid":"056429c5-812b-4f49-9aae-1b52bd40aacd","name":"field_3","address":7,"documentation":"","first_bit_offset":8,"size_in_bits":9,"is_signed":false,"is_lsb_left":false,"value_multiply_by":1.0,"value_divide_by":1.0,"value_increase_by":0.0,"value_unit":""}],"enumfield_set":[],"version":"0.0.1","version_date":null,"version_commit":null,"ros_link":null,"documentation":"","definition":42,"information":[],"users":[]}'
make the iterable compare func
from deepdiff.helper import CannotCompare

def field_diff_function(x, y, level=None):
    # Two items are "the same" when their uuid matches.
    try:
        return x["uuid"] == y["uuid"]
    except Exception:
        raise CannotCompare() from None
run the compare:
import json
from deepdiff import DeepDiff

diff_native = DeepDiff(
    json.loads(self_json),
    json.loads(other_json),
    iterable_compare_func=field_diff_function,
    ignore_order=True,
)
the following diff is the result:
{
    "values_changed": {
        "root['numberfield_set'][1]['uuid']": {
            "new_value": "056429c5-812b-4f49-9aae-1b52bd40aacd",
            "old_value": "332e5cfe-886d-41cc-b4d9-0fc1296b3ea0"
        },
        "root['numberfield_set'][1]['name']": {
            "new_value": "field_3",
            "old_value": "field_2"
        },
        "root['numberfield_set'][1]['address']": {
            "new_value": 7,
            "old_value": 5
        },
        "root['numberfield_set'][1]['first_bit_offset']": {
            "new_value": 8,
            "old_value": 5
        },
        "root['numberfield_set'][1]['size_in_bits']": {
            "new_value": 9,
            "old_value": 6
        },
        "root['version']": {
            "new_value": "0.0.1",
            "old_value": "0.0.0"
        }
    }
}
Expected behavior
As the fields should be matched using the uuid attribute, it should show that one has been added, and the other has been removed, and not that they changed.
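To make the expectation concrete, here is a minimal sketch in plain Python of the pairing I have in mind. It does not use DeepDiff at all; `match_by_key` is a hypothetical helper written only for this illustration, and the records are trimmed to `uuid` and `name`.

```python
def match_by_key(old_items, new_items, key="uuid"):
    """Split two lists of dicts into removed, added and paired items by `key`."""
    old_by_key = {item[key]: item for item in old_items}
    new_by_key = {item[key]: item for item in new_items}
    removed = [v for k, v in old_by_key.items() if k not in new_by_key]
    added = [v for k, v in new_by_key.items() if k not in old_by_key]
    paired = [(v, new_by_key[k]) for k, v in old_by_key.items() if k in new_by_key]
    return removed, added, paired


# Trimmed versions of the two numberfield_set lists from the report.
old_fields = [
    {"uuid": "fa0c87e8-01f5-43f5-8e63-24886f72ffd0", "name": "field_1"},
    {"uuid": "332e5cfe-886d-41cc-b4d9-0fc1296b3ea0", "name": "field_2"},
]
new_fields = [
    {"uuid": "fa0c87e8-01f5-43f5-8e63-24886f72ffd0", "name": "field_1"},
    {"uuid": "056429c5-812b-4f49-9aae-1b52bd40aacd", "name": "field_3"},
]

removed, added, paired = match_by_key(old_fields, new_fields)
print("removed:", [f["name"] for f in removed])
print("added:", [f["name"] for f in added])
print("paired:", [(a["name"], b["name"]) for a, b in paired])
```

With matching done on `uuid`, `field_2` ends up as removed and `field_3` as added, and only `field_1` is paired for a field-by-field comparison.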
OS, DeepDiff version and Python version (please complete the following information):
- Python Version: 3.9.10
- deepdiff version: 5.8.0
Additional context I'm definitely not ruling out there's a problem somewhere between the chair and the keyboard
Upon further inspection, slapping a print() statement inside the compare function shows that the function is actually never run.
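For anyone who wants to check this without print statements, a small wrapper can record every call instead. This is only a sketch: `record_calls` is a hypothetical helper, and the fallback `CannotCompare` class is a stand-in used only when deepdiff is not installed.

```python
import functools

try:
    from deepdiff.helper import CannotCompare
except ImportError:
    class CannotCompare(Exception):
        """Stand-in so the sketch runs without deepdiff installed."""


def record_calls(func):
    """Wrap a compare function and keep a log of every invocation."""
    calls = []

    @functools.wraps(func)
    def wrapper(x, y, level=None):
        try:
            result = func(x, y, level)
        except CannotCompare:
            calls.append((x, y, "cannot compare"))
            raise
        calls.append((x, y, result))
        return result

    wrapper.calls = calls
    return wrapper


@record_calls
def field_diff_function(x, y, level=None):
    try:
        return x["uuid"] == y["uuid"]
    except Exception:
        raise CannotCompare() from None
```

After passing `field_diff_function` as `iterable_compare_func`, an empty `field_diff_function.calls` list means the function was never invoked.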
From what I know, the `iterable_compare_func` function is not called when you set the `ignore_order` parameter to `True`.
Is this a bug or intended behaviour? It is not mentioned in the docs, at least.
I'm doing diffs on nested dicts with lists of dicts which have id keys, where `iterable_compare_func` helps compare the correct dicts to each other; however, the performance is not very good. Setting `ignore_order=True` helps performance a lot, but the wrong dicts are sometimes compared since `iterable_compare_func` is not used.
I'm having the same issue and was about to create an issue... I also did some more testing and debugging...
I think there are actually 2 bugs in this:
- This is exclusive to `ignore_order=True`. There's a `len > 1` check that prevents the function from being called. Only if there are two or more different items in both lists does the function actually get called, i.e. if there's 1 addition and 1 removal it doesn't call the function to check whether they are pairs or not. I'm almost certain this is incorrect behaviour. I edited it locally and got this fixed (you can mimic it by just introducing another element with differences in both lists). Instead of getting `key 'id' changed from a to b`, I'm now getting `dict {'id': a} changed to dict {'id': b}`, i.e. instead of considering k, v changes, the entire dicts are considered changes. https://github.com/seperman/deepdiff/blob/8ab1c8dbf19bb87177c10029a518051d6622532a/deepdiff/diff.py#L1094
- Irrespective of `ignore_order`. The result of the custom compare function is ignored for nested dicts. I've tested this on a list of dicts; inside each of these dicts is a key whose value is also a list of dicts. Each dict has an id key. The example below sets `ignore_order=True`, but it works for both cases, just make sure the order is correct.
[
{'id': '1010', 'g': [ {'id': '2020'} ] },
{'id': '73', 'g': [ {'id': '101'}, {'id': '6790'} ] }
]
[
{'id': '73', 'g': [ {'id': '202'}, {'id': '15294'} ] },
{'id': '1012', 'g': [ {'id': '2020'} ] }
]
I use this as the compare function. Apart from the prints, it is taken straight from the docs.
def compare_func(x, y, level=None):
    print(x)
    print(y)
    print('in')
    if not isinstance(x, dict) or not isinstance(y, dict):
        print('not dict')
        raise CannotCompare
    if x['id'] == y['id']:
        print('match')
        return True
    print('not match')
    return False
I found out that it gets called 8 times: 4 times comparing the 2 top-level dicts to each other, and 4 times comparing the 2 nested dicts to each other (the ones in the dict with the same id).
The prints were as expected for both the top-level and nested dicts, but the results are different... `'id': '1010'` and `'id': '1012'` are not pairs and appeared in `iterable_item_added` and `iterable_item_removed` accordingly. But the nested dicts were paired with each other and entered in `values_changed` DESPITE the function returning False for all pairing checks between them...
I did some more debugging and I found out that the dicts are reported as `iterable_item_removed` and not as `values_changed`,
which led me to think the problem is in this line:
https://github.com/seperman/deepdiff/blob/8ab1c8dbf19bb87177c10029a518051d6622532a/deepdiff/diff.py#L309
Up to this point all differences found are under removed or added, but once this function gets called they switch to changed for some reason... Taking a quick look inside that function, there's a suspicious `mutual_add_removes_to_become_value_changes()` call that makes me believe it doesn't respect the custom compare function... I took a quick peek inside it and it seems to do exactly that. I think I need a break...
https://github.com/seperman/deepdiff/blob/8ab1c8dbf19bb87177c10029a518051d6622532a/deepdiff/diff.py#L1551
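To illustrate what I suspect is happening, here is a toy model in plain Python. It is not DeepDiff's actual code: `merge_mutual_add_removes` is a hypothetical function, and the paths are made up for illustration. The assumption is that any path reported as both removed and added gets folded into a single changed entry, without asking the compare function whether the two items are a pair.

```python
def merge_mutual_add_removes(removed, added):
    """Toy model: a path present in both mappings becomes a 'changed' entry."""
    mutual = removed.keys() & added.keys()
    changed = {
        path: {"old_value": removed[path], "new_value": added[path]}
        for path in sorted(mutual)
    }
    still_removed = {p: v for p, v in removed.items() if p not in mutual}
    still_added = {p: v for p, v in added.items() if p not in mutual}
    return still_removed, still_added, changed


# Illustrative paths only; the nested dicts come from the 'g' lists above.
removed = {"root[1]['g'][0]": {"id": "101"}, "root[1]['g'][1]": {"id": "6790"}}
added = {"root[1]['g'][0]": {"id": "202"}, "root[1]['g'][1]": {"id": "15294"}}

still_removed, still_added, changed = merge_mutual_add_removes(removed, added)
print(changed)
```

In this model every nested dict ends up under changed even though none of the ids match, which is the same shape as the result I'm seeing.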
Hi @jvacek @Omar-Abdul-Azeez @havardthom @LizardBlizzard Somehow this ticket was lost among other tickets I was paying attention to. @Omar-Abdul-Azeez Thank you for diving into it already. I am going to take a look soon.
I think I'm running into the same issue. I'm using a custom iterable_compare_func function to match on id.
All the results BUT ONE are correctly reported either as iterable_item_added or iterable_item_removed. The single incorrect one is reported as values_changed with clearly a non matching id.
@seperman and @Omar-Abdul-Azeez have you been able to fix this? Thanks a lot.