deepdiff
Compare two dictionaries and output a dict with only the differences
Hi
Is it possible to compare two dictionaries and output a third dictionary containing only the differences?
Yes, you can do this using the DeepDiff class:
from deepdiff import DeepDiff
d1 = {'another_key': 200, 'key': 100}
d2 = {'another_key': 450, 'key': 100}
d3 = DeepDiff(d1, d2).get('values_changed', {})
Output:
{"root['another_key']": {'new_value': 450, 'old_value': 200}}
Any update on this issue?
@emocibob Are you not satisfied with the comment above?
Assuming that by "difference" you mean which values differ between the two dictionaries (or which were updated), what you are asking for is basically user code that takes just one line.
DeepDiff returns all the changes, and these include the dictionary of differing values as the value of the values_changed key.
I don't think DeepDiff as a package needs to support this (in fact, it already does!)
Let me use an example to be clearer. At the moment I get the diff in the following format:
from deepdiff import DeepDiff
a = {
'x': {
'y': [1, 2, 3, 4]
},
'q': {
'r': 'abc',
't': 0.5,
}
}
b = {
'x': {
'y': [1, 2, 3]
},
'q': {
'r': 'abc',
}
}
diff = DeepDiff(a, b)
print(diff)
# {'dictionary_item_removed': {"root['q']['t']"}, 'iterable_item_removed': {"root['x']['y'][3]": 4}}
print(type(diff))
# <class 'deepdiff.diff.DeepDiff'>
I would like to know if it's possible to get a regular dict. Based on my previous example, maybe something like:
diff = DeepDiff(a, b).to_dict()
print(diff)
# {
# 'x': {
# 'y': [4]
# },
# 'q': {
# 't': 0.5
# }
# }
print(type(diff))
# <class 'dict'>
Hello,
Yes, it is possible to do that. I will add to_dict since I see many people running into this, but technically all you need to do is dict(DeepDiff(a, b)).
@seperman I'm confused
Using @emocibob's example and your answer, I'm still getting the result below:
from deepdiff import DeepDiff
a = {
'x': {
'y': [1, 2, 3, 4]
},
'q': {
'r': 'abc',
't': 0.5,
}
}
b = {
'x': {
'y': [1, 2, 3]
},
'q': {
'r': 'abc',
}
}
diff = dict(DeepDiff(a, b))
print(diff)
The output is:
{'dictionary_item_removed': {"root['q']['t']"}, 'iterable_item_removed': {"root['x']['y'][3]": 4}}
What @emocibob and I want is:
{
'x': {
'y': [4]
},
'q': {
't': 0.5
}
}
Is it possible?
@seperman any update?
@wobeng @emocibob Just saw this ticket. Thanks for posting the examples; I see what you mean now. Currently the format we export the diff in does not support showing diffs like that, and this can get tricky when diffing objects. If you have a chance to write the transform functionality that generates the output you need and make a PR, we can definitely use it.
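A minimal sketch of such a transform, written as user code on top of the default text view rather than as a DeepDiff feature. It assumes the report keys shown earlier in this thread (dictionary_item_removed holding paths, iterable_item_removed mapping paths to values) and a dict at the root; path_to_keys, lookup and removed_as_nested are hypothetical helper names, not DeepDiff API. The diff literal copies the printed output above so the example runs without DeepDiff installed; with DeepDiff you would pass dict(DeepDiff(a, b)) instead.

```python
import ast
import re

# One bracketed element of a text-view path: a quoted key or an integer index.
_ELEMENT = re.compile(r"\[('(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"|-?\d+)\]")


def path_to_keys(path):
    """Split "root['x']['y'][3]" into ['x', 'y', 3]."""
    return [ast.literal_eval(token) for token in _ELEMENT.findall(path)]


def lookup(obj, keys):
    """Follow a list of keys and indexes into a nested object."""
    for key in keys:
        obj = obj[key]
    return obj


def removed_as_nested(diff, t1):
    """Nest everything the diff reports as removed from t1 under its path.

    Assumes a dict at the root. Removed list items are appended to a list in
    the order the diff reports them, so their original indexes are dropped.
    """
    entries = []
    # dictionary_item_removed only holds paths, so the values are read from t1.
    for path in diff.get('dictionary_item_removed', []):
        keys = path_to_keys(path)
        entries.append((keys, lookup(t1, keys)))
    # iterable_item_removed maps each path to the removed value.
    for path, value in diff.get('iterable_item_removed', {}).items():
        entries.append((path_to_keys(path), value))

    result = {}
    for keys, value in entries:
        *parents, last = keys
        if isinstance(last, int):
            # A removed list item: collect it into a list under its parent key.
            *parents, list_key = parents
        node = result
        for key in parents:
            node = node.setdefault(key, {})
        if isinstance(last, int):
            node.setdefault(list_key, []).append(value)
        else:
            node[last] = value
    return result


a = {'x': {'y': [1, 2, 3, 4]}, 'q': {'r': 'abc', 't': 0.5}}
# Same content as the printed DeepDiff(a, b) output above.
diff = {'dictionary_item_removed': {"root['q']['t']"},
        'iterable_item_removed': {"root['x']['y'][3]": 4}}
print(removed_as_nested(diff, a))  # {'q': {'t': 0.5}, 'x': {'y': [4]}}
```

Added items could be handled the same way with dictionary_item_added and iterable_item_added, reading values from the second object. Collecting removed list items into a fresh list matches the [4] in the requested output, at the cost of losing their positions.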
Here's another example. I'm diffing some configuration from an API against a local file. Here's the output:
dictionary_item_added
    [root['ldp-ba-01233']['roles']['21631321-3e9e-4483-a73c-13213123213']['members']['logs-ge-31321']]
dictionary_item_removed
    [root['ldp-ml-21802']['roles']['33122321-908a-31321-8181-13123123123']['members']['logs-xq-133123']]
values_changed
    root['ldp-ba-132131']['roles']['3132321-3e9e-4483-a73c-12321321321321']['members']['logs-mf-23321313']['note']
        new_value: Name
        old_value: Name FirstName
As you can see, some of the information is cryptic because it consists of IDs. What I want is to loop over the ldp- and logs- levels (the UUID in between is irrelevant), call a translate_id_into_label function on each, and print an indented output showing only those levels. But the path is a string, so the best I can think of at the moment is a regex.
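A small sketch of that loop, assuming paths in the root['a']['b'] text format and that the interesting levels are the keys starting with ldp- or logs-. Rather than running a regex over the whole string for the IDs, it first splits the path into its keys. translate_id_into_label here is a hypothetical stand-in that reads from a caller-supplied mapping (the labels below are made up); a real version would query the API.

```python
import ast
import re

# One bracketed element of a text-view path: a quoted key or an integer index.
_ELEMENT = re.compile(r"\[('(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"|-?\d+)\]")


def path_to_keys(path):
    """Split "root['a'][0]" into ['a', 0]."""
    return [ast.literal_eval(token) for token in _ELEMENT.findall(path)]


def translate_id_into_label(identifier, labels):
    """Hypothetical translator: look the ID up in a mapping, fall back to the ID."""
    return labels.get(identifier, identifier)


def describe(path, labels, prefixes=('ldp-', 'logs-')):
    """Indented outline of the ldp-/logs- levels of a path, translated to labels."""
    keys = [key for key in path_to_keys(path)
            if isinstance(key, str) and key.startswith(prefixes)]
    return '\n'.join('  ' * depth + translate_id_into_label(key, labels)
                     for depth, key in enumerate(keys))


path = ("root['ldp-ba-01233']['roles']['21631321-3e9e-4483-a73c-13213123213']"
        "['members']['logs-ge-31321']")
labels = {'ldp-ba-01233': 'Team A', 'logs-ge-31321': 'Log group 1'}  # made-up labels
print(describe(path, labels))
# Team A
#   Log group 1
```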
@rgarrigue I'm not sure what you are trying to achieve, but have you looked at the "tree view" in DeepDiff? If you want to see what was compared to what, the tree view gives you that: https://deepdiff.readthedocs.io/en/latest/diff.html
@wobeng I am interested in something similar, but I am comparing one "old" dict to a "new" dict. So I transform the DeepDiff output to get only the changes to values or types found in the new dict.
This would obviously ignore any addition/removal of fields, because it assumes a directionality between the two dicts (the "new" dict is replacing the "old" dict). Effectively, this library is used as a method of Change Data Capture between two systems! If there are changes, then I push only the changes that occur. If not, then I don't update.
Does your use case apply to this sort of logic? If so, I could work on a PR.
@wyattshapiro Is there a chance you ever got anywhere with a PR? I'm trying to do something fairly similar (i.e. create a 'diff' dict that represents new keys/values, changed keys (with their values), and deleted keys). Even as a starting point, what you built might be helpful!
@dwasyl I never made the PR, but I did complete a CDC using ddiff (assuming a flat dict comparison in my case).
General steps:
- Create ddiff: ddiff = DeepDiff(destination_data, source_data, ignore_order=True)
- Search the ddiff top level for the change sections of interest (i.e. 'values_changed')
- Iterate through that ddiff dict value (where the ddiff key is 'values_changed') and reconstruct a dict so that new_dict[changed_key] = changed_value
- Return the new dict
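The steps above can be sketched as a small function over the plain dict that DeepDiff returns. It assumes flat dicts, so every path has the form root['key']; changed_values is a hypothetical name, and the optional type_changes section is an addition beyond the listed steps (its entries also carry a new_value). The ddiff literal reproduces the output of the first example in this thread so the sketch runs without DeepDiff; in practice it would come from DeepDiff(destination_data, source_data, ignore_order=True).

```python
import ast


def changed_values(ddiff, sections=('values_changed',)):
    """Rebuild {changed_key: new_value} from a DeepDiff text-view result.

    Assumes both compared dicts are flat, so every path looks like "root['key']".
    """
    new_dict = {}
    for section in sections:
        for path, change in ddiff.get(section, {}).items():
            # "root['key']" -> "'key'" -> 'key' (integer indexes work too)
            changed_key = ast.literal_eval(path[len('root['):-1])
            new_dict[changed_key] = change['new_value']
    return new_dict


# Result shape copied from the first example in this thread.
ddiff = {'values_changed': {"root['another_key']": {'new_value': 450, 'old_value': 200}}}
print(changed_values(ddiff))  # {'another_key': 450}
```

An empty result means nothing changed, which maps directly onto the "don't update" branch of the CDC flow described earlier.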
After briefly reviewing the code, it looks like a feature such as this one could be implemented as a type of View with an associated model (e.g. https://github.com/seperman/deepdiff/blob/c78395c3a0ced0cadea569c3df3e64aa50531ad6/deepdiff/model.py#L88). This means the diff would still be computed as usual and the result transformed at the end.
What I struggle with most is that this new view/model would remove any context about what kind of difference it is (i.e. addition, removal, type change, value change, etc.); the result dict would effectively hide that context. This forces the package to decide which diffs are appropriate to include or exclude, instead of letting the user handle it after the fact. In my case, I was only interested in value changes because it was used for Change Data Capture, and the assumption was that the dicts represent the same core object (no additional fields on either dict). However, @dwasyl seems to be interested in including additions, removals, and changes in the view/model.
@seperman if a PR were put together to extend this functionality, do you have a preference for what kinds of changes this new view includes? Should the user be able to include/exclude specific kinds of changes? If we assume the directionality currently implied (i.e. t1 becomes t2), would we really want to include removals in this "CDC" view/model?
Hi @wyattshapiro Interesting. The reason I originally decided not to move forward with this feature was that, as you noted, there are so many possibilities for what to include or exclude. Everyone's use case is different. Most users could run a transformation on top of the DeepDiff results to get what they want, and any of these "views" would essentially produce a subset of what the "text" view already provides.
I have difficulty imagining how a clean API can be designed that offers all these variants of what people could be looking for.
Hey @seperman No doubt any user can run a transformation themselves; that's what I did, using the Text view result as the input. On the other hand, if many users are running this specific transformation themselves, it seems like a sweet feature to extend to everyone.
What I am imagining at a high level is an API that allows users to select this new view when calling DeepDiff(). Then, there would be one additional argument that is a list of strings, which represent 'keys' (as in these keys https://github.com/seperman/deepdiff/blob/c78395c3a0ced0cadea569c3df3e64aa50531ad6/deepdiff/model.py#L95 or https://github.com/seperman/deepdiff/blob/c78395c3a0ced0cadea569c3df3e64aa50531ad6/deepdiff/model.py#L11). This would expose a single argument (could only be used in this view) to enable the user to take full advantage of the transformation without affecting functionality of any other views.
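To make the proposal concrete, here is a user-side sketch of such a view: the caller passes the report keys to include and gets back the new values nested under their paths. changes_view and path_to_keys are hypothetical names, not DeepDiff API. List indexes are kept as dict keys for simplicity, and values for report keys that only list paths (such as dictionary_item_added) are read from t2, so the sketch assumes the selected keys describe things present in t2.

```python
import ast
import re

# One bracketed element of a text-view path: a quoted key or an integer index.
_ELEMENT = re.compile(r"\[('(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"|-?\d+)\]")


def path_to_keys(path):
    """Split "root['a'][0]" into ['a', 0]."""
    return [ast.literal_eval(token) for token in _ELEMENT.findall(path)]


def changes_view(diff, t2, report_keys=('values_changed',)):
    """Nest the new value of every change in the selected report keys under its path."""
    result = {}
    for report_key in report_keys:
        section = diff.get(report_key, {})
        for path in section:
            keys = path_to_keys(path)
            entry = section[path] if isinstance(section, dict) else None
            if isinstance(entry, dict) and 'new_value' in entry:
                value = entry['new_value']  # values_changed / type_changes entries
            else:
                value = t2  # path-only or path->value sections: read the value from t2
                for key in keys:
                    value = value[key]
            node = result
            for key in keys[:-1]:
                node = node.setdefault(key, {})
            node[keys[-1]] = value
    return result


# Diff of t1 = {'a': 1, 'b': {'c': 2}} against t2, written out by hand.
t2 = {'a': 5, 'b': {'c': 2, 'd': 3}}
diff = {'values_changed': {"root['a']": {'new_value': 5, 'old_value': 1}},
        'dictionary_item_added': ["root['b']['d']"]}
print(changes_view(diff, t2, ['values_changed', 'dictionary_item_added']))
# {'a': 5, 'b': {'d': 3}}
```

Passing only ['values_changed'] gives the CDC-style result {'a': 5}; the single list argument is the only knob, which keeps the other views untouched.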
Abstracting any further (say, so that only 'removals', 'additions', and 'changes' are possible args, each covering a subset of keys) does not seem like a good idea to me, as it would reduce user choice so much that users could end up transforming the Text view themselves anyway.
If you are seriously interested in a feature like this, I can write something up! The only thing holding me back is a clearer picture of an end-user API design you would be comfortable with.
Let me know your thoughts, I appreciate the dialogue.
I came here for the same reason, but it looks like the only nice solution is this one:
- Create ddiff: ddiff = DeepDiff(destination_data, source_data, ignore_order=True)
- Search the ddiff top level for the change sections of interest (i.e. 'values_changed')
- Iterate through that ddiff dict value (where the ddiff key is 'values_changed') and reconstruct a dict so that new_dict[changed_key] = changed_value
- Return the new dict
@seperman @emocibob
What @emocibob and I want is:
{ 'x': { 'y': [4] }, 'q': { 't': 0.5 } }
Is it possible?
There is a way using Python's built-in pprint library; the documentation has an example of this kind:
>>> import pprint
>>> stuff = ['spam', 'eggs', 'lumberjack', 'knights', 'ni']
>>> stuff.insert(0, stuff[:])
>>> pp = pprint.PrettyPrinter(indent=4)
>>> pp.pprint(stuff)
[ ['spam', 'eggs', 'lumberjack', 'knights', 'ni'],
'spam',
'eggs',
'lumberjack',
'knights',
'ni']
>>> pp = pprint.PrettyPrinter(width=41, compact=True)
>>> pp.pprint(stuff)
[['spam', 'eggs', 'lumberjack',
'knights', 'ni'],
'spam', 'eggs', 'lumberjack', 'knights',
'ni']
>>> tup = ('spam', ('eggs', ('lumberjack', ('knights', ('ni', ('dead',
... ('parrot', ('fresh fruit',))))))))
>>> pp = pprint.PrettyPrinter(depth=6)
>>> pp.pprint(tup)
('spam', ('eggs', ('lumberjack', ('knights', ('ni', ('dead', (...)))))))
>>>
https://docs.python.org/3/library/pprint.html
If you don't mind, I can open a PR; I just need to know where to do it (which file).
Any update ?
Current result:
{"root[0]['quantity']": {'new_value': 418190.78192466387,
'old_value': 418552.0530982703},
What I want:
'entry': [{'coin': 'ETH',
'quantity': 418552.0530982703,
'wallet': '0x0548f59fee79f8832c299e01dca5c76f034f558e'}]
The paths are in string format. I would love to be able to reach all the parents of a diff.
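One way to reach the parents of a change is to walk its path back into the original object. The sketch below assumes the old data is a list of entry dicts, as the root[0]['quantity'] path suggests, and reuses the entry values shown above; path_to_keys and parents_of are hypothetical helpers, not DeepDiff API.

```python
import ast
import re

# One bracketed element of a text-view path: a quoted key or an integer index.
_ELEMENT = re.compile(r"\[('(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"|-?\d+)\]")


def path_to_keys(path):
    """Split "root[0]['quantity']" into [0, 'quantity']."""
    return [ast.literal_eval(token) for token in _ELEMENT.findall(path)]


def parents_of(obj, path):
    """Return (keys, container) for every ancestor of the changed item, outermost first."""
    chain = []
    node = obj
    keys = path_to_keys(path)
    for depth, key in enumerate(keys):
        chain.append((keys[:depth], node))
        node = node[key]
    return chain


old = [{'coin': 'ETH',
        'quantity': 418552.0530982703,
        'wallet': '0x0548f59fee79f8832c299e01dca5c76f034f558e'}]
keys, entry = parents_of(old, "root[0]['quantity']")[-1]  # immediate parent
print(keys, entry['coin'])  # [0] ETH
```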
Any updates?
@wobeng @sreecodeslayer @emocibob @rgarrigue @wyattshapiro @shibumi @LeonardoLeano333 @fatihsirin @nmaas87
There is a work-in-progress force flag on the Delta object that might give you something close to what you are asking for, with some caveats.
>>> from deepdiff import DeepDiff, Delta
>>> t1 = {
... 'x': {
... 'y': [1, 2, 3]
... },
... 'q': {
... 'r': 'abc',
... }
... }
>>>
>>> t2 = {
... 'x': {
... 'y': [1, 2, 3, 4]
... },
... 'q': {
... 'r': 'abc',
... 't': 0.5,
... }
... }
>>>
>>> diff = DeepDiff(t1, t2)
>>> diff
{'dictionary_item_added': [root['q']['t']], 'iterable_item_added': {"root['x']['y'][3]": 4}}
>>> delta = Delta(diff)
>>> {} + delta
Unable to get the item at root['x']['y'][3]: 'x'
Unable to get the item at root['q']['t']
{}
Once we set force to True:
>>> delta = Delta(diff, force=True)
>>> {} + delta
{'x': {'y': {3: 4}}, 'q': {'t': 0.5}}
Notice that with force, the delta does not know that the original object at ['x']['y'] was supposed to be a list, so it assumes it is a dictionary.
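If lists are wanted back, one option is to post-process the forced result and turn every dict whose keys are all integers into a list ordered by key. int_keys_to_lists is a hypothetical helper, not part of Delta, and it brings its own caveats: the original indexes are dropped (so {3: 4} becomes [4]), and a dict that legitimately uses integer keys would be converted as well.

```python
def int_keys_to_lists(obj):
    """Recursively turn dicts whose keys are all ints into lists ordered by key."""
    if isinstance(obj, dict):
        converted = {key: int_keys_to_lists(value) for key, value in obj.items()}
        if converted and all(isinstance(key, int) for key in converted):
            # The original indexes are dropped: {3: 4} becomes [4].
            return [converted[key] for key in sorted(converted)]
        return converted
    if isinstance(obj, list):
        return [int_keys_to_lists(item) for item in obj]
    return obj


forced = {'x': {'y': {3: 4}}, 'q': {'t': 0.5}}  # the {} + delta result shown above
print(int_keys_to_lists(forced))  # {'x': {'y': [4]}, 'q': {'t': 0.5}}
```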
You can find this new feature on the dev branch. To learn more about delta: https://zepworks.com/deepdiff/current/delta.html
Due to the lack of this feature, we stopped using DeepDiff and now use dictdiffer (see its docs for quick examples). Its output format is a lot easier to parse for the relevant changes. I personally don't understand the appeal of the root['obj']['obj'] format. At the very least it should be dot.separated.items to make life a bit easier for developers.
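For anyone who prefers dotted paths, converting a text-view path takes only a few lines of user code. to_dotted and path_to_keys are hypothetical helpers; note that keys which themselves contain dots become ambiguous in the dotted form.

```python
import ast
import re

# One bracketed element of a text-view path: a quoted key or an integer index.
_ELEMENT = re.compile(r"\[('(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"|-?\d+)\]")


def path_to_keys(path):
    """Split "root['obj'][0]" into ['obj', 0]."""
    return [ast.literal_eval(token) for token in _ELEMENT.findall(path)]


def to_dotted(path):
    """Convert a path like root['obj']['obj'][0] to 'obj.obj.0'."""
    return '.'.join(str(key) for key in path_to_keys(path))


print(to_dotted("root['obj']['obj'][0]"))  # obj.obj.0
```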
@synchronizing Did you try the Delta(diff, force=True) mentioned above?
Developers can use the tree view, which gives them the path directly, so they don't need to parse root['obj']['obj']. Have you tried that?