Feature: exclude specific objects

maxrothman opened this issue 8 years ago • 5 comments

It's currently possible to exclude types from DeepDiff and DeepSearch using the exclude_types kwarg. However, it's not possible to exclude certain values. Here's an example of what I'm looking for:

>>> from deepdiff import DeepDiff
>>> d1 = {1: None, 2: 'a'}
>>> d2 = {1: 'foo', 2: 'b'}
>>> DeepDiff(d1, d2, exclude=[None])
{'values_changed': {'root[2]': {'new_value': 'b', 'old_value': 'a'}}}

This would be useful for excluding complex objects, such as UUIDs or datetimes, from a diff or search. Though not strictly necessary, it might also be a good idea to allow comparing searched and excluded objects by identity (is) rather than by equality (==). I'm less confident about what that API should look like, but maybe something like an id_comparison=False kwarg?
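To make the identity vs equality distinction concrete, here is a small standalone sketch. The helpers excluded_by_equality and excluded_by_identity are hypothetical names used only for illustration; they are not part of deepdiff. They show that ==-based exclusion can catch more than the user intended, while is-based exclusion only matches the exact object.

```python
def excluded_by_equality(value, excludes):
    """Hypothetical helper: True if value == any excluded object."""
    return any(value == ex for ex in excludes)


def excluded_by_identity(value, excludes):
    """Hypothetical helper: True if value is literally one of the excluded objects."""
    return any(value is ex for ex in excludes)


# Equality-based exclusion of 1 also catches True and 1.0,
# because True == 1 and 1.0 == 1 in Python.
print(excluded_by_equality(True, [1]))   # True
print(excluded_by_equality(1.0, [1]))    # True

placeholder = []  # one specific list object
other = []        # equal to placeholder, but a distinct object
print(excluded_by_equality(other, [placeholder]))        # True
print(excluded_by_identity(other, [placeholder]))        # False
print(excluded_by_identity(placeholder, [placeholder]))  # True
```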

I'm happy to look at making a PR for this if you're on board with this change.

maxrothman, Jun 07 '17 14:06

I think we should start deprecating exclude_types and instead have a single exclude parameter. It would check whether each item in the exclude list is a class. If it is a class, it would behave like exclude_types; otherwise it would use is, as you mentioned, to exclude those exact objects. Please make a PR if you have time. Also, please make the corresponding changes to contenthash.py so the exclusion works there too, since ignore_order relies on the content hashes.
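A minimal sketch of the rule proposed above, assuming a hypothetical helper named should_exclude (not part of deepdiff): items that are classes act like exclude_types via isinstance, and any other item excludes only that exact object via is.

```python
from datetime import date
from uuid import UUID, uuid4


def should_exclude(obj, exclude):
    """Hypothetical helper for a single `exclude` parameter.

    Classes in `exclude` behave like exclude_types (isinstance check);
    any other item excludes only that exact object (identity check).
    """
    for item in exclude:
        if isinstance(item, type):
            if isinstance(obj, item):
                return True
        elif obj is item:
            return True
    return False


exclude = [UUID, None]
print(should_exclude(uuid4(), exclude))            # True: matched by type
print(should_exclude(None, exclude))               # True: matched by identity
print(should_exclude(date(2017, 6, 13), exclude))  # False
print(should_exclude(UUID, exclude))               # False: the class itself is not a UUID instance
```

Note the last call: under this rule a class object itself can never be excluded, because any class in the list is always interpreted as a type filter.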

seperman, Jun 13 '17 21:06

I'm not sure I agree with the approach of magically deciding whether to do an isinstance, ==, or is comparison. Classes are also objects: what if you want to exclude a class object from a diff? And what about objects with well-defined == operations, like datetimes and UUIDs? It almost seems like we'd need a more complex API, something like this:

DD = DeepDiff()
DD.exclude(object=None, id_comparison=True)
DD.exclude(object=date(2017, 6, 13))
DD.exclude(type=UUID)
DD.diff(first, second)

I'm not terribly fond of this specific API, but my point is more that it seems like we need kwargs to pull it off.
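One way to read the kwargs idea above is as a small registry of exclusion rules. The sketch below is hypothetical (the Exclusions class and its matches method are invented for illustration, not deepdiff API). It renames object= and type= to obj= and type_= to avoid shadowing Python builtins, and uses a private sentinel so that None can itself be excluded.

```python
from datetime import date
from uuid import UUID, uuid4

_MISSING = object()  # distinguishes "no obj given" from excluding None


class Exclusions:
    """Hypothetical registry mirroring the DD.exclude(...) sketch above."""

    def __init__(self):
        self._rules = []  # predicates: obj -> bool

    def exclude(self, obj=_MISSING, type_=None, id_comparison=False):
        if type_ is not None:
            self._rules.append(lambda x, t=type_: isinstance(x, t))
        if obj is not _MISSING:
            if id_comparison:
                self._rules.append(lambda x, o=obj: x is o)  # identity
            else:
                self._rules.append(lambda x, o=obj: x == o)  # equality
        return self

    def matches(self, x):
        """True if any registered rule excludes x."""
        return any(rule(x) for rule in self._rules)


ex = Exclusions()
ex.exclude(obj=None, id_comparison=True)
ex.exclude(obj=date(2017, 6, 13))
ex.exclude(type_=UUID)

print(ex.matches(None))               # True
print(ex.matches(date(2017, 6, 13)))  # True: a new but equal date
print(ex.matches(uuid4()))            # True
print(ex.matches("foo"))              # False
```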

maxrothman, Jun 14 '17 03:06

Thinking about this a little more, being able to match based on ID or == would also be useful in DeepSearch for matching the thing you're searching for. Additionally, I can imagine users wanting to add more complex matching rules, so maybe this is a good opportunity for a hole-in-the-middle approach, something like this:

from deepdiff import DeepSearch, Match
from deepdiff.matchers import by_id

DeepSearch(thing1, thing2, matcher=by_id, exclude=[123, 'foo', lambda x: x % 2 == 0])

This way, users could specify their own matchers, and existing flags like case_sensitive could become matcher functions (though it might be good to leave in the kwarg as a shortcut).
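The hole-in-the-middle idea can be sketched as a tiny recursive search that takes a pluggable matcher. Everything here (search, by_eq, by_id, case_insensitive) is hypothetical and standalone, not deepdiff's DeepSearch. Exclude items that are callables are treated as predicates and everything else is compared with ==; note that a predicate such as x % 2 == 0 has to guard against non-numeric values, since it will be called on every object in the structure.

```python
def by_eq(target, candidate):
    return candidate == target


def by_id(target, candidate):
    return candidate is target


def case_insensitive(target, candidate):
    if isinstance(target, str) and isinstance(candidate, str):
        return target.lower() == candidate.lower()
    return candidate == target


def _is_excluded(obj, exclude):
    for item in exclude:
        if callable(item):
            if item(obj):       # predicate-style exclusion
                return True
        elif obj == item:       # value-style exclusion
            return True
    return False


def search(obj, target, matcher=by_eq, exclude=(), path="root"):
    """Return paths in nested dicts/lists/tuples where matcher(target, item) is true."""
    if _is_excluded(obj, exclude):
        return []
    found = [path] if matcher(target, obj) else []
    if isinstance(obj, dict):
        for key, value in obj.items():
            found += search(value, target, matcher, exclude, f"{path}[{key!r}]")
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            found += search(value, target, matcher, exclude, f"{path}[{i}]")
    return found


data = {"a": "Foo", "b": [1, 2, "foo"], "c": {"d": "FOO"}}
print(search(data, "foo"))                           # ["root['b'][2]"]
print(search(data, "foo", matcher=case_insensitive))
# ["root['a']", "root['b'][2]", "root['c']['d']"]
print(search(data, "foo", matcher=case_insensitive,
             exclude=[lambda x: isinstance(x, int) and x % 2 == 0, "FOO"]))
# ["root['a']", "root['b'][2]"]

marker = ["x"]
data2 = {"p": marker, "q": ["x"]}
print(search(data2, marker))                # ["root['p']", "root['q']"]
print(search(data2, marker, matcher=by_id))  # ["root['p']"]
```

In this design, a flag like case_sensitive=False becomes just another matcher, which is the point made above.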

maxrothman, Jun 19 '17 19:06

Hi @maxrothman, sorry for such a long delay in responding. I finally found some time to release v4.0.0 and look at the tickets. Are you still interested in making the PR? That would be great! I like your ideas. Based on what you came up with, the API I'm imagining could be:

from deepdiff import exclude_condition  # for advanced exclusions

ex_cond1 = exclude_condition(type_=DateTime, condition=lambda x: x.year > 2018)

DeepDiff(t1, t2, exclude_objects=(123, "blah", ...), exclude_conditions=[ex_cond1])

So exclude_condition takes type_ and condition, and you pass the resulting conditions to the exclude_conditions parameter.

exclude_objects will be a list of objects that are ignored based on ==. I think using is will be tricky. Where do you see is working better than ==?
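A standalone sketch of how exclude_objects and exclude_conditions could be evaluated together. The names exclude_condition, ExcludeCondition, and is_excluded are hypothetical here (deepdiff did not ship this API), and datetime stands in for the DateTime placeholder. Checking type_ first means the condition only ever sees objects of the expected type.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable


@dataclass(frozen=True)
class ExcludeCondition:
    type_: type
    condition: Callable[[Any], bool]

    def __call__(self, obj):
        # Only evaluate the condition for objects of the right type,
        # so the condition never sees values it can't handle.
        return isinstance(obj, self.type_) and self.condition(obj)


def exclude_condition(type_, condition):
    """Hypothetical factory matching the proposed signature."""
    return ExcludeCondition(type_, condition)


def is_excluded(obj, exclude_objects=(), exclude_conditions=()):
    if any(obj == ex for ex in exclude_objects):  # == based, as proposed
        return True
    return any(cond(obj) for cond in exclude_conditions)


ex_cond1 = exclude_condition(type_=datetime, condition=lambda x: x.year > 2018)

print(is_excluded(123, exclude_objects=(123, "blah")))                   # True
print(is_excluded(datetime(2019, 1, 1), exclude_conditions=[ex_cond1]))  # True
print(is_excluded(datetime(2017, 1, 1), exclude_conditions=[ex_cond1]))  # False
print(is_excluded("2019", exclude_conditions=[ex_cond1]))                # False: wrong type
```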

seperman, Mar 24 '19 08:03

Looking at this design, I think it could be simplified to the following:

DeepDiff(t1, t2, exclude: List[Callable[[Any], bool]])
# e.g.
DeepDiff(t1, t2, exclude=[lambda x: x==1, lambda x: isinstance(x, DateTime), lambda x: x is None])

This API covers the use cases of both exclude_objects and exclude_conditions, makes the ==/is choice a non-issue, and supports any other use case anyone could come up with.
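A toy illustration of the predicate-list design, assuming a hypothetical diff_flat function that only compares keys present in both flat dicts (it is not deepdiff's algorithm). The three lambdas from the example above cover an equality rule, a type rule, and an identity rule with one mechanism.

```python
from datetime import datetime
from typing import Any, Callable, Dict, Sequence


def diff_flat(t1: Dict[Any, Any], t2: Dict[Any, Any],
              exclude: Sequence[Callable[[Any], bool]] = ()) -> Dict[str, Any]:
    """Toy diff of two flat dicts (shared keys only), skipping excluded values."""
    def excluded(value):
        return any(pred(value) for pred in exclude)

    changed = {}
    for key in t1.keys() & t2.keys():
        old, new = t1[key], t2[key]
        if excluded(old) or excluded(new):
            continue
        if old != new:
            changed[f"root[{key!r}]"] = {"old_value": old, "new_value": new}
    return changed


t1 = {1: 1, 2: None, 3: datetime(2017, 6, 13), 4: "a"}
t2 = {1: 2, 2: "foo", 3: datetime(2019, 3, 27), 4: "b"}
result = diff_flat(t1, t2, exclude=[
    lambda x: x == 1,                   # equality rule
    lambda x: isinstance(x, datetime),  # type rule
    lambda x: x is None,                # identity rule
])
print(result)  # {'root[4]': {'old_value': 'a', 'new_value': 'b'}}
```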

Now that you're calling N functions for each object in the diff, performance could become a concern. In that case, you could build special-case optimizations for common patterns:

from deepdiff import exclude_equals, exclude_type
DeepDiff(t1, t2, exclude=[exclude_equals(123), exclude_type(DateTime)])

Unlike plain functions, the sentinel objects returned by these helpers could be recognized by special logic in the diffing algorithm, which could then evaluate all of their conditions in fewer function calls, ideally a single one.
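A sketch of that optimization, under the assumption that exclude_equals and exclude_type return recognizable sentinel objects (all names here are hypothetical). compile_excludes folds every exclude_type into one isinstance call over a tuple of types and every hashable exclude_equals value into one set lookup, leaving only unrecognized callables as per-object function calls.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class ExcludeEquals:
    value: Any


@dataclass(frozen=True)
class ExcludeType:
    type_: type


def exclude_equals(value):
    return ExcludeEquals(value)


def exclude_type(type_):
    return ExcludeType(type_)


def compile_excludes(exclude: Iterable[Any]) -> Callable[[Any], bool]:
    """Fold sentinels into one isinstance call and one set lookup."""
    types, values, predicates = [], [], []
    for item in exclude:
        if isinstance(item, ExcludeEquals):
            values.append(item.value)
        elif isinstance(item, ExcludeType):
            types.append(item.type_)
        else:
            predicates.append(item)  # arbitrary callables stay generic

    type_tuple = tuple(types)
    hashable_values, other_values = set(), []
    for v in values:
        try:
            hash(v)
            hashable_values.add(v)
        except TypeError:
            other_values.append(v)  # unhashable values fall back to ==

    def is_excluded(obj):
        if isinstance(obj, type_tuple):  # one call covers every type
            return True
        try:
            if obj in hashable_values:   # one hash lookup covers every value
                return True
        except TypeError:
            pass                         # obj itself is unhashable
        if any(obj == v for v in other_values):
            return True
        return any(pred(obj) for pred in predicates)

    return is_excluded


check = compile_excludes([exclude_equals(123), exclude_type(datetime),
                          lambda x: x is None])
print(check(123))                    # True
print(check(datetime(2019, 3, 27)))  # True
print(check(None))                   # True
print(check([1, 2]))                 # False (unhashable input handled)
print(check("abc"))                  # False
```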

I'm no longer actively using deepdiff, so it's unlikely I'll have the time to contribute this change. I'd be happy to continue helping with the design though.

maxrothman, Mar 27 '19 17:03

I'm going to close this ticket, since we have had exclude_obj_callback for a while.
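For reference, DeepDiff's exclude_obj_callback takes a function that receives an object and its path and returns True to exclude that object. The callback below (its name is illustrative) covers the cases raised in this thread: None and objects such as datetimes and UUIDs. Because it is an arbitrary callable, the ==/is/isinstance choice is left to the user. The DeepDiff call is shown as a comment so the block runs without the deepdiff package installed.

```python
from datetime import datetime
from uuid import UUID


def skip_none_datetime_uuid(obj, path):
    """Callback in the shape DeepDiff's exclude_obj_callback expects:
    receives an object and its path; returning True excludes it."""
    return obj is None or isinstance(obj, (datetime, UUID))


# Usage (requires the deepdiff package):
# from deepdiff import DeepDiff
# DeepDiff({1: None, 2: 'a'}, {1: 'foo', 2: 'b'},
#          exclude_obj_callback=skip_none_datetime_uuid)

print(skip_none_datetime_uuid(None, "root[1]"))  # True
print(skip_none_datetime_uuid("a", "root[2]"))   # False
```

With the original example from this issue, that call should report only the change at root[2], which is the result the first comment asked for.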

seperman, Nov 19 '23 15:11