Feature: only_include_paths

Open shane-davidson opened this issue 4 years ago • 15 comments

When comparing extremely large objects, I sometimes want to specify only the paths to diff rather than exclude a very large number of paths. (Sometimes regex is not helpful either, if all the keys are different.)

Suggestion for two new params:

only_include_paths: List of paths to include when performing a diff. If there is only one item, you can pass it as a string.
only_include_regex_paths: List of regex path strings or compiled regex objects to include when performing a diff. If there is only one item, you can pass it as a string or a compiled regex object.

The include paths and regex paths are combined to generate the list of paths used for checking.

I'm not sure whether the fields should be removed before the diff, or after it and just removed from the report. For performance, removing them before the diff would obviously be better.

If there is already some way to do this that I have missed, please let me know :)

shane-davidson avatar Oct 08 '21 03:10 shane-davidson

Hi @shane-davidson

I just released DeepDiff 5.6.0. There is a new parameter called custom_operators.

You can effectively use it to limit checks to certain paths by subclassing BaseOperator and defining your own match and give_up_diffing functions.

You can see some more examples in the tests: https://github.com/seperman/deepdiff/blob/master/tests/test_operators.py

seperman avatar Oct 13 '21 06:10 seperman

Amazing! I will have a look. thanks :)

shane-davidson avatar Oct 13 '21 08:10 shane-davidson

I just tested this and I am getting the following error:

TypeError: match() missing 1 required positional argument: 'level'

Using the following code:

t1 = {'test': 'yes', 'test2': 'yes', 'test3': {'test4': 'yes', 'test5': {'test6': 'yes'}}}
t2 = {'test': 'no', 'test2': 'no', 'test3': {'test4': 'no', 'test5': {'test6': 'no'}}}

class OnlyIncludePathsOperator(BaseOperator):
    def give_up_diffing(self, level, diff_instance):
        return False


ddiff = DeepDiff(t1, t2, custom_operators=[OnlyIncludePathsOperator])

EDIT: realised I passed the class and not an instance of the class OnlyIncludePathsOperator(). It is working now.
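The error above is ordinary Python behaviour rather than anything specific to DeepDiff: when a method is called through the class instead of an instance, the first argument fills the self slot and the next parameter is reported as missing. A minimal sketch, using a hypothetical stand-in class (StubOperator is not DeepDiff's BaseOperator):

```python
# Hypothetical stand-in for an operator class; not DeepDiff's BaseOperator.
class StubOperator:
    def match(self, level):
        return True


# Called through the class: the string fills `self`, so `level` is missing.
try:
    StubOperator.match("root['test']")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Called through an instance: `self` is bound automatically.
instance_result = StubOperator().match("root['test']")

print(error_message)
print(instance_result)
```

Passing OnlyIncludePathsOperator() instead of OnlyIncludePathsOperator, as the edit above notes, gives the library an instance to call.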

shane-davidson avatar Oct 13 '21 09:10 shane-davidson

For anyone coming here for a solution, this operator will only diff paths that match one of the given patterns:

from typing import List, Optional
from deepdiff import DeepDiff
from deepdiff.operator import BaseOperator
from deepdiff.helper import convert_item_or_items_into_compiled_regexes_else_none


class OnlyIncludePathsOperator(BaseOperator):
    # Use None rather than a mutable [] as the default argument.
    def __init__(self, paths: Optional[List[str]] = None):
        super().__init__(regex_paths=["root\\['.*'\\]"])  # match everything
        self.only_paths = convert_item_or_items_into_compiled_regexes_else_none(paths)

    def give_up_diffing(self, level, diff_instance):
        if self.only_paths is None:
            # If no path is set, then match everything.
            return False
        # Keep diffing only when the current path matches an included pattern.
        path = level.path()
        return not any(pattern.search(path) for pattern in self.only_paths)

# t1 and t2 are the dictionaries from the earlier comment.
ddiff = DeepDiff(t1, t2, custom_operators=[OnlyIncludePathsOperator(paths=[])])
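To see which paths such an operator would keep, the matching step can be tried on its own with the standard re module. This is only a sketch of the pattern check, not of DeepDiff itself; the pattern and the path strings are illustrative assumptions modelled on the t1/t2 example earlier in the thread.

```python
import re

# Hypothetical include pattern, written against DeepDiff-style path strings.
patterns = [re.compile(r"root\['test3'\]")]


def is_included(path):
    """Return True when any include pattern is found anywhere in the path."""
    return any(p.search(path) for p in patterns)


paths = [
    "root['test']",
    "root['test3']",
    "root['test3']['test5']['test6']",
]
results = {path: is_included(path) for path in paths}

for path, included in results.items():
    print(path, included)
```

Because the check uses search rather than a full match, a pattern for root['test3'] also matches everything nested below it. Ending the pattern with $ would restrict it to that exact path.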

shane-davidson avatar Oct 13 '21 11:10 shane-davidson

You don't have to use regex. You can completely modify it. In fact there is no need to subclass the base operator then:

class MyOperator:

    def __init__(self, include_paths):
        self.include_paths = include_paths

    def match(self, level) -> bool:
        return True

    def give_up_diffing(self, level, diff_instance) -> bool:
        return level.path() not in self.include_paths
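One detail of the class above worth keeping in mind is that the meaning of "not in" depends on the type of include_paths. With a string it is a substring test, so every ancestor prefix of the included path also counts as included. With a list it is exact membership. The sketch below only demonstrates that Python behaviour. Whether DeepDiff consults the operator at each ancestor level is not shown in this thread, so any consequence for the diff result is an assumption. The paths are illustrative.

```python
# The same include path, supplied once as a string and once as a list.
include_as_string = "root['test3']['test5']"
include_as_list = ["root['test3']['test5']"]

# The included path and the ancestors a traversal would pass through.
visited = ["root", "root['test3']", "root['test3']['test5']"]

# str: `in` is a substring test, so ancestor prefixes also pass.
string_hits = [path in include_as_string for path in visited]

# list: `in` is exact membership, so only the full path passes.
list_hits = [path in include_as_list for path in visited]

print(string_hits)
print(list_hits)
```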

seperman avatar Oct 13 '21 18:10 seperman

I added this example in the docs too: https://zepworks.com/deepdiff/current/custom.html#custom-operators

seperman avatar Oct 13 '21 18:10 seperman

Hi @seperman how's it going.

I tested the custom operators out with the following example:

t1 = {'test': 'yes', 'test2': 'no'}
t2 = {'test': 'yes', 'test2a': 'no'}

DeepDiff(t1, t2, custom_operators=[MyOperator(include_paths="root['test']")])

with your MyOperator class example.

The result I get is: {'dictionary_item_added': [root['test2a']], 'dictionary_item_removed': [root['test2']]}

but I was expecting the diff to only care about the key 'test' and to return no difference for this example. It does return the expected no-difference result if the second keys are the same.

Should the expected behaviour be to diff only the paths that are passed to include_paths?

I also noticed that if the keys are different, they do not get passed into give_up_diffing.

jjw24 avatar Oct 14 '21 08:10 jjw24

@jjw24 Yes, it happens for exactly the reason you describe. Since we compare the values of keys at the same path, the decision to report a dictionary item as added or removed is made before it gets to this operator. Hence include_paths is only effective for items that are modified, not for items that are added or removed. I will look into applying the operator in the item add/remove logic too.
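The ordering described here can be illustrated with a toy one-level dictionary diff. This is a simplified model written for this thread, not DeepDiff's implementation. The function toy_dict_diff and its report keys are hypothetical names.

```python
def toy_dict_diff(t1, t2, give_up_diffing, path="root"):
    """Toy one-level dict diff; a simplified model, not DeepDiff's code."""
    report = {"added": [], "removed": [], "changed": []}

    # Step 1: added/removed keys are decided from the key sets alone,
    # before the per-path hook is consulted.
    for key in sorted(t2.keys() - t1.keys()):
        report["added"].append(f"{path}['{key}']")
    for key in sorted(t1.keys() - t2.keys()):
        report["removed"].append(f"{path}['{key}']")

    # Step 2: only keys present on both sides ever reach the hook.
    for key in sorted(t1.keys() & t2.keys()):
        child = f"{path}['{key}']"
        if give_up_diffing(child):
            continue
        if t1[key] != t2[key]:
            report["changed"].append(child)
    return report


include_paths = ["root['test']"]
t1 = {'test': 'yes', 'test2': 'no'}
t2 = {'test': 'yes', 'test2a': 'no'}

report = toy_dict_diff(t1, t2, lambda p: p not in include_paths)
print(report)
```

In this model 'test2' and 'test2a' are reported even though they are not in include_paths, because the hook is only asked about keys that exist on both sides. That matches the behaviour reported above.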

seperman avatar Oct 16 '21 16:10 seperman

Thank you @seperman, much appreciated. Great library btw!

jjw24 avatar Oct 17 '21 23:10 jjw24

Thanks! Please consider starring it on github if you want!

seperman avatar Oct 18 '21 22:10 seperman

Already have :)

Do you know roughly when you will get some time to work on this?

jjw24 avatar Oct 18 '21 22:10 jjw24

Probably some time in November.

seperman avatar Oct 19 '21 15:10 seperman

Hi @seperman, one year later, what has happened with this feature? Thanks

lpxavi avatar Sep 29 '22 15:09 lpxavi

Hi, this feature was added a while ago:

https://zepworks.com/deepdiff/current/exclude_paths.html#include-paths

I will close this ticket.

seperman avatar Oct 02 '22 14:10 seperman

Fantastic, thank you.

jjw24 avatar Oct 02 '22 19:10 jjw24