diff-match-patch

Empty diff text?

Open ndvbd opened this issue 3 years ago • 12 comments

Very rarely, I receive a diff operation (INSERT) whose text is an empty string (length 0). Is that supposed to be possible?

ndvbd avatar Nov 10 '20 10:11 ndvbd

You are not crazy. I have seen the same, and also for DELETE operations with a length of 0. I just filter them out.
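A minimal sketch of that kind of filtering, assuming the diffs are the `(op, text)` tuples the Python port returns. The helper name `drop_empty_ops` is hypothetical and not part of the library:

```python
def drop_empty_ops(diffs):
    """Return a new list without operations whose text is empty."""
    return [(op, text) for op, text in diffs if text]


# Sample tuples in the shape diff_main returns: -1 delete, 0 equal, 1 insert.
sample = [(0, "a\n"), (1, ""), (0, "{\n"), (-1, ""), (0, "}\n")]
cleaned = drop_empty_ops(sample)
print(cleaned)
```

Note that this leaves neighbouring equalities unmerged, so a consumer that expects alternating operations may still want to merge them afterwards.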

iansparks avatar Jan 27 '21 13:01 iansparks

@iansparks, is there an example you can share of text (before/after) that shows this behavior?

ndvbd avatar Jan 30 '21 07:01 ndvbd

This is not at all pretty. I have tried to cut down this example but it acts in strange ways. Removing some characters will cause the problem not to happen.

This uses the Python version of d-m-p:

import diff_match_patch

diff_obj = diff_match_patch.diff_match_patch()

content1 = """
    /*  {
        new object[]
       {
            new string[]
            {
           }
            new string[]
            {
            }
        }
    }*/
}
;
System.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL =
{

    {
        "Other", "Other - specify"
    }


}
;
"""

content2 = """
    /* {
    new object[]
    {
    new string[]
    {
    }
    new string[]
    {
    }
    }
    }*/
}
;
System.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL =
{


}
;
"""



diffs = diff_obj.diff_main(content1, content2)
diff_obj.diff_cleanupSemantic(diffs)

for diff in diffs:
    print(diff)

For me the result is:

(0, '\n    /*')
(-1, ' ')
(0, ' {\n')
(-1, '    ')
(0, '    new object[]\n    ')
(-1, '   ')
(0, '{\n')
(-1, '        ')
(0, '    new string[]\n    ')
(-1, '        {\n           }\n            new string[]\n            {\n            }\n    ')
(1, '{\n    }\n    new string[]\n    {\n    }\n')
(0, '    }\n    }*/\n}\n;\nSystem.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL =\n')
(1, '')
(0, '{\n\n')
(-1, '    {\n        "Other", "Other - specify"\n    }\n\n')
(0, '\n}\n;\n')

See the empty string four lines from the end: an insertion of ''?

iansparks avatar Jan 30 '21 22:01 iansparks

I should also say: this is with Python diff-match-patch==20200713.

iansparks avatar Jan 30 '21 22:01 iansparks

@iansparks I can confirm that the same behaviour happens in Javascript:


var dmp = new diff_match_patch();
var content1 = "\n    /*  {\n        new object[]\n       {\n            new string[]\n            {\n           }\n            new string[]\n            {\n            }\n        }\n    }*/\n}\n;\nSystem.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL =\n{\n\n    {\n        \"Other\", \"Other - specify\"\n    }\n\n\n}\n;";
var content2 = "\n    /* {\n    new object[]\n    {\n    new string[]\n    {\n    }\n    new string[]\n    {\n    }\n    }\n    }*/\n}\n;\nSystem.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL =\n{\n\n\n}\n;";
var diff = dmp.diff_main(content1, content2);
dmp.diff_cleanupSemantic(diff);
console.log(diff);

And the result is:


0: (2) [0, "↵    /*"]
1: (2) [-1, " "]
2: (2) [0, " {↵"]
3: (2) [-1, "    "]
4: (2) [0, "    new object[]↵    "]
5: (2) [-1, "   "]
6: (2) [0, "{↵"]
7: (2) [-1, "        "]
8: (2) [0, "    new string[]↵    "]
9: (2) [-1, "        {↵           }↵            new string[]↵            {↵            }↵    "]
10: (2) [1, "{↵    }↵    new string[]↵    {↵    }↵"]
11: (2) [0, "    }↵    }*/↵}↵;↵System.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL =↵"]
12: (2) [1, ""]
13: (2) [0, "{↵↵"]
14: (2) [-1, "    {↵        "Other", "Other - specify"↵    }↵↵"]
15: (2) [0, "↵}↵;"]

By the way, the problem happens even if we skip the diff_cleanupSemantic call.

This is clearly a bug.

ndvbd avatar Feb 01 '21 15:02 ndvbd

the behavior obviously isn't right, but it shouldn't cause any problems for a consumer of the diff: empty operations do crop up from time to time, so consumers need to tolerate them anyway.
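One way to check that claim is to rebuild both texts from the diff with and without the empty operation. The sketch below is self-contained; `source_text` and `destination_text` are hypothetical helper names (the library's own `diff_text1` and `diff_text2` should compute the same thing), and the sample tuples are shortened from the output posted earlier in this thread:

```python
DELETE, EQUAL, INSERT = -1, 0, 1


def source_text(diffs):
    """Rebuild the 'before' text: everything except insertions."""
    return "".join(text for op, text in diffs if op != INSERT)


def destination_text(diffs):
    """Rebuild the 'after' text: everything except deletions."""
    return "".join(text for op, text in diffs if op != DELETE)


with_empty = [
    (EQUAL, "dicDefaultValuesForDSL =\n"),
    (INSERT, ""),
    (EQUAL, "{\n\n"),
    (DELETE, '    {\n        "Other", "Other - specify"\n    }\n\n'),
    (EQUAL, "\n}\n;\n"),
]
without_empty = [(op, text) for op, text in with_empty if text]

same_source = source_text(with_empty) == source_text(without_empty)
same_destination = destination_text(with_empty) == destination_text(without_empty)
print(same_source, same_destination)
```

An empty string contributes nothing to either join, so both reconstructions are unchanged. A consumer that renders each operation separately (for example as HTML spans) could still see an empty element, though.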

I know that the cleanup passes can do funny things by shuffling around edits. I'm noticing that the empty insertion comes at newlines.

@iansparks were you able to get it to reproduce with any other subset of this text? did you try just the few lines above and below the empty insertion?

dmsnell avatar Feb 01 '21 21:02 dmsnell

Hi @dmsnell, I tried to minimize this example but it's hard to get it smaller. For example, both pre-diff texts contain the string:

System.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL

You would think that editing the middle of that, to make them both:

System dicDefaultValuesForDSL

Would not affect this empty diff insert result but it does. With that edit I don't get the empty (1,"") change:

(0, '\n    /*')
(-1, ' ')
(0, ' {\n')
(-1, '    ')
(0, '    new object[]\n    ')
(-1, '   ')
(0, '{\n')
(-1, '        ')
(0, '    new string[]\n    ')
(-1, '        {\n           }\n        ')
(1, '{\n    }\n')
(0, '    new string[]\n')
(-1, '        ')
(0, '    {\n    ')
(-1, '        }\n    ')
(1, '}\n')
(0, '    }\n    }*/\n}\n;\nSystem dicDefaultValuesForDSL =\n{\n\n')
(-1, '    {\n        "Other", "Other - specify"\n    }\n\n')
(0, '\n}\n;\n')

Suddenly, no extra empty diff change.
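Since hand-minimizing is this finicky, a greedy reducer might help automate the search. The following is only a sketch: `shrink_lines` and the `still_fails` predicate are hypothetical names, and it only removes whole lines, so it can get stuck on a case like this one where small edits make the problem vanish.

```python
def shrink_lines(text1, text2, still_fails):
    """Greedily delete single lines from either text while
    still_fails(text1, text2) keeps returning True."""
    texts = [text1.splitlines(keepends=True), text2.splitlines(keepends=True)]
    changed = True
    while changed:
        changed = False
        for side in (0, 1):
            i = 0
            while i < len(texts[side]):
                candidate = texts[side][:i] + texts[side][i + 1:]
                trial = list(texts)
                trial[side] = candidate
                if still_fails("".join(trial[0]), "".join(trial[1])):
                    texts[side] = candidate  # line was not needed, drop it
                    changed = True
                else:
                    i += 1  # line is needed to reproduce, keep it
    return "".join(texts[0]), "".join(texts[1])


# Stand-in predicate so the sketch runs without the library installed.
# With d-m-p one would pass something like:
#   lambda a, b: any(text == "" for _, text in dmp.diff_main(a, b))
def fake_failure(a, b):
    return "bug" in a and "bug" not in b


small1, small2 = shrink_lines("one\nbug\ntwo\n", "one\ntwo\n", fake_failure)
print(repr(small1), repr(small2))
```

The same loop could be run at character granularity by splitting into characters instead of lines, at the cost of many more diff calls.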

iansparks avatar Feb 01 '21 22:02 iansparks

I've been able to shorten it somewhat

text 1 : \n {\n }\n {\n }\n} ;\nSystem.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL \n{\n {\n \"Other\", \"Other - specify\"\n } }

text 2 : \n{\n }\n {\n }\n} ;\nSystem.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL \n{}

based on the way it's finicky on newlines, length of lines, and characters I suspect something is going on with the point values in cleanupSemanticLossless

notably I observed a shifting of the { after the System line from being contained in that equality with the one after it, the kind of operation performed by the cleanup passes.

it's not the timeout because I'm reproducing when it only takes 1.5µs

dmsnell avatar Feb 01 '21 23:02 dmsnell

@dmsnell see what I wrote above: "the problem happens even if we don't run diff_cleanupSemantic"

ndvbd avatar Feb 02 '21 09:02 ndvbd

my bad, @ndvbd - I was thinking that diff_main already calls diff_cleanupSemantic. I should have known better, since I've been all around that code.

this does narrow down the problem, but I haven't yet seen exactly where it could be occurring. I'll try to investigate further by commenting out parts of diff_cleanupMerge

dmsnell avatar Feb 02 '21 23:02 dmsnell

Has anyone found anything?

ndvbd avatar Jun 18 '21 07:06 ndvbd

If I find anything I will post an update here.

dmsnell avatar Jun 18 '21 14:06 dmsnell