diff-match-patch
Empty diff text?
Very rarely, I receive a diff (INSERT) with an empty string (length of 0). Can that be right?
You are not crazy. I have seen the same, and also a DEL with a length of 0. I just filter them out.
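For anyone wanting to do the same filtering, here is a minimal sketch. It assumes the diffs are the usual list of (op, text) tuples returned by the Python port; the helper name drop_empty_ops is made up for this example and is not part of the library.

```python
def drop_empty_ops(diffs):
    """Return a copy of the diff list without operations whose text is empty."""
    return [(op, text) for op, text in diffs if text]


# Hand-written sample in the same shape as the library output (not real output).
sample = [(0, "a =\n"), (1, ""), (0, "{\n\n"), (-1, " x\n\n"), (0, "\n}\n;\n")]
cleaned = drop_empty_ops(sample)
print(cleaned)
```

Note that dropping the empty operation can leave two equalities next to each other, so a consumer that expects merged runs may want to join them afterwards.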
@iansparks, is there an example that you can share of text (before/after) that show this behavior?
This is not at all pretty. I have tried to cut down this example but it acts in strange ways. Removing some characters will cause the problem not to happen.
Uses python version of d-m-p:
import diff_match_patch
diff_obj = diff_match_patch.diff_match_patch()
content1 = """
/* {
new object[]
{
new string[]
{
}
new string[]
{
}
}
}*/
}
;
System.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL =
{
{
"Other", "Other - specify"
}
}
;
"""
content2 = """
/* {
new object[]
{
new string[]
{
}
new string[]
{
}
}
}*/
}
;
System.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL =
{
}
;
"""
diffs = diff_obj.diff_main(content1, content2)
diff_obj.diff_cleanupSemantic(diffs)
for diff in diffs:
    print(diff)
for me the result is:
(0, '\n /*')
(-1, ' ')
(0, ' {\n')
(-1, ' ')
(0, ' new object[]\n ')
(-1, ' ')
(0, '{\n')
(-1, ' ')
(0, ' new string[]\n ')
(-1, ' {\n }\n new string[]\n {\n }\n ')
(1, '{\n }\n new string[]\n {\n }\n')
(0, ' }\n }*/\n}\n;\nSystem.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL =\n')
(1, '')
(0, '{\n\n')
(-1, ' {\n "Other", "Other - specify"\n }\n\n')
(0, '\n}\n;\n')
See empty string 4 lines from the end. An addition of a '' ?
I should also say: this is with the Python package diff-match-patch==20200713.
@iansparks I can confirm that the same behaviour happens in Javascript:
var dmp = new diff_match_patch();
content1 = "\n /* {\n new object[]\n {\n new string[]\n {\n }\n new string[]\n {\n }\n }\n }*/\n}\n;\nSystem.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL =\n{\n\n {\n \"Other\", \"Other - specify\"\n }\n\n\n}\n;"
content2 = "\n /* {\n new object[]\n {\n new string[]\n {\n }\n new string[]\n {\n }\n }\n }*/\n}\n;\nSystem.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL =\n{\n\n\n}\n;";
diff = dmp.diff_main(content1, content2);
dmp.diff_cleanupSemantic(diff);
console.log(diff);
And the result is:
0: (2) [0, "↵ /*"]
1: (2) [-1, " "]
2: (2) [0, " {↵"]
3: (2) [-1, " "]
4: (2) [0, " new object[]↵ "]
5: (2) [-1, " "]
6: (2) [0, "{↵"]
7: (2) [-1, " "]
8: (2) [0, " new string[]↵ "]
9: (2) [-1, " {↵ }↵ new string[]↵ {↵ }↵ "]
10: (2) [1, "{↵ }↵ new string[]↵ {↵ }↵"]
11: (2) [0, " }↵ }*/↵}↵;↵System.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL =↵"]
12: (2) [1, ""]
13: (2) [0, "{↵↵"]
14: (2) [-1, " {↵ "Other", "Other - specify"↵ }↵↵"]
15: (2) [0, "↵}↵;"]
By the way, the problem happens even if we disable diff_cleanupSemantic.
This is clearly a bug.
The behavior obviously isn't right, but it shouldn't cause any problems for a consumer of the diff, because empty operations like this do crop up from time to time.
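One way to convince yourself that an empty operation is harmless to a consumer is to rebuild both texts from the diff with and without it. The sketch below does this by hand on a made-up diff list; rebuild is a hypothetical helper (the library's own diff_text1 and diff_text2 compute the same two strings).

```python
DIFF_DELETE, DIFF_EQUAL, DIFF_INSERT = -1, 0, 1


def rebuild(diffs):
    """Return (before, after) texts reconstructed from a list of (op, text) tuples."""
    before = "".join(text for op, text in diffs if op != DIFF_INSERT)
    after = "".join(text for op, text in diffs if op != DIFF_DELETE)
    return before, after


# Hand-written example containing an empty insertion (not real library output).
with_empty = [(0, "a =\n"), (1, ""), (0, "{\n"), (-1, "  x\n"), (0, "}\n")]
without_empty = [d for d in with_empty if d[1]]

# Both diff lists describe exactly the same pair of texts.
print(rebuild(with_empty) == rebuild(without_empty))
```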
I know that the cleanup passes can do funny things by shuffling around edits. I'm noticing that the empty insertion comes at newlines.
@iansparks were you able to get it to produce with any other subset of this text? did you try just the few lines above and below the empty insertion?
Hi @dmsnell, I tried to minimize this example but it's hard to get it smaller. For example, both pre-diff texts contain the string:
System.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL
You would think that editing the middle of that, to make them both:
System dicDefaultValuesForDSL
Would not affect this empty diff insert result but it does. With that edit I don't get the empty (1,"") change:
(0, '\n /*')
(-1, ' ')
(0, ' {\n')
(-1, ' ')
(0, ' new object[]\n ')
(-1, ' ')
(0, '{\n')
(-1, ' ')
(0, ' new string[]\n ')
(-1, ' {\n }\n ')
(1, '{\n }\n')
(0, ' new string[]\n')
(-1, ' ')
(0, ' {\n ')
(-1, ' }\n ')
(1, '}\n')
(0, ' }\n }*/\n}\n;\nSystem dicDefaultValuesForDSL =\n{\n\n')
(-1, ' {\n "Other", "Other - specify"\n }\n\n')
(0, '\n}\n;\n')
Suddenly, no extra empty diff change.
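Since hand-trimming is this finicky, a greedy line-dropping loop might help shrink the reproduction. This is only a rough sketch: shrink and has_empty_op are names invented here, the reduction works on whole lines (so it cannot find character-level reductions like the one above), and the demo uses a stand-in predicate so it runs without the library installed.

```python
def has_empty_op(text1, text2):
    """Predicate for the real bug; assumes the diff-match-patch package is installed."""
    import diff_match_patch
    dmp = diff_match_patch.diff_match_patch()
    return any(text == "" for _, text in dmp.diff_main(text1, text2))


def shrink(lines1, lines2, still_fails):
    """Greedily drop single lines from either text while still_fails stays true."""
    changed = True
    while changed:
        changed = False
        for which in (0, 1):
            i = 0
            while i < len((lines1, lines2)[which]):
                current = (lines1, lines2)[which]
                candidate = current[:i] + current[i + 1:]
                trial = (candidate, lines2) if which == 0 else (lines1, candidate)
                if still_fails("\n".join(trial[0]), "\n".join(trial[1])):
                    lines1, lines2 = trial  # keep the smaller input, retry same index
                    changed = True
                else:
                    i += 1  # this line is needed, move on
    return lines1, lines2


# Stand-in predicate for the demo: "fails" only if X is in text1 and Y is in text2.
def demo_fails(text1, text2):
    return "X" in text1 and "Y" in text2


small1, small2 = shrink(["a", "X", "b"], ["Y", "c"], demo_fails)
print(small1, small2)
```

To run it against the actual problem, one would pass content1.split("\n"), content2.split("\n") and has_empty_op instead of the demo values.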
I've been able to shorten it somewhat
text 1 : \n {\n }\n {\n }\n} ;\nSystem.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL \n{\n {\n \"Other\", \"Other - specify\"\n } }
text 2 : \n{\n }\n {\n }\n} ;\nSystem.Collections.Generic.Dictionary<string, string> dicDefaultValuesForDSL \n{}
Based on the way it's finicky about newlines, line lengths, and characters, I suspect something is going on with the point values in cleanupSemanticLossless.
Notably, I observed a shifting of the { after the System line from being contained in that equality with the one after it, the kind of operation performed by the cleanup passes.
it's not the timeout because I'm reproducing when it only takes 1.5µs
@dmsnell see what I wrote above: "the problem happens even if we don't run diff_cleanupSemantic"
My bad, @ndvbd - I was thinking that diff_main was already calling diff_cleanupSemantic, which I should have known better since I've been all around that code.
This does narrow down the problem, but I haven't yet seen exactly where it could be occurring. I'll try to investigate further by commenting out parts of diff_cleanupMerge.
Anyone found anything?
If I find anything I will post an update here.