diff-match-patch
Diff cleanup semantic not working as expected
For the following input:
This text is bold, underlined, italicized, Arial and has a different color and size.
This text is not bold, not underlined, not italicized, Calibri and has the same color and size.
The diff cleanup returns (computed from the second text to the first):
[Diff(EQUAL,"This text is "), Diff(DELETE,"not "), Diff(EQUAL,"bold,"), Diff(DELETE," not"), Diff(EQUAL," underlined,"), Diff(DELETE," not"), Diff(EQUAL," italicized, "), Diff(DELETE,"Calibri"), Diff(INSERT,"Arial"), Diff(EQUAL," and has "), Diff(DELETE,"the same"), Diff(INSERT,"a different"), Diff(EQUAL," color and size.")]
This is as expected.
But for some other texts, such as:
This is a sample text.
This is a sample test.
The diff cleanup returns:
[Diff(EQUAL,"This is a sample te"), Diff(DELETE,"x"), Diff(INSERT,"s"), Diff(EQUAL,"t.")]
Instead, it should have cleaned up the last word (text/test) and shown it as a single DELETE/INSERT pair.
I observed this behavior in both the Java and JavaScript versions.
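A possible workaround is to post-process the diff list yourself and widen every change to whole words. The sketch below is only an illustration under some assumptions: `expandToWordBoundaries` is a hypothetical helper (not part of diff-match-patch), each diff is read as an `[op, text]` pair with `op` being -1, 0 or 1 as in the JavaScript version, and a "word" is whatever the regex class `\w` matches (ASCII letters, digits and underscore).

```javascript
// Operation codes, matching DIFF_DELETE / DIFF_EQUAL / DIFF_INSERT in diff-match-patch.
const DEL = -1, EQ = 0, INS = 1;

// Hypothetical helper (not a library function): widens each change so that it
// covers whole words, absorbing partial words from the neighbouring equalities.
function expandToWordBoundaries(diffs) {
  const out = [];          // resulting [op, text] pairs
  let del = '', ins = '';  // text of the change currently being collected
  let open = false;        // true while a change is being collected

  const startsInWord = () => /^\w/.test(del) || /^\w/.test(ins);
  const endsInWord = () => /\w$/.test(del) || /\w$/.test(ins);

  function flush() {
    if (!open) return;
    const prev = out[out.length - 1];
    if (prev && prev[0] === EQ && startsInWord()) {
      // Pull the partial word at the end of the previous equality into the change.
      const tail = prev[1].match(/\w*$/)[0];
      prev[1] = prev[1].slice(0, prev[1].length - tail.length);
      del = tail + del;
      ins = tail + ins;
      if (prev[1] === '') {
        // The equality vanished: merge with the change that preceded it, if any.
        out.pop();
        while (out.length && out[out.length - 1][0] !== EQ) {
          const [op, text] = out.pop();
          if (op === DEL) del = text + del; else ins = text + ins;
        }
      }
    }
    if (del) out.push([DEL, del]);
    if (ins) out.push([INS, ins]);
    del = ins = '';
    open = false;
  }

  for (const d of diffs) {
    const op = d[0], text = d[1];
    if (op === DEL) { del += text; open = true; }
    else if (op === INS) { ins += text; open = true; }
    else {
      let rest = text;
      if (open && endsInWord()) {
        // Pull the partial word at the start of this equality into the change.
        const head = rest.match(/^\w*/)[0];
        del += head;
        ins += head;
        rest = rest.slice(head.length);
        if (rest === '') continue;  // equality was entirely inside one word
      }
      flush();
      out.push([EQ, rest]);
    }
  }
  flush();
  return out;
}

// The diff reported above for "This is a sample text." / "This is a sample test."
const sample = [[EQ, 'This is a sample te'], [DEL, 'x'], [INS, 's'], [EQ, 't.']];
console.log(JSON.stringify(expandToWordBoundaries(sample)));
// [[0,"This is a sample "],[-1,"text"],[1,"test"],[0,"."]]
```

In practice this would run on the output of `diff_main` after `diff_cleanupSemantic`. It trades the minimal diff for readability, so it is probably only suitable for display purposes.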
I can confirm this issue. Here's my example in JavaScript:
var text1 = 'I ate a red apple.';
var text2 = 'I ate a green apple.';
var dmp = new diff_match_patch();
var diffs = dmp.diff_main(text1, text2);
dmp.diff_cleanupSemantic(diffs);
Result:
EQUAL "I ate a "
INSERT "g"
EQUAL "re"
DELETE "d"
INSERT "en"
EQUAL " apple."
Expected result:
EQUAL "I ate a "
INSERT "green"
DELETE "red"
EQUAL " apple."
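For what it's worth, my reading of the library source is that `diff_cleanupSemantic` only drops an equality when it is no longer than the larger of the inserted/deleted lengths on both sides of it. The sketch below restates that condition as I understand it. `wouldDropEquality` is a hypothetical name used for illustration, not a library function, and the real routine tracks these lengths internally while scanning the diff list.

```javascript
// Hypothetical helper mirroring my reading of the condition in diff_cleanupSemantic:
// an equality is dropped only if it is no longer than the bigger edit on each side.
function wouldDropEquality(equality, before, after) {
  const len = equality.length;
  return len <= Math.max(before.inserted, before.deleted) &&
         len <= Math.max(after.inserted, after.deleted);
}

// "re" sits between INSERT "g" (1 char) and DELETE "d" + INSERT "en" (1 and 2 chars).
const dropped = wouldDropEquality('re',
    { inserted: 1, deleted: 0 },
    { inserted: 2, deleted: 1 });
console.log(dropped);  // false: "re" (2 chars) is longer than the 1-char edit before it
```

If that reading is right, the two-character equality "re" survives because only one character changes in front of it, which would explain the result above.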
@NeilFraser Please support this issue. 🙇🏻