Diff cleanup semantic not working as expected

Open ash-lionell opened this issue 5 years ago • 2 comments

For the following input:

This text is bold, underlined, italicized, Arial and has a different color and size.

This text is not bold, not underlined, not italicized, Calibri and has the same color and size.

The diff cleanup returns:

[Diff(EQUAL,"This text is "), Diff(DELETE,"not "), Diff(EQUAL,"bold,"), Diff(DELETE," not"), Diff(EQUAL," underlined,"), Diff(DELETE," not"), Diff(EQUAL," italicized, "), Diff(DELETE,"Calibri"), Diff(INSERT,"Arial"), Diff(EQUAL," and has "), Diff(DELETE,"the same"), Diff(INSERT,"a different"), Diff(EQUAL," color and size.")]

As expected.

But for some other texts like:

This is a sample text.

This is a sample test.

The diff cleanup returns:

[Diff(EQUAL,"This is a sample te"), Diff(DELETE,"x"), Diff(INSERT,"s"), Diff(EQUAL,"t.")]

Instead, I expected the cleanup to treat the last word as a unit and report text/test as a single DELETE/INSERT pair.

I observed this behavior in both the Java and JavaScript versions.

ash-lionell, Oct 26 '20 08:10

I can confirm this issue. Here's my example in JavaScript:

var text1 = 'I ate a red apple.';
var text2 = 'I ate a green apple.';

var dmp = new diff_match_patch();
var diffs = dmp.diff_main(text1, text2);
dmp.diff_cleanupSemantic(diffs);

Result:

EQUAL	"I ate a "
INSERT	"g"
EQUAL	"re"
DELETE	"d"
INSERT	"en"
EQUAL	" apple."

Expected result:

EQUAL	"I ate a "
INSERT	"green"
DELETE	"red"
EQUAL	" apple."
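
A possible workaround while this is open is to diff whole words instead of characters, so an edit can never land inside a word. The following is only a minimal standalone sketch and does not call diff-match-patch at all: `tokenize` and `wordDiff` are hypothetical names, the `\w+|\W+` split is my assumption about what counts as a word, and the plain LCS table is a stand-in for whatever algorithm you prefer. It returns `[op, text]` pairs using the same -1/0/1 op codes as the JavaScript build.

```javascript
// Op codes follow the numeric convention of diff-match-patch's JavaScript build.
const DIFF_DELETE = -1;
const DIFF_INSERT = 1;
const DIFF_EQUAL = 0;

// Split into runs of word characters and runs of everything else,
// so that joining the tokens reproduces the input exactly.
// (Assumption: \w is a good enough definition of "word" for your text.)
function tokenize(text) {
  return text.match(/\w+|\W+/g) || [];
}

// Hypothetical helper: token-level diff via a longest-common-subsequence table.
function wordDiff(text1, text2) {
  const a = tokenize(text1);
  const b = tokenize(text2);

  // lcs[i][j] = length of the LCS of a[i..] and b[j..].
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Append a token, merging it into the previous entry when the op matches.
  const diffs = [];
  const push = (op, token) => {
    const last = diffs[diffs.length - 1];
    if (last && last[0] === op) last[1] += token;
    else diffs.push([op, token]);
  };

  // Walk the table from the start, emitting equal, deleted and inserted tokens.
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      push(DIFF_EQUAL, a[i]);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      push(DIFF_DELETE, a[i]);
      i++;
    } else {
      push(DIFF_INSERT, b[j]);
      j++;
    }
  }
  while (i < a.length) push(DIFF_DELETE, a[i++]);
  while (j < b.length) push(DIFF_INSERT, b[j++]);
  return diffs;
}

console.log(wordDiff('I ate a red apple.', 'I ate a green apple.'));
// [[0, "I ate a "], [-1, "red"], [1, "green"], [0, " apple."]]
```

The table needs memory proportional to the product of the two token counts, so this is only suitable for short texts. The DELETE comes before the INSERT here; that ordering is just a choice made in this sketch.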

mark1bean, Mar 05 '23 21:03

@NeilFraser Please support this issue. 🙇🏻

ttpho, Mar 24 '23 18:03