
Improve readability of diff_cleanupSemantic

Open TravisJRyan opened this issue 1 year ago • 0 comments

Issue

The readability of diff_cleanupSemantic output could be improved. The README recommends it when a human needs to read the diff, but I'm still seeing words split partway through. I think the algorithm should always try to keep changed words and phrases whole rather than breaking the diff in the middle of a word.

The use case is a UI that shows before/after changes to a piece of text, where a readable semantic diff is essential.

Example

Before: The dog was a little hungry.

After: The duck was a bit hungry.

Expected: "dog" is completely crossed out and replaced by "duck", and "little" is completely crossed out and replaced by "bit".

Actual output: [image: diff-example]

The output above takes a lot of mental energy to parse. Splits like this occur frequently, so it isn't feasible for a human to read the diff of a large piece of text this way. Some way to split more aggressively on word/phrase boundaries, so that the diff stays readable, would be preferred here.

I've also tried diff_cleanupEfficiency with varying Diff_EditCost values, but it doesn't fully get rid of the problem: the algorithm needs more knowledge about where words and phrases begin and end in order to split on them for readability.
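
To make the request concrete, here is a minimal sketch of word-aware splitting for the example above. It is not diff-match-patch code: it uses Python's standard difflib on a list of word tokens instead of characters, and the names tokenize and word_diff are hypothetical. It only illustrates the granularity being asked for, under the assumption that a simple regex split into words and non-word runs is good enough.

```python
import difflib
import re

# Operation codes mirror the (-1, 0, 1) convention used by diff-match-patch.
DELETE, EQUAL, INSERT = -1, 0, 1


def tokenize(text):
    """Split text into runs of word characters and runs of everything else."""
    return re.findall(r"\w+|\W+", text)


def word_diff(before, after):
    """Diff two strings token by token, so a word is never split in half."""
    a, b = tokenize(before), tokenize(after)
    # autojunk=False keeps frequent tokens (such as spaces) eligible for matching.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            diffs.append((EQUAL, "".join(a[i1:i2])))
            continue
        # "replace" yields both a deletion and an insertion;
        # "delete" and "insert" yield only one of them.
        if i2 > i1:
            diffs.append((DELETE, "".join(a[i1:i2])))
        if j2 > j1:
            diffs.append((INSERT, "".join(b[j1:j2])))
    return diffs
```

Calling word_diff("The dog was a little hungry.", "The duck was a bit hungry.") gives "The " as equal, "dog" deleted, "duck" inserted, " was a " as equal, "little" deleted, "bit" inserted, and " hungry." as equal, which matches the expected result described above.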

TravisJRyan · Jul 24 '24 16:07