Improve diff readability for eg "Bob," -> "Bob"
If a hunk consists of only deleting or only adding at the beginning or end of a word, then combine them into one hunk.
Examples:
-
Changes from
simplediff:- Alice, Bob and Charlie + Alice, Bob, and Charliesimplediff: Alice,
BobBob, Charlie this diff: Alice, Bob, Charlie- Alice Bob Charlie's Angels And David + Alice Bob Charlie Davidsimplediff: Alice Bob
Charlie's Angels AndCharlie David this diff: Alice Bob Charlie's Angels AndDavid -
Same as
simplediff:-
hunks you wouldn't want simplified:
- Alice Bob Charlie + Alice Robert Charliediff: Alice
BobRobert Charlie -
if the change isn't only at the beginning or end:
- Alice Bob Charlie + Alice Blob Charliediff: Alice
BobBlob Charlie- Alice Bobby Charlie + Alice bb Charliediff: Alice
Bobbybb Charlie- Alice Zeneca Charlie + Alice AstraZeneca Charlie'sdiff: Alice
Zeneca CharlieAstraZeneca Charlie's
-
Thanks for the PR @laughinghan, let me find some time to test it locally and I will merge it if it looks good.
@laughinghan the problem I have with your proposed patch is with cases like the following:
old = "Provide them Clues"
new = "Provide the Clues"
assert html_diff_original(old, new) == 'Provide <del>them</del> <ins>the</ins> Clues'
assert html_diff_proposed(old, new) == 'Provide <del>them</del> <ins>the</ins> Clues'
The assert with the current html_diff is what I would expect, the word was changed from "them" to "the" and I want the diff to show the whole word, what I get when I run that test with the proposed change is:
E AssertionError: assert 'Provide the<...m</del> Clues' == 'Provide <del...e</ins> Clues'
E - Provide <del>them</del> <ins>the</ins> Clues
E + Provide the<del>m</del> Clues
I would be OK and like that the algorithm didn't consider punctuation marks as part of a word, but I don't want words to be cut into parts when they are changed.