NYTdiff icon indicating copy to clipboard operation
NYTdiff copied to clipboard

Improve diff readability for eg "Bob," -> "Bob"

Open laughinghan opened this issue 4 years ago • 2 comments

If a hunk consists of only deleting or only adding at the beginning or end of a word, then combine them into one hunk.

Examples:

  • Changes from simplediff:

    - Alice, Bob and Charlie
    + Alice, Bob, and Charlie
    

    simplediff: Alice, BobBob, Charlie this diff: Alice, Bob, Charlie

    - Alice Bob Charlie's Angels And David
    + Alice Bob Charlie David
    

    simplediff: Alice Bob Charlie's Angels AndCharlie David this diff: Alice Bob Charlie's Angels And David

  • Same as simplediff:

    • hunks you wouldn't want simplified:

      - Alice Bob Charlie
      + Alice Robert Charlie
      

      diff: Alice Bob Robert Charlie

    • if the change isn't only at the beginning or end:

      - Alice Bob Charlie
      + Alice Blob Charlie
      

      diff: Alice Bob Blob Charlie

      - Alice Bobby Charlie
      + Alice bb Charlie
      

      diff: Alice Bobby bb Charlie

      - Alice Zeneca Charlie
      + Alice AstraZeneca Charlie's
      

      diff: Alice Zeneca Charlie AstraZeneca Charlie's

laughinghan avatar May 04 '21 08:05 laughinghan

Thanks for the PR @laughinghan, let me find some time to test it locally and I will merge it if it looks good.

j-e-d avatar May 05 '21 02:05 j-e-d

@laughinghan the problem I have with your proposed patch is with cases like the following:

old = "Provide them Clues"
new = "Provide the Clues"
    
assert html_diff_original(old, new) == 'Provide <del>them</del> <ins>the</ins> Clues'
assert html_diff_proposed(old, new) == 'Provide <del>them</del> <ins>the</ins> Clues'

The assert with the current html_diff is what I would expect, the word was changed from "them" to "the" and I want the diff to show the whole word, what I get when I run that test with the proposed change is:

E       AssertionError: assert 'Provide the<...m</del> Clues' == 'Provide <del...e</ins> Clues'
E         - Provide <del>them</del> <ins>the</ins> Clues
E         + Provide the<del>m</del> Clues

I would be OK and like that the algorithm didn't consider punctuation marks as part of a word, but I don't want words to be cut into parts when they are changed.

j-e-d avatar May 05 '21 18:05 j-e-d