
Add convenience feature for word diff

Open pascalkuthe opened this issue 1 year ago • 1 comments

Performing a word diff over a full file can be fairly slow on large files. A better approach is to perform a line diff first and then perform the word diff only on the changes it finds. While this is already possible with imara-diff, it requires quite a bit of legwork and can be tricky to get right. It would be nice if this could be included in the library directly. An implementation involves multiple steps:

  • Determine the output format. A different trait or force collecting into a Vec?
  • Implement a TokenSource for words
  • Implement a Sink that automatically computes a word diff
  • Potentially implement a heuristic to detect and ignore
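To make the intended flow concrete, here is a minimal, self-contained sketch of "line diff first, then word diff inside each changed hunk". It deliberately does not use the imara-diff API: `Hunk`, `diff`, `words` and `line_then_word_diff` are hypothetical names, the diff is a naive quadratic LCS standing in for a real algorithm, and the tokenizer (runs of alphanumerics or `_` as words, every other character as its own token) is only one possible definition of a word.

```rust
use std::ops::Range;

/// A changed region: `before` indexes the old tokens, `after` the new ones.
/// (Hypothetical type for this sketch, not part of imara-diff.)
#[derive(Debug, PartialEq)]
struct Hunk {
    before: Range<usize>,
    after: Range<usize>,
}

/// Naive O(n*m) LCS diff, used here only as a stand-in for a real diff algorithm.
fn diff<T: PartialEq>(a: &[T], b: &[T]) -> Vec<Hunk> {
    let (n, m) = (a.len(), b.len());
    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    let mut hunks = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < n || j < m {
        if i < n && j < m && a[i] == b[j] {
            // Unchanged token: skip it on both sides.
            i += 1;
            j += 1;
            continue;
        }
        // Start of a changed region: consume tokens until the next match.
        let (start_i, start_j) = (i, j);
        while i < n || j < m {
            if i < n && j < m && a[i] == b[j] {
                break;
            }
            if j == m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1]) {
                i += 1; // token removed from `a`
            } else {
                j += 1; // token inserted from `b`
            }
        }
        hunks.push(Hunk { before: start_i..i, after: start_j..j });
    }
    hunks
}

/// Assumed word definition: runs of alphanumerics or '_' form one token,
/// every other character (including whitespace) is a token of its own.
fn words<'a>(text: &'a str) -> Vec<&'a str> {
    let mut tokens = Vec::new();
    let mut start = None;
    for (i, c) in text.char_indices() {
        if c.is_alphanumeric() || c == '_' {
            if start.is_none() {
                start = Some(i);
            }
        } else {
            if let Some(s) = start.take() {
                tokens.push(&text[s..i]);
            }
            tokens.push(&text[i..i + c.len_utf8()]);
        }
    }
    if let Some(s) = start {
        tokens.push(&text[s..]);
    }
    tokens
}

/// Diff by line first, then diff the words of each changed line hunk.
/// Returns (removed words, added words) for every word-level change.
fn line_then_word_diff<'a>(before: &'a str, after: &'a str) -> Vec<(Vec<&'a str>, Vec<&'a str>)> {
    let a: Vec<&str> = before.split_inclusive('\n').collect();
    let b: Vec<&str> = after.split_inclusive('\n').collect();
    let mut out = Vec::new();
    for h in diff(&a, &b) {
        // Only the lines inside a changed hunk are tokenized into words.
        let wa: Vec<&str> = a[h.before].iter().flat_map(|l| words(*l)).collect();
        let wb: Vec<&str> = b[h.after].iter().flat_map(|l| words(*l)).collect();
        for w in diff(&wa, &wb) {
            out.push((wa[w.before].to_vec(), wb[w.after].to_vec()));
        }
    }
    out
}
```

The point of the two stages is that the expensive word-level comparison only ever sees the changed lines, so unchanged parts of a large file cost one line comparison each. Collecting into a `Vec` is just the simplest output format for the sketch; whether the library should do that or expose a trait is the open question from the first bullet.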

The diff algorithm in git only operates on lines. It is worth looking into what exactly they use to produce a colored word diff from the line diff. Perhaps a different algorithm is a better fit?

pascalkuthe avatar Oct 26 '22 16:10 pascalkuthe

FYI git does word diffing by feeding the same algorithm with one word per line.
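As a rough illustration of that idea (not git's actual code), the input can be rewritten so that every word sits on its own line, after which any line-based diff effectively becomes a word diff. The helper name `one_word_per_line` is hypothetical, and splitting on whitespace is an assumption about what counts as a word.

```rust
/// Rewrite `text` so that each whitespace-delimited word is on its own line.
/// (Hypothetical helper; whitespace splitting is an assumed word definition.)
fn one_word_per_line(text: &str) -> String {
    let mut out = String::new();
    for word in text.split_whitespace() {
        out.push_str(word);
        out.push('\n');
    }
    out
}
```

Note that this sketch discards the original whitespace, so a real implementation would also need to remember where each word came from in order to map the result back onto the original text.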

jlama avatar Sep 28 '23 16:09 jlama