go-diff icon indicating copy to clipboard operation
go-diff copied to clipboard

Munged text leads to incorrect diffs

Open schroederc opened this issue 1 year ago • 4 comments

https://github.com/sergi/go-diff/commit/db1b095f5e7c905e196ff6bfd56189a41aa76309 introduces a bug in its change from diffLinesToRunesMunge to diffLinesToStringsMunge. Since each line is represented by 1 or more ascii characters, it's possible for the diffing algorithm to split hashed lines incorrectly whereas before the rune indexed lines were indivisible.

For instance, DiffLinesToChars could return hashed strings such as:

"1,2,3,4,5,6,7,8,9,10,9,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,9,27,28,9,29,30,31,32,33,9,34,35,36,37,38,39,40,41"
"42,9,10,9,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,9,27,28,9,29,30,31,32,33,9,34,43,36,37,38,39,44,45,46,47"

DiffMain may then split the leading 42 such as:

[{Delete 1,2,3,} {Equal 4} {Delete ,5,6,7,8} {Insert 2} {Equal ,9,10,9,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,9,27,28,9,29,30,31,32,33,9,34,} {Insert 4} {Equal 3} {Delete 5} {Equal ,36,37,38,39,4} {Delete 0} {Insert 4} {Equal ,4} {Delete 1} {Insert 5,46,47}] 

And the resulting diff after hydration is completely wrong.

This affects users of the DiffLinesTo* APIs as well as any user that passes true for checklines in DiffMain or DiffMainRunes.

schroederc avatar Mar 29 '23 21:03 schroederc