diff-match-patch

Diff'ing bytes

Open jhogan opened this issue 7 years ago • 2 comments

Is it possible to diff bytes with the Python 3 library? I haven't been able to.

jhogan@bastion:/usr/lib/python3/dist-packages/diff_match_patch$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from diff_match_patch import diff_match_patch
>>> from uuid import uuid4
>>> dmp = diff_match_patch()
>>> dmp.diff_main(uuid4().bytes, uuid4().bytes)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 136, in diff_main
    self.diff_cleanupMerge(diffs)
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 936, in diff_cleanupMerge
    text_delete += diffs[pointer][1]
TypeError: Can't convert 'bytes' object to str implicitly

Note that I'm using the Ubuntu version of the software (the python3-diff-match-patch package). Also note that the stack trace can differ between runs, but the exception is always raised on the same line. For example:

>>> dmp.diff_main(uuid4().bytes, uuid4().bytes)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 129, in diff_main
    diffs = self.diff_compute(text1, text2, checklines, deadline)
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 196, in diff_compute
    return self.diff_bisect(text1, text2, deadline)
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 351, in diff_bisect
    return self.diff_bisectSplit(text1, text2, x1, y1, deadline)
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 377, in diff_bisectSplit
    diffs = self.diff_main(text1a, text2a, False, deadline)
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 136, in diff_main
    self.diff_cleanupMerge(diffs)
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 936, in diff_cleanupMerge
    text_delete += diffs[pointer][1]

I assume the stack traces differ because each call uses random input (UUIDs).

Is the solution to convert the binary data to strings first?

dmp.diff_main(str(uuid4().bytes), str(uuid4().bytes))
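
A note on the `str()` idea, offered as a sketch rather than a confirmed fix: in Python 3, `str()` on a `bytes` object returns its repr (the `b'...'` literal with escapes), so the diff would run over escape sequences rather than the bytes themselves. One lossless alternative is to decode as Latin-1, which maps each byte value 0-255 to exactly one code point. The helper names below are hypothetical, and whether every diff_match_patch function copes with the resulting control characters is an assumption that is not tested here.

```python
def bytes_to_text(data: bytes) -> str:
    """Map each byte to the code point with the same value (lossless)."""
    return data.decode("latin-1")


def text_to_bytes(text: str) -> bytes:
    """Inverse of bytes_to_text; raises if text has code points above 255."""
    return text.encode("latin-1")


raw = bytes([0x00, 0x7F, 0x80, 0xFF])

# str() gives the repr of the bytes object, not one character per byte.
as_repr = str(raw)  # "b'\\x00\\x7f\\x80\\xff'"

# Latin-1 keeps one character per byte and round-trips exactly.
as_text = bytes_to_text(raw)
assert len(as_text) == len(raw)
assert text_to_bytes(as_text) == raw
```

With helpers like these, the call would look like `dmp.diff_main(bytes_to_text(a), bytes_to_text(b))`, and the text of each resulting diff can be passed back through `text_to_bytes`.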

jhogan, Apr 21 '18 17:04

I think this project is for text only

GrosSacASac, Jan 24 '19 16:01

For those who can't convert raw bytes to strings due to encoding issues, a workaround is to convert the bytes to hex strings and compute the diff on those, since hex digits are plain ASCII.
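
The hex workaround can be sketched as follows. This is a minimal illustration, assuming the `diff_match_patch` package is importable; the import is guarded so the encoding part still runs without it. `bytes.hex()` and `bytes.fromhex()` are standard in Python 3.5 and later.

```python
from uuid import uuid4

try:
    # Third-party package; guarded so the sketch still runs without it.
    from diff_match_patch import diff_match_patch
except ImportError:
    diff_match_patch = None

a = uuid4().bytes
b = uuid4().bytes

# bytes.hex() yields two lowercase ASCII hex digits per byte.
hex_a = a.hex()
hex_b = b.hex()

if diff_match_patch is not None:
    dmp = diff_match_patch()
    # diff_main returns a list of (op, text) tuples over the hex strings.
    for op, text in dmp.diff_main(hex_a, hex_b):
        print(op, text)

# The hex form converts back to the original bytes without loss.
assert bytes.fromhex(hex_a) == a
assert bytes.fromhex(hex_b) == b
```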

Note that each byte corresponds to two hex characters, and the diff may produce blocks with an odd number of characters. In those cases, I found that taking the last character (equivalent to a nibble) of an unchanged block and prepending it to the changed blocks that follow works fine.
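
One way to realign odd-length blocks to byte boundaries is sketched below. It is an independent illustration, not the commenter's code, and it goes slightly beyond the heuristic described above: it also pulls the first nibble of an unchanged block back into a preceding odd-length change, and it dissolves an unchanged block that sits at different nibble offsets in the two inputs. The function names are hypothetical. The input is assumed to be a list of `(op, text)` tuples as returned by `diff_main` on two even-length hex strings.

```python
# Same values as the DIFF_* constants on the diff_match_patch class.
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def _flush(out, deleted, inserted):
    """Emit the pending change group (hypothetical helper)."""
    if deleted == inserted:
        if deleted:
            out.append((DIFF_EQUAL, deleted))
        return
    if deleted:
        out.append((DIFF_DELETE, deleted))
    if inserted:
        out.append((DIFF_INSERT, inserted))


def realign_to_bytes(diffs):
    """Return an equivalent diff in which every block has an even length."""
    out = []
    deleted = inserted = ""  # pending change group
    for op, text in diffs:
        if op == DIFF_DELETE:
            deleted += text
        elif op == DIFF_INSERT:
            inserted += text
        elif text:
            odd_del = len(deleted) % 2
            odd_ins = len(inserted) % 2
            if odd_del != odd_ins:
                # Equal text at different nibble offsets in the two inputs
                # is not equal at the byte level: fold it into the change.
                deleted += text
                inserted += text
                continue
            if odd_del:
                # The first nibble completes the last changed byte.
                deleted += text[0]
                inserted += text[0]
                text = text[1:]
            tail = ""
            if len(text) % 2:
                # A trailing half byte moves to the start of the next change.
                text, tail = text[:-1], text[-1]
            if text:
                _flush(out, deleted, inserted)
                out.append((DIFF_EQUAL, text))
                deleted = inserted = tail
            else:
                deleted += tail
                inserted += tail
    _flush(out, deleted, inserted)
    return out


# "1a2b" vs "1c2b": the lone "1" is moved into the changed blocks.
example = [(0, "1"), (-1, "a"), (1, "c"), (0, "2b")]
print(realign_to_bytes(example))  # [(-1, '1a'), (1, '1c'), (0, '2b')]
```

Each block of the result can then be passed to `bytes.fromhex()`. The trade-off is that changes become coarser, since a differing nibble marks its whole byte as changed.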

See the following example, which deals with odd blocks and outputs differences in unified format.

nevesnunes, Aug 05 '20 22:08