Multibyte characters are broken when they are split at a block boundary during comparison

Open kmuto opened this issue 4 years ago • 0 comments

Describe the problem

TTY::File::CompareFiles#call seems to read a file in chunks of the block size. When a multibyte character (CJK character, emoji, etc.) straddles the boundary between two blocks, the character is broken.
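The effect can be shown without TTY::File at all. The following is a minimal sketch, assuming UTF-8 input and a deliberately tiny block size (`BLOCK_SIZE = 4` is a demo value, not the library's setting); it only illustrates how fixed-size byte reads cut a multibyte character in half and does not reproduce the library's internals.

```ruby
require "stringio"

# Demo value only; the real block size used by the library is larger.
BLOCK_SIZE = 4

# "aaa" is 3 bytes, "あ" and "い" are 3 bytes each in UTF-8 (9 bytes total).
io = StringIO.new("aaaあい")

chunks = []
# IO#read(length) returns raw bytes (binary encoding), or nil at end of input.
while (chunk = io.read(BLOCK_SIZE))
  # Reinterpreting each block as UTF-8 on its own exposes the broken characters.
  chunks << chunk.force_encoding(Encoding::UTF_8)
end

chunks.each do |c|
  puts "#{c.bytes.map { |b| format('%02X', b) }.join(' ')}  valid UTF-8? #{c.valid_encoding?}"
end
```

Every block here ends or starts in the middle of a character, so none of them is valid UTF-8 on its own, even though the concatenated bytes are.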

Steps to reproduce the problem

./diff-j.rb
       diff  4096-a.txt and 4096-aj.txt
--- 4096-a.txt
+++ 4096-aj.txt
@@ -1 +1 @@
-aaa(repeats 4096 times )aaa�
@@ -1 +1 @@
-A
+��い

4096-a.txt

aaa(repeats 4096 times)aaaA

4096-aj.txt

aaa(repeats 4096 times)aaaあい

check

require "tty-file"
puts TTY::File.diff("4096-a.txt", "4096-aj.txt")

Actual behaviour

The multibyte character is split at the byte level and comes out broken.

�
��い

Expected behaviour

./diff-j.rb
       diff  4096-a.txt and 4096-aj.txt
--- 4096-a.txt
+++ 4096-aj.txt
@@ -1 +1 @@
-aaa(repeats 4096 times )aaa
@@ -1 +1 @@
-A
+あい

This looks hard to solve with the current implementation, which uses block reads.
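One possible direction, offered only as a sketch: keep the block reads but carry an incomplete trailing byte sequence over to the next block. The helper below, `each_utf8_chunk`, is hypothetical and not part of TTY::File; it assumes UTF-8 input, and it makes chunk sizes vary slightly, which may or may not fit how CompareFiles aligns blocks.

```ruby
require "stringio"

# Hypothetical helper (not part of TTY::File): yields blocks that never end
# in the middle of a UTF-8 character. Assumes the input is UTF-8.
def each_utf8_chunk(io, block_size)
  return enum_for(__method__, io, block_size) unless block_given?

  carry = "".b # incomplete bytes left over from the previous block
  while (raw = io.read(block_size))
    buf = carry + raw
    keep = buf.bytesize
    # A UTF-8 character is at most 4 bytes, so an incomplete tail is at most 3.
    [3, keep].min.times do
      break if buf.byteslice(0, keep).force_encoding(Encoding::UTF_8).valid_encoding?
      keep -= 1
    end
    head = buf.byteslice(0, keep).force_encoding(Encoding::UTF_8)
    if head.valid_encoding?
      carry = buf.byteslice(keep, buf.bytesize - keep)
      yield head unless head.empty?
    else
      # Not valid UTF-8 even after trimming: pass the bytes through unchanged.
      yield buf.force_encoding(Encoding::UTF_8)
      carry = "".b
    end
  end
  # Flush whatever is left (only non-empty if the input ends mid-character).
  yield carry.force_encoding(Encoding::UTF_8) unless carry.empty?
end

result = each_utf8_chunk(StringIO.new("aaaあい"), 4).to_a
p result
```

With the same 4-byte demo block size, each yielded block holds only whole characters, so the diff output would no longer contain replacement characters for this input.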

Describe your environment

  • OS version: Debian 11
  • Ruby version: 2.7.4
  • TTY::File version: 0.10.0

diff-j.zip

kmuto Nov 17 '21 07:11