Decompression error: Bad UTF-8 encoding on Chinese file names

Open Tokenyet opened this issue 5 years ago • 4 comments

Bug on 2.0.10 with a Chinese-named folder; an example file is attached. The error caught:

 Bad UTF-8 encoding

The file is small, so I've attached it here: rawMap.zip

Tokenyet avatar Oct 05 '19 19:10 Tokenyet

I'm pretty sure the problem is in decoding Chinese names: after I deleted all the Chinese-named files, renamed the Chinese-named folder, and ran it again, everything worked as expected.

Tokenyet avatar Oct 05 '19 19:10 Tokenyet

Your zip file looks like it uses GBK or GB2312 encoding. As far as I know, 'archive' only supports UTF-8, so use UTF-8 instead. Maybe 'archive' could guess the encoding the way the tool 'unar' does, or offer an encoding option like 'unzip-iconv' does: unzip -O gbk test.zip? @brendan-duncan

javanli avatar Nov 17 '19 15:11 javanli
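To illustrate the "encoding choice" idea above, here is a minimal sketch in Python (not the Dart 'archive' package). It assumes the standard `zipfile` module, which decodes an entry name as cp437 when the entry's UTF-8 flag (bit 11 of the general purpose flags) is not set, so the original bytes can be recovered and decoded again. The function name `entry_names` and the `legacy_encoding` parameter are made up for this example.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general purpose flags: name is stored as UTF-8


def entry_names(zip_bytes: bytes, legacy_encoding: str = "big5") -> list[str]:
    """Return entry names, re-decoding non-UTF-8 names with legacy_encoding."""
    names = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_FLAG:
                # The archive declares this name as UTF-8; keep it as decoded.
                names.append(info.filename)
            else:
                # zipfile decoded the raw bytes as cp437; undo that step and
                # decode with the encoding chosen by the caller.
                raw = info.filename.encode("cp437")
                names.append(raw.decode(legacy_encoding))
    return names
```

The caller still has to know (or guess) the encoding, for example "big5" for Traditional Chinese or "gbk" for Simplified Chinese, much like passing -O to unzip.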

@javanli, just for clarification: Big5 is used for Traditional Chinese (as in the attached zip), while both GBK and GB2312 are for Simplified Chinese. In addition, it's hard to make users follow a rule strictly, so I hope there is a solution or workaround for this issue instead of a prohibition 😞

Tokenyet avatar Nov 17 '19 16:11 Tokenyet

@Tokenyet My approach is to rewrite InputStream.readString so that it tries different codecs to decode the buffer.

javanli avatar Nov 17 '19 16:11 javanli
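The "try different codecs" workaround could look roughly like the sketch below. It is written in Python for illustration only and is not the actual rewrite of InputStream.readString; the function name `decode_with_fallback` and the default codec order are assumptions.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "big5", "gbk")) -> str:
    """Decode raw bytes with the first codec in encodings that accepts them."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes; try the next one
    # cp437 maps every byte value, so this last resort never raises.
    return raw.decode("cp437")
```

UTF-8 goes first because it is strict and rarely accepts bytes from other encodings by accident. Big5 and GBK overlap heavily, so whichever is listed first wins when both accept the bytes, and the result can be wrong; the order should follow the encoding the users are most likely to have.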