7-Zip-JBinding-4Android Chinese display with messy code

When I extract files, String path = (String) inArchive.getProperty(index, PropID.PATH) returns garbled text if the path contains Chinese characters. How can I fix it? Looking forward to your reply. Thanks.

Sep 15 '21 07:09 ZHJ-30

Tested this as follows:

I created a 7z archive using 7-Zip for Windows that included both a folder name and a file name with kanji characters. I used Japanese for this particular test, but 7z stores such file names as Unicode, so this should work similarly for Chinese characters too.

Next I extracted this using 7-Zip-JBinding-4Android and was able to confirm the correct folder name (not garbled) using IInArchive.getStringProperty(index, PropID.PATH).

I suspect the archive with Chinese folder names that you are testing with is compressed using a different code page / locale. See similar discussion here: https://sourceforge.net/p/p7zip/discussion/383044/thread/3d213124/

Can you extract the archive correctly using the 7-Zip command line tool?

Sep 20 '21 02:09 omicronapps

Thanks for your answer. My code for extracting the archive is here. I'm not sure if there's a problem with it.

Sep 30 '21 07:09 ZHJ-30

I don't think there's an issue with the source code or library.

Rather I think this is caused by the file archive that's being extracted.

Can you provide an example archive file that results in garbled characters? For example, by adding this file to a GitHub project, etc.

Oct 02 '21 16:10 omicronapps

hello

I have the same problem now. For Chinese files compressed with some versions of 7Z, the file names I obtain are garbled. The problem file is attached.

I tried to get the code page and other information so that I could deal with the garbled text myself, but the values I got were all null. Is there any way to get information about the character set of the compressed file names? And how can I determine whether the library has successfully read a file's character set information?

I hope you can help analyze the cause. Looking forward to your reply. Thank you very much.

zip压缩包c7z.zip

Oct 23 '21 10:10 EvilThunder

If the file names are not encoded with UTF-16, then you will need to manually convert the file names to the correct character set.

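For example, to convert to the "GBK" code page:

import java.nio.*;
import java.nio.charset.*;

// inArchive is an open IInArchive instance; i is the item index.
String path = inArchive.getStringProperty(i, PropID.PATH);
// Recover the raw bytes. ISO_8859_1 is an assumption (one byte per char in the
// garbled string); a bare getBytes() would re-encode as UTF-8 on Android.
byte[] ba = path.getBytes(StandardCharsets.ISO_8859_1);
ByteBuffer bb = ByteBuffer.wrap(ba);
Charset cs = Charset.forName("GBK");
CharsetDecoder cd = cs.newDecoder();
CharBuffer cb = cd.decode(bb); // throws CharacterCodingException on invalid input
String gbk_path = cb.toString();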

If there is no information about the code page in the archive, then this was not included when the archive was created. In this case, you will need to manually provide this information and ensure that the file names are decoded correctly.

Oct 25 '21 01:10 omicronapps

Thanks for your answer.

Is there any way to determine whether the library has obtained the code page information? I want to convert the names to the "GBK" code page if it has not.

Oct 25 '21 01:10 EvilThunder

The 7-Zip-JBinding library will not make any code page conversions. The library will provide all strings from the archive unmodified. That is, the strings will be in the same format (code page) as when the archive was originally created.

If the archive includes information in the PropID.CODE_PAGE property, then you can use this information. But if this property does not exist in the archive, then you must know which code page was used during compression.

I would recommend using UTF-16 when creating new archive files to avoid issues like this. If this is not possible, then the application using 7-Zip-JBinding must obtain the code page of the archive some other way (for example, from the user).
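
As a rough illustration of the advice above, here is a sketch of how an application might choose a charset: use the code page if the archive reports one, otherwise fall back to a charset supplied by the user. The class CodePageCharsets and its resolve method are hypothetical and not part of 7-Zip-JBinding. The Integer argument stands in for whatever value the CODE_PAGE property yields (assumed here to be null when absent), and the table covers only a handful of Windows code page numbers.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: maps a few Windows code page numbers to Java charsets. */
final class CodePageCharsets {
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        NAMES.put(936, "GBK");          // Simplified Chinese
        NAMES.put(950, "Big5");         // Traditional Chinese
        NAMES.put(932, "windows-31j");  // Japanese
        NAMES.put(1200, "UTF-16LE");
        NAMES.put(65001, "UTF-8");
    }

    /**
     * Returns the charset for codePage, or userChoice when the code page is
     * missing (null), not in this table, or not supported by the runtime.
     */
    static Charset resolve(Integer codePage, Charset userChoice) {
        if (codePage != null) {
            String name = NAMES.get(codePage);
            if (name != null && Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        }
        return userChoice;
    }

    public static void main(String[] args) {
        // No code page in the archive: the user's selection is used.
        System.out.println(resolve(null, StandardCharsets.UTF_8));
        // Code page 936 reported: GBK is used if the runtime supports it.
        System.out.println(resolve(936, StandardCharsets.UTF_8));
    }
}
```

The fallback argument keeps the decision with the application, which matches the point that the code page must come from somewhere outside the archive when the property is not set.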

Oct 27 '21 03:10 omicronapps

If the file names are not encoded with UTF-16, then I will need to manually convert the file names to the correct character set. But how do I determine whether file names are encoded with UTF-16?

Does the library have access to other attributes related to the character set, such as those mentioned in the "APPENDIX D-Language Encoding (EFS)" section in the link?

https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

Oct 27 '21 12:10 EvilThunder

If the file names are not encoded with UTF-16, then I will need to manually convert the file names to the correct character set.

Yes, correct.

But how do I determine whether file names are encoded with UTF-16?

The only way I'm aware of is if the CODE_PAGE property is set, but it looks like this is not used for Zip archives.
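
When no code page is reported, one common heuristic (not something the library does) is to test whether the raw name bytes are valid UTF-8 and only fall back to a legacy charset if they are not. The sketch below assumes the raw bytes of the name have already been recovered, for example as in the GBK conversion example earlier in this thread. NameDecoder, tryDecode, and decodeName are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: guesses the encoding of raw file-name bytes. */
final class NameDecoder {
    /** Strictly decodes raw with cs; returns null if the bytes are not valid in cs. */
    static String tryDecode(byte[] raw, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /** Prefers UTF-8 when the bytes are valid UTF-8, otherwise decodes with fallback. */
    static String decodeName(byte[] raw, Charset fallback) {
        String utf8 = tryDecode(raw, StandardCharsets.UTF_8);
        return utf8 != null ? utf8 : new String(raw, fallback);
    }
}
```

This is only a guess, not detection: pure ASCII names are valid in both encodings, and a short legacy-encoded name can occasionally happen to be valid UTF-8, so an application may still want to let the user override the result.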

Does the library have access to other attributes related to the character set, such as those mentioned in the "APPENDIX D-Language Encoding (EFS)" section in the link?

No, probably not. It looks to me like 7-Zip only handles the following extra IDs for Zip archives.

ZipHeader.h:

  namespace NExtraID
  {
    enum
    {
      kZip64 = 0x01,
      kNTFS = 0x0A,
      kStrongEncrypt = 0x17,
      kUnixTime = 0x5455,
      kIzUnicodeComment = 0x6375,
      kIzUnicodeName = 0x7075,
      kWzAES = 0x9901
    };
  }

7-Zip-JBinding uses 7-Zip version 16.02.

I checked the latest version, 7-Zip 21.02, but it looks like 0x0008 (PFS) is still not supported for Zip archives: https://sourceforge.net/projects/sevenzip/files/7-Zip/21.02/

You may want to check here about 7-Zip support for Zip archives: https://sourceforge.net/p/sevenzip/support-requests/

Oct 29 '21 03:10 omicronapps

I'm also facing this issue with a Chinese file name. I compressed the file using the macOS default compression. When extracting it on Android using this library, I see garbled file names. Is there any workaround? @omicronapps

May 18 '22 08:05 asthagarg2428

When I extract files, String path = (String) inArchive.getProperty(index, PropID.PATH) returns garbled text if the path contains Chinese characters. How can I fix it? Looking forward to your reply. Thanks.

Were you able to solve this issue?

May 18 '22 15:05 asthagarg2428

If the file names are not encoded with UTF-16, then I will need to manually convert the file names to the correct character set. But how do I determine whether file names are encoded with UTF-16?

Does the library have access to other attributes related to the character set, such as those mentioned in the "APPENDIX D-Language Encoding (EFS)" section in the link?

https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

How did you handle the case then? I'm also stuck

May 18 '22 15:05 asthagarg2428

I'm also facing this issue with a Chinese file name. I compressed the file using the macOS default compression. When extracting it on Android using this library, I see garbled file names. Is there any workaround? Were you able to solve this issue? How did you handle the case then? I'm also stuck.

The 7-Zip-JBinding library will not make any code page conversions. The library will provide all strings from the archive unmodified. That is, the strings will be in the same format (code page) as when the archive was originally created.

If the file names are not encoded with UTF-16, then you will need to manually convert the file names to the correct character set.

7-Zip does not handle the Language Encoding flag (EFS). This means that it's not possible for 7-Zip to determine the code page.

I would recommend using UTF-16 when creating new archive files to avoid issues like this. If this is not possible, then the application using 7-Zip-JBinding must obtain the code page of the archive some other way (for example, from the user).

Additional information in previous replies above.

Jun 02 '22 04:06 omicronapps

But it seems that different languages are supported by this library.

I renamed a text file to a Chinese name and compressed it using a third-party app, Keka: I was ABLE to extract it using 7-Zip-JBinding-4Android.
I compressed the same text file using the macOS default compression: I was UNABLE to extract it using 7-Zip-JBinding-4Android.

I'm unable to understand the difference between the two and how to solve it

Jun 07 '22 12:06 asthagarg2428

7-Zip supports UTF-8 and UTF-16-LE character encoding. macOS, on the other hand, uses GBK for Chinese characters. It appears that Keka uses UTF-8, which is why this works with 7-Zip.

I would recommend adding a user dialog to manually select between GBK and UTF character encodings when selecting a file for extraction. I'm afraid 7-Zip does not include support for detecting the EFS.
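
To make such a dialog easier to use, the application could show the file name decoded with each candidate encoding and let the user pick the one that looks right. The following is a minimal sketch of that idea, assuming the raw name bytes are available. EncodingPreview and previews are hypothetical names, and the candidate list ("UTF-8", "GBK") is only an example.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds preview strings for an encoding-selection dialog. */
final class EncodingPreview {
    /**
     * Decodes raw with each charset name the runtime supports, keeping the
     * order in which the names were given. Unsupported names are skipped.
     */
    static Map<String, String> previews(byte[] raw, String... charsetNames) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : charsetNames) {
            if (Charset.isSupported(name)) {
                result.put(name, new String(raw, Charset.forName(name)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example input: a Chinese file name encoded as GBK.
        byte[] raw = "\u4e2d\u6587.txt".getBytes(Charset.forName("GBK"));
        previews(raw, "UTF-8", "GBK")
                .forEach((cs, text) -> System.out.println(cs + ": " + text));
    }
}
```

Each map entry can back one option in the dialog, with the charset name as the label and the decoded text as the preview.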

Jun 13 '22 00:06 omicronapps
