SharpZipLib

Files and folders with non-English characters in their names have gibberish names after unpacking

Open TuTAH1 opened this issue 3 years ago • 2 comments

Steps to reproduce

  1. Pack "слово.txt" with any archiver, e.g. Bandizip
  2. Confirm that other programs unpack it correctly (e.g. Windows Explorer or the program you packed it with)
  3. Unpack it via `new FastZip().ExtractZip()`

Expected behavior

Files and folders have the same names as when they were packed

Actual behavior

File and folder names look like ����� �ணࠬ��

Version of SharpZipLib

1.4.0

Obtained from (only keep the relevant lines)

  • Package installed using NuGet

Tried actions:

ZipStrings.CodePage = 866;    // No data is available for encoding 866. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.
ZipStrings.CodePage = 1251;   // same error
ZipStrings.CodePage = 65001;  // gibberish
ZipStrings.UseUnicode = true; // gibberish

new FastZip { 
  EntryFactory = new ZipEntryFactory { 
    IsUnicodeText = true 
  } 
}.ExtractZip(filepath, (TempFolder? "Temp\\" : ""), null);

I used that in place of the plain call below; it does not help:

new FastZip().ExtractZip(filepath, (TempFolder ? "Temp\\" : ""), null);

IsUnicodeText = true gives the same result as IsUnicodeText = false
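
The gibberish is consistent with file-name bytes written in one code page being decoded with another. A minimal sketch of that effect, using only the .NET base class library and assuming (not confirmed anywhere in this report) that the archiver stored the name in CP866 and that the code runs on .NET Core 3.0 or later, where `CodePagesEncodingProvider` ships with the runtime. `MojibakeDemo` and `Reinterpret` are hypothetical names used only for illustration:

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes `name` with one encoding and decodes the resulting bytes with another.
    public static string Reinterpret(string name, Encoding writtenAs, Encoding readAs)
    {
        byte[] raw = writtenAs.GetBytes(name);
        return readAs.GetString(raw);
    }

    public static void Main()
    {
        // Needed on .NET Core / .NET 5+ before code pages such as 866 can be resolved.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding cp866 = Encoding.GetEncoding(866); // assumed encoding of the stored name
        const string original = "слово.txt";

        Console.OutputEncoding = Encoding.UTF8;

        // Same encoding on both sides: the name survives.
        Console.WriteLine(Reinterpret(original, cp866, cp866));

        // CP866 bytes read as UTF-8: invalid sequences become U+FFFD.
        Console.WriteLine(Reinterpret(original, cp866, Encoding.UTF8));

        // CP866 bytes read as Windows-1251: valid but wrong characters.
        Console.WriteLine(Reinterpret(original, cp866, Encoding.GetEncoding(1251)));
    }
}
```

Only the first call round-trips the name, which would match code page 65001 producing gibberish above if the names in the archive are not actually stored as UTF-8.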

TuTAH1 · Sep 21 '22 07:09

IsUnicodeText = true gives the same result as IsUnicodeText = false

This is because IsUnicodeText is only used when creating entries, not when reading them.

No data is available for encoding 866. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.

This is because .NET (Core and 5+) only includes a very limited set of encodings by default. To add support for all the encodings present in .NET Framework, call this:


using System.Text;

// ...

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

In fact, if that is called, FastZip should automatically pick that encoding (as your OS is set to it).
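
To check whether the registration took effect, something like the following sketch can help. It uses only the base class library and assumes .NET Core 3.0 or later, where `CodePagesEncodingProvider` ships with the runtime; `CodePageCheck` and `IsAvailable` are hypothetical names, not part of SharpZipLib:

```csharp
using System;
using System.Text;

public static class CodePageCheck
{
    // Returns true if the runtime can currently resolve the given code page.
    public static bool IsAvailable(int codePage)
    {
        try
        {
            Encoding.GetEncoding(codePage);
            return true;
        }
        catch (NotSupportedException)
        {
            // Thrown when no registered provider knows the code page.
            return false;
        }
        catch (ArgumentException)
        {
            // Thrown for out-of-range or platform-unsupported code pages.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine($"866 before registration: {IsAvailable(866)}");

        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Console.WriteLine($"866 after registration:  {IsAvailable(866)}");
        Console.WriteLine($"1251 after registration: {IsAvailable(1251)}");
    }
}
```

The result of the first check depends on the runtime: .NET Framework resolves 866 without any registration, while .NET Core / .NET 5+ is where the "No data is available for encoding 866" error quoted above comes from.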

piksel · Sep 21 '22 09:09

Actually, I am going to reopen this, because the automatic encoding only works if

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

is called before any instance of ZipCodec has been accessed, and only on .NET Framework. On .NET Core / .NET 5+ it still only returns UTF-8. This should be fixed in an upcoming release.

piksel · Sep 22 '22 07:09