SharpZipLib ZipEntry does not support special characters by default since 1.1.0

Steps to reproduce

Try to use the following code to zip a csv file with a special character ("é") in its name:

using (var zos = new ZipOutputStream(File.Create(zipPath)))
{
    ...
    zos.PutNextEntry(new ZipEntry("Téb.csv"));
    ...
}

The default value of UseUnicode changed to false in 1.1.0, when the AutomaticCodePage property was introduced in ZipEntry.cs. By default codePage is now -1, so UseUnicode returns false (its getter uses the codePage field instead of the CodePage property). There are workarounds (setting UseUnicode by hand or setting the IsUnicodeText property on every ZipEntry); I just wanted to let you know about this issue.
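For reference, the per-entry workaround can be sketched roughly as follows. This is a minimal sketch assuming the SharpZipLib 1.x API (ZipEntry.IsUnicodeText and the static ZipStrings.UseUnicode); the class name, method name, and csvPath parameter are hypothetical and only there to make the snippet self-contained.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

public static class UnicodeZipExample
{
    // Hypothetical helper: zips a single CSV file under a non-ASCII entry name.
    public static void WriteZip(string zipPath, string csvPath)
    {
        // Global alternative (assumption: affects every entry created afterwards
        // in this process, because ZipStrings is static):
        // ZipStrings.UseUnicode = true;

        using (var zos = new ZipOutputStream(File.Create(zipPath)))
        {
            // Per-entry workaround: mark the entry name as UTF-8 explicitly.
            var entry = new ZipEntry("Téb.csv")
            {
                IsUnicodeText = true
            };
            zos.PutNextEntry(entry);

            // Copy the file contents into the current entry.
            using (var input = File.OpenRead(csvPath))
            {
                input.CopyTo(zos);
            }

            zos.CloseEntry();
        }
    }
}
```

Setting the flag per entry keeps the change local to the code that creates the archive, whereas the static setting also changes behavior for any other code in the process that relies on ZipStrings.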

Expected behavior

The zip should contain a "Téb.csv" entry.

Actual behavior

The zip contains a "TTb.csv" entry instead.

Version of SharpZipLib

1.3.0

Obtained from (only keep the relevant lines)

Package installed using NuGet

Mar 05 '21 15:03 dlengyel-mirango

What you mention is not a workaround. If you are creating the zip file, you can decide whether entries should use unicode for their names or not. When you are reading a zip file, that decision has already been made, and ZipStrings is set up to detect and adapt to that. Now, the default should probably be IsUnicodeText for new entries, or even better, ZipStrings should not be a static shared singleton between ZipInput- and ZipOutputStream, but rather something that could have different defaults and be overridden separately.

Mar 05 '21 19:03 piksel

@piksel I understand that, but the behavior changed in 1.1.0 and it seemed to me that it was not intentional (see my comment about AutomaticCodePage). Earlier, UTF-8 was the default when creating a zip file, and I think this has changed. If nothing is set by hand, UseUnicode will return false. So basically our code that was written for creating zip files broke when we upgraded to 1.1.0+, and now we have to set IsUnicodeText at every occurrence (as using the singleton is not an option).

Mar 08 '21 06:03 dlengyel-mirango

Yeah, it was changed because it was broken in the opposite way before: archives without unicode encoding could not be read, because everything was treated as UTF-8. The way it is right now, at least it's toggleable by the consumer, even if it's not ideal. In #592 the defaults for creating archives are back to unicode, and using another encoding is opt-in.

Mar 08 '21 07:03 piksel

@piksel wow, thanks for the quick response, got it. The goal of #592 looks quite nice 👍

Mar 08 '21 07:03 dlengyel-mirango