sharpcompress
Encoding with ArchiveFactory.WriteToDirectory
I've run into an issue when extracting a zip archive. Inside the archive there is a file whose name contains a German umlaut: "Übung.txt".
I use ArchiveFactory.WriteToDirectory to extract the archive, but the extracted file shows a question mark instead of "Ü", so apparently the encoding is wrong. I am using Windows 10.
The method takes an options argument, but it does not contain anything encoding-related.
I searched the documentation and came across an example usage of reading the file and extracting it. There, you can provide some reader options:
private static void ExtractArchive(string source, string destination) {
    var opts = new ReaderOptions();
    var encoding = Encoding.GetEncoding(1252);
    opts.ArchiveEncoding = new ArchiveEncoding {
        CustomDecoder = (data, _, _) => encoding.GetString(data),
    };
    using Stream inStream = File.OpenRead(source);
    using Stream outStream = File.OpenWrite(destination);
    using var reader = ReaderFactory.Open(inStream, opts);
    while (reader.MoveToNextEntry())
    {
        if (!reader.Entry.IsDirectory) {
            using var entryStream = reader.OpenEntryStream();
            entryStream.CopyTo(outStream);
        }
    }
}
I tried various encodings, but none worked.
Am I doing something incorrectly? Or might this be a bug?
Also, I'm wondering why ReaderFactory.Open has an options argument that carries encoding information, but ArchiveFactory.WriteToDirectory doesn't?
There could be several things going wrong here: the encoding used for names inside the archive, and/or the encoding applied in code once those bytes are turned into a string. I'm not the best with encodings, so I'm not sure. I'd need a sample file to see more.
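To make the two failure points concrete, here is a minimal sketch that uses only the .NET base library, not SharpCompress. It decodes the raw bytes of a file name with different encodings to show how a question mark or replacement character can appear. The class NameDecodingDemo and its Decode helper are hypothetical names for illustration, and the byte values assume the name was stored either as UTF-8 or as code page 437 (where "Ü" is the single byte 0x9A). Which of these the sample archive actually uses is not known from this thread.

```csharp
using System;
using System.Text;

public static class NameDecodingDemo
{
    // Hypothetical helper: turn raw entry-name bytes into a string
    // using whichever encoding the caller believes the archive used.
    public static string Decode(byte[] rawName, Encoding encoding)
        => encoding.GetString(rawName);

    public static void Main()
    {
        // "Übung.txt" stored as UTF-8: 'Ü' (U+00DC) becomes the bytes 0xC3 0x9C.
        byte[] utf8Name = Encoding.UTF8.GetBytes("\u00DCbung.txt");

        // The same name stored in code page 437: 'Ü' is the single byte 0x9A.
        byte[] cp437Name = { 0x9A, 0x62, 0x75, 0x6E, 0x67, 0x2E, 0x74, 0x78, 0x74 };

        // Correct round trip: UTF-8 bytes decoded as UTF-8.
        Console.WriteLine(Decode(utf8Name, Encoding.UTF8) == "\u00DCbung.txt"); // True

        // UTF-8 bytes decoded as ASCII: each byte above 0x7F becomes '?'.
        Console.WriteLine(Decode(utf8Name, Encoding.ASCII)); // ??bung.txt

        // Code page 437 bytes decoded as UTF-8: 0x9A is not valid UTF-8 here,
        // so it becomes the replacement character U+FFFD.
        Console.WriteLine(Decode(cp437Name, Encoding.UTF8) == "\uFFFDbung.txt"); // True
    }
}
```

If the extracted name shows exactly one odd character in place of "Ü", the third case (a single-byte legacy name decoded as UTF-8) is a plausible candidate. Two odd characters would point more towards the second case.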
WriteToDirectory is an extension method that's just a helper. It's not meant to cover all scenarios.
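Since the helper does not take reader options, one workaround is to iterate the entries yourself and pass the encoding through ReaderOptions. The following is only a sketch and has not been tested against the sample file. ExtractWithEncoding and its Extract method are hypothetical names, not part of SharpCompress, and code page 437 is an assumption based on the zip format's traditional default for names that are not flagged as UTF-8. Note also that the snippet in the question copies every entry into a single output file, so the decoded entry name is never used there. Writing each entry into a directory, as below, is what makes the name encoding visible.

```csharp
using System.IO;
using System.Text;
using SharpCompress.Common;
using SharpCompress.Readers;

public static class ExtractWithEncoding
{
    // Hypothetical helper: extract every file entry into destinationDirectory,
    // decoding entry names with the given code page (for example 437 or 1252).
    public static void Extract(string archivePath, string destinationDirectory, int codePage)
    {
        // On .NET Core / .NET 5+, legacy code pages are only available
        // after the code pages provider has been registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        var readerOptions = new ReaderOptions
        {
            // Assumption: Default is used for names the archive does not mark as UTF-8.
            ArchiveEncoding = new ArchiveEncoding { Default = Encoding.GetEncoding(codePage) },
        };
        var extractionOptions = new ExtractionOptions
        {
            ExtractFullPath = true, // keep sub-directories from the archive
            Overwrite = true,
        };

        Directory.CreateDirectory(destinationDirectory);

        using Stream input = File.OpenRead(archivePath);
        using var reader = ReaderFactory.Open(input, readerOptions);
        while (reader.MoveToNextEntry())
        {
            if (!reader.Entry.IsDirectory)
            {
                // Writes the entry under its own (decoded) name.
                reader.WriteEntryToDirectory(destinationDirectory, extractionOptions);
            }
        }
    }
}
```

A call such as ExtractWithEncoding.Extract("sample.zip", "out", 437) would then show whether the name comes out as "Übung.txt". The file and directory names in that call are placeholders.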
Thanks for your reply!
I prepared a sample file, I hope that it helps you to track down the issue.
Is there anything more I can do to spot the problem?