
zipDirectory method is failing to define filenames with accents and special characters with UTF8 encoding

Open insinfo opened this issue 2 years ago • 8 comments

The zipDirectory method fails to store filenames that contain accents and special characters as UTF-8.

final encoder = ZipFileEncoder();
encoder.zipDirectory(Directory(destinationDirectory), filename: zipFileName);

Versions:
flutter ^3
archive: ^3.3.1


insinfo, Aug 25 '22 20:08

I tested several versions, and the problem seems to persist in all of them.

insinfo, Aug 26 '22 17:08

By default, zip software uses MBCS (the code page of the creator's system) for filenames, not UTF-8. Java's ZipInputStream accepts a Charset. Maybe a charset/codepage parameter should be added.

axilesoft, Sep 10 '22 11:09
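For context, the ZIP format records per entry whether the name is UTF-8: general purpose bit 11 (0x0800) in the entry header. When the bit is clear, the name is in a legacy code page. The sketch below reads that bit from a local file header in raw bytes. The function name `localHeaderNameIsUtf8` is made up for illustration and is not part of the archive package.

```dart
import 'dart:typed_data';

/// Returns true if the local file header starting at [offset] in [zipBytes]
/// has general purpose bit 11 set, meaning the entry name is UTF-8.
/// Hypothetical helper, not an API of the archive package.
bool localHeaderNameIsUtf8(Uint8List zipBytes, [int offset = 0]) {
  final view = ByteData.sublistView(zipBytes);
  // Local file header signature "PK\x03\x04", read as a little-endian uint32.
  if (view.getUint32(offset, Endian.little) != 0x04034b50) {
    throw FormatException('No local file header at offset $offset');
  }
  // The 16-bit general purpose flag field sits 6 bytes into the header.
  final flags = view.getUint16(offset + 6, Endian.little);
  return (flags & 0x0800) != 0;
}
```

Calling it on the first bytes of a zip file tells you whether the first entry's name was written with the UTF-8 flag or in a legacy code page.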

@axilesoft when I use WinRAR or WinZip to compress the folder, everything works correctly: the file names are correct when I open the zip/rar. But when I create the zip file from a Dart script using the archive library's zipDirectory method, the filenames are wrong.

insinfo, Sep 19 '22 19:09

Maybe because WinRAR and WinZip do not use UTF-8 for filenames by default.

axilesoft, Sep 22 '22 16:09

I'm experiencing the same issue here:

File name on my system: avó.png

I use WinRAR to zip it and add the zipped file to my Flutter project's assets.

Then I run the following code:

final byteData = await rootBundle.load('assets/$path');
final Directory appDocDirNewFolder =
    Directory((await getApplicationDocumentsDirectory()).path);
await appDocDirNewFolder.create(recursive: true);
final inputStream = InputStream(byteData);
final archive = ZipDecoder().decodeBuffer(inputStream);
return archive.files;

archive.files shows the name as "av¢.png".

Any workaround for this issue?

(I'm currently using Dart SDK 2.18.1, Flutter SDK 3.3.2 and archive 3.3.1)

chimura, Sep 25 '22 23:09
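The "av¢.png" result is consistent with the name having been stored in CP437 and read back as Latin-1: byte 0xA2 is ó in CP437 and ¢ in Latin-1. A minimal sketch of that mismatch follows. The names `cp437Sample`, `decodeCp437Sample` and `storedName` are invented for this example, and the table covers only two bytes.

```dart
import 'dart:convert';

// Hypothetical, deliberately tiny slice of the CP437 table: only the bytes
// needed for this example. A real decoder needs all 128 high-byte entries.
const cp437Sample = <int, String>{0xA2: 'ó', 0x87: 'ç'};

/// Decodes [bytes] as CP437 using the sample table; unknown high bytes
/// become '?'.
String decodeCp437Sample(List<int> bytes) => bytes
    .map((b) => b < 0x80 ? String.fromCharCode(b) : (cp437Sample[b] ?? '?'))
    .join();

// "avó.png" as a tool writing CP437 would store it: ó is the byte 0xA2.
const storedName = <int>[0x61, 0x76, 0xA2, 0x2E, 0x70, 0x6E, 0x67];
```

Here `latin1.decode(storedName)` gives "av¢.png", while `decodeCp437Sample(storedName)` gives "avó.png".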

Further observations:

I'm using Windows (ISO-8859-1 encoding) and WinRAR to zip files.

I've tried the charset_converter package to decode the names properly. It gets some of the names right (ações.png), but the "av¢.png" file name is still wrong.

Next, I tried 7-Zip with the -mcu parameter to generate a zip file with UTF-8 encoded names. The same Dart code worked well with it.

The problem is that users won't always unzip files whose names were encoded with UTF-8. So I was wondering whether the package could detect which encoding was used and decode the names accordingly.

I've noticed that, in my case, I would need CP437 + Latin-1 to properly decode all the names of my files (Latin-1 for names like "ações.png" and CP437 for names like "almoço.png").

Maybe InputStreamBase's readString({int? size, bool utf8 = true}) method could use something other than Utf8Decoder().convert(bytes) and String.fromCharCodes(bytes) to decode file names?

chimura, Sep 26 '22 17:09
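One way the charset parameter suggested in this thread could look is sketched below. It is an illustration only, not the archive package's API: `LegacyCharset`, `decodeCp437` and `decodeZipName` are hypothetical names. The CP437 table covers only bytes 0x80 to 0xAF, which is enough for the accented Latin letters discussed here.

```dart
import 'dart:convert';

/// Hypothetical choice of charset for names stored without the UTF-8 flag.
enum LegacyCharset { cp437, latin1 }

// CP437 characters for bytes 0x80..0xAF only. Bytes 0xB0..0xFF (box drawing,
// Greek, math symbols) are left out and decode to U+FFFD in this sketch.
const cp437High = 'ÇüéâäàåçêëèïîìÄÅ'
    'ÉæÆôöòûùÿÖÜ¢£¥₧ƒ'
    'áíóúñÑªº¿⌐¬½¼¡«»';

/// Decodes [bytes] as CP437 using the partial table above.
String decodeCp437(List<int> bytes) {
  final out = StringBuffer();
  for (final b in bytes) {
    if (b < 0x80) {
      out.writeCharCode(b); // ASCII range is identical in CP437.
    } else if (b < 0xB0) {
      out.write(cp437High[b - 0x80]);
    } else {
      out.writeCharCode(0xFFFD); // Not covered by this sketch.
    }
  }
  return out.toString();
}

/// Decodes a raw ZIP entry name. [utf8Flag] is general purpose bit 11 of the
/// entry; [legacy] is the caller's guess for names written without it.
String decodeZipName(
  List<int> bytes, {
  required bool utf8Flag,
  LegacyCharset legacy = LegacyCharset.cp437,
}) {
  if (utf8Flag) return utf8.decode(bytes, allowMalformed: true);
  return legacy == LegacyCharset.cp437
      ? decodeCp437(bytes)
      : latin1.decode(bytes);
}
```

The caller still has to choose the legacy charset. As the "ações.png" and "almoço.png" cases show, the bytes alone do not say which code page produced them, so any automatic detection would be a heuristic.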

I can look into this, but work has been very busy so I have been and will be slow.

brendan-duncan, Sep 27 '22 15:09

Is there any update on this? On OSX, when I add a file to the zip with the code below, file names in the zip that contain accents are damaged.

final encoder = ZipFileEncoder();
encoder.create(...);

await encoder.addFile(
  file,
  'Értesítés.pdf',
);

encoder.close();

The file in the zip is then called E��rtesi��te��s.pdf

mudlee, Dec 05 '23 09:12
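One possible reading of the damaged name above, which the thread does not confirm: the name reached the encoder in decomposed form (NFD, which is common for file names on macOS), where each accent is a separate combining code point U+0301. That code point takes two bytes in UTF-8, which would match the two replacement characters after each of E, i and e if the viewer does not treat the name as UTF-8. The sketch below only helps check which form a string is in. `utf8Hex` and `hasCombiningMarks` are names invented for this example.

```dart
import 'dart:convert';

// The same visible text in two Unicode forms, written with escapes so the
// source file's own normalization cannot change them.
const precomposed = '\u00C9rtes\u00EDt\u00E9s.pdf'; // accented letters as single code points
const decomposed = 'E\u0301rtesi\u0301te\u0301s.pdf'; // base letter + combining acute

/// Hex dump of the UTF-8 bytes of [name], for comparing the two forms.
String utf8Hex(String name) => utf8
    .encode(name)
    .map((b) => b.toRadixString(16).padLeft(2, '0'))
    .join(' ');

/// True if [name] contains code points from the Combining Diacritical
/// Marks block (U+0300..U+036F), a sign that the string is decomposed.
bool hasCombiningMarks(String name) =>
    name.runes.any((r) => r >= 0x0300 && r <= 0x036F);
```

If `hasCombiningMarks` returns true for the name passed to addFile, normalizing it to the precomposed form before adding the file might be worth trying. Dart's core libraries do not include a Unicode normalizer, so that step would need a separate package.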