adm-zip
filenames with Unicode characters are corrupt
Similar to another issue, filenames with Unicode characters are unusable: 7-zip can neither read nor extract them.
For example:
Tal/A L'infini/Le Passé.txt -> Tal/A L'infini/Le Pass├⌐.tx
Snøfall.txt -> Sn├╕fall.tx
I have the same problem. Have you solved this?
This thing blocks usage of this module. Same issue.
Is it merged to the master? Is there any progress?
Hi - this is a high priority item for us and we will consider moving away from this package for this functionality if there's no movement on it.
Problem: if a file inside a zip has Ñ in the filename, then this character gets turned into � (unicode replacement character) when the file is unzipped by adm-zip.
adm-zip 0.4.14
Setting the entry.header.made to 788 before writing the zip worked for me.
This sets the "Created OS" field to 03 (Unix) in the corresponding central header.
This workaround might help:
const zip = new AdmZip();
....
zip.addLocalFolder(...);
...
zip.getEntries().forEach(entry => {
(entry.header as any).made = 788;
});
zip.writeZip(destinationPath);
Nice hack, btw, but I am not sure it generates correct ZIP files.
adm-zip expects file names to be encoded as UTF-8 and writes them as UTF-8, but forgets to set the corresponding bit in the flags. So when other apps read zip files created by adm-zip, they don't know the file names are UTF-8 and produce garbled results. It is usually not a problem as long as you stick to plain US-ASCII names.
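The garbled names in the original report are what you get when UTF-8 bytes are read as CP437. A minimal sketch of that effect follows; `decodeAsCp437` is a hypothetical helper, and the three-entry table is a hand-picked excerpt of CP437 for this demo only, not a full decoder.

```javascript
// Hand-picked excerpt of the CP437 table: only the bytes this demo needs.
const CP437_EXCERPT = { 0xc3: "├", 0xa9: "⌐", 0xb8: "╕" };

// Hypothetical helper: decode bytes the way a CP437-assuming reader would.
// Bytes missing from the excerpt are shown as "?".
function decodeAsCp437(bytes) {
  return Array.from(bytes, (b) =>
    b < 0x80 ? String.fromCharCode(b) : (CP437_EXCERPT[b] ?? "?")
  ).join("");
}

// "\u00e9" is é; in UTF-8 it becomes the two bytes 0xC3 0xA9.
console.log(decodeAsCp437(Buffer.from("Le Pass\u00e9.txt", "utf8"))); // Le Pass├⌐.txt
// "\u00f8" is ø; in UTF-8 it becomes the two bytes 0xC3 0xB8.
console.log(decodeAsCp437(Buffer.from("Sn\u00f8fall.txt", "utf8"))); // Sn├╕fall.txt
```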
Maybe try setting this bit in the flags instead.
...
zip.getEntries().forEach(entry => {
(entry.header as any).flags |= 0x800; // Set bit 11 - APP Note 4.4.4 Language encoding flag (EFS)
});
...
It should work.
@5saviahv Yeah, I saw your linked PR and tried to get this working, but without any luck. The apps I used to test (unrar, 7zip, WinRAR, etc.) don't recognize the changed encoding when I open the created zip with them. I double-checked beforehand (with zipdetails) that the flag was set correctly in both the local and central headers.
Btw, if your PR worked for you, why was it never merged?
The only way to get these apps to recognize the encoding was to use the hack above.
It was indeed only a problem with the German umlauts.
I only got the magic number above from an already created zip file.
Can anyone explain how to compute the value of made for a given option like 03 Unix?
I had issues with PR tests so I took it down. I planned to fix it but so far I haven't done it.
The made value consists of two parts, an upper and a lower byte:
- the upper byte describes the system that was used: 03 = Unix, 00 = DOS, 19 = OS X
- the lower byte describes the ZIP specification version supported by the writer: 20 = v2.0, from the DOS days
If you look at APPNOTE.md (included with the repository), you'll find the description in sections 4.4.2 & 4.4.3.
If we take your value 788 and view it in hexadecimal, we get 0x314, so the upper byte is 0x03 and the lower byte is 0x14. Converted back to decimal, those are 3 and 20.
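The arithmetic above can be sketched as a pair of small helpers. The names `encodeMadeBy` and `decodeMadeBy` are hypothetical and not part of adm-zip; the sketch only assumes the two-byte layout described in this comment.

```javascript
// Pack the host system (upper byte) and version (lower byte) into one value.
function encodeMadeBy(hostSystem, version) {
  return ((hostSystem & 0xff) << 8) | (version & 0xff);
}

// Split a made value back into its two bytes.
function decodeMadeBy(made) {
  return { hostSystem: (made >> 8) & 0xff, version: made & 0xff };
}

const HOST_UNIX = 3; // 03 = Unix, as listed in the thread

console.log(encodeMadeBy(HOST_UNIX, 20)); // 788
console.log(encodeMadeBy(HOST_UNIX, 20).toString(16)); // 314
console.log(decodeMadeBy(788)); // { hostSystem: 3, version: 20 }
```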
Ah, thx. I missed the last step: separating the lower and upper bytes and converting each one back to decimal.
Is the Bit 11 setting working in your case? For me it does not.
It seems that the tested apps use the made value to guess an encoding and ignore the Bit 11 setting.
As an example, 'Ä' was converted in the zip created by adm-zip to '├Д', which seems to be a Codepage 855 encoding (a DOS encoding), and DOS is the default OS setting.
If I created the same zip with another application (e.g. libarchive), the encoding in the opened zip was correct in all tested apps.
I see there's movement on this, fingers crossed you guys can fix it. Thank you for your efforts so far!
Good to see you guys are talking about Bit 11 as I was thinking along the same lines (see the comment under this article): https://lwn.net/Articles/729835/#:~:text=There%20are%20no%20specs%20for,the%20box%2C%20but%20for%20ZIP.
I just tested this with slightly modified example code, and to my surprise:
- if Bit 11 is set but made is not written: GUI tools understand the file names, but command line tools don't.
- if Bit 11 is not set and made is Unix: command line tools understand the file names, but GUI tools don't.
- if Bit 11 is set and made is Unix: both GUI and command line tools understand the file names.
#!/usr/bin/env node
const AdmZip = require('adm-zip');
const zip = new AdmZip();
// add file directly
const content = "inner content of the file";
//zip.addFile("äää.txt", Buffer.from(content), "entry comment goes here");
zip.addFile("你好.txt", Buffer.from(content), "entry comment goes here");
zip.getEntries().forEach(entry => {
entry.header.made = 0x314;
entry.header.flags |= 0x800; // Set bit 11 - APP Note 4.4.4 Language encoding flag (EFS)
});
const willSendthis = zip.toBuffer();
zip.writeZip('./test-utf8.zip');
For GUI tools I used only GNOME Archive Manager. I also tested with Google Drive, but it detected the correct names on every try.
I also tried the files on Windows, with Windows Explorer and 7-Zip v19. If Bit 11 is set, both understand the file names. If Bit 11 is not set but made is Unix, 7-Zip understands the file names while Windows Explorer doesn't.
@5saviahv Thanks for your testing effort.
Those are confusing results. Setting Bit 11 never fixed the issue for me with GNOME Archive Manager.
Can you specify which command line tools you used? I used unzip -l.
So in fact the solution would be to set Bit 11 and Unix mode by default? That should be easy.
I'm unsure about side effects of setting Bit 11 by default. Does anyone need the default encoding? :thinking:
Maybe we should test the last option from the link mentioned by @Robert-Rendell as well?
There are no specs for file name encoding in ZIPs. There's no file name encoding indicator either.
Actually, there is. See appendix D of the ZIP spec at https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT which says:
- If bit 11 is set, the filename is in UTF-8
- If bit 11 is unset, the filename is in CP437
- The UTF-8 filename can also be in extra record 0x7075
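To check what an archive actually declares (similar to the zipdetails check mentioned earlier in the thread), the two fields can be read straight from a central directory header. This is a minimal sketch: `readCentralHeaderInfo` is a hypothetical helper, the offsets assume the APPNOTE central header layout (4-byte signature, 2-byte version made by, 2-byte version needed, 2-byte general purpose flags), and the buffer below is synthetic rather than taken from a real zip.

```javascript
const CENTRAL_SIG = 0x02014b50; // bytes "PK\x01\x02" read as little-endian

// Hypothetical helper: read made-by and bit 11 from a central directory
// header that starts at `offset` in `buf`.
function readCentralHeaderInfo(buf, offset = 0) {
  if (buf.readUInt32LE(offset) !== CENTRAL_SIG) {
    throw new Error("not a central directory header");
  }
  const made = buf.readUInt16LE(offset + 4); // version made by
  const flags = buf.readUInt16LE(offset + 8); // general purpose bit flags
  return {
    hostSystem: made >> 8,
    version: made & 0xff,
    utf8Names: (flags & 0x0800) !== 0, // bit 11
  };
}

// Synthetic 10-byte header prefix, just enough for the fields read above.
const header = Buffer.alloc(10);
header.writeUInt32LE(CENTRAL_SIG, 0);
header.writeUInt16LE(0x0314, 4);
header.writeUInt16LE(0x0800, 8);
console.log(readCentralHeaderInfo(header)); // { hostSystem: 3, version: 20, utf8Names: true }
```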
Additionally, it would be hard to create automated tests that exercise external tools on different operating systems.
The command line tool I used was Info-ZIP UnZip v6.00, and GNOME Archive Manager is v3.28.0. I just realized how old they are: the first one is from 2009, the second from 2014.
adm-zip actually already reads and writes file names as UTF-8 by default; only the indication bit in the flags is missing. Setting bit 11 for every file is totally acceptable. The note says it may not be compatible with tools from the last century.
@5saviahv we've asked a question on SO for this: "How to compute the value for version made by in zip header?" So if you have an SO account and don't mind getting some extra rep, please go ahead and I'll award you the bounty. :-)
I tried this snippet but it didn't work for me for the zip file generated by Windows WinZip.
zip.getEntries().forEach(entry => {
entry.header.made = 0x314;
entry.header.flags |= 0x800; // Set bit 11 - APP Note 4.4.4 Language encoding flag (EFS)
});
The problem for me is WinZip on Windows 10 (there's a known Unicode problem with WinZip on Windows).

Using 7zip I don't get the same issues, so I'm going to recommend that users use 7zip and not WinZip when working with languages outside the English alphabet.
Did you try compressing this file with adm-zip? The snippet above is useful for creating new zip files.
I think this was fixed in v0.5.6 by commit 151045270d78b226b79d517ab36848556b1fc19e.
I'm still having the same issue with the latest version. @5saviahv, can you please tell me whether it was fixed or not?
This is not directly related to the OP, but it is along the same lines and is more of a heads-up. In a work project I am on, one of our developers used a method from the editor's auto-complete called addLocalFolderPromise. I was not able to find that method documented, so it may be intended as a private method or was deprecated at some point; I'm not sure. As we were using the non-promisified version, the developer thought it would be an improvement to use the promisified one. This is fine if the filenames contain no Unicode characters. If a Unicode character such as ® is present in a filename, the folder still gets created, but the zipped files are saved with those characters removed from their names. That broke our app, which expected the linked filenames to be present in the zip directory. So a file named poster®.png will be zipped as poster.png. That comes from lines 357-360 of the adm-zip.js file:
p = p
.normalize("NFD")
.replace(/[\u0300-\u036f]/g, "")
.replace(/[^\x20-\x7E]/g, ""); // accent fix
If this should be filed as a bug or a separate issue, let me know and I'll be happy to provide further details and open the correct issue with a minimal reproduction.
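The effect of the normalization chain quoted from adm-zip.js can be reproduced in isolation. In this sketch the wrapper name `stripNonAscii` is hypothetical; the body is the same three calls as the quoted lines.

```javascript
// Same chain as the quoted adm-zip snippet, wrapped in a function for testing.
function stripNonAscii(p) {
  return p
    .normalize("NFD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .replace(/[^\x20-\x7E]/g, ""); // drop everything outside printable ASCII
}

console.log(stripNonAscii("poster\u00ae.png")); // poster.png  (\u00ae is ®)
console.log(stripNonAscii("\u7b2c1\u96c6.txt")); // 1.txt  (\u7b2c is 第, \u96c6 is 集)
console.log(stripNonAscii("Le Pass\u00e9.txt")); // Le Passe.txt  (\u00e9 is é)
```

Accented Latin letters survive as their base letter, while symbols and CJK characters disappear entirely, which matches the renamed files reported here.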
In v0.5.10, addLocalFolder was fixed.
But addLocalFolderPromise still has the error: Unicode characters are removed, e.g. 第1集.txt becomes 1.txt.
I see the same with zip.writeZipPromise compared to zip.writeZip. With the promise version, files with Unicode characters in their names have empty content when decompressed.
It should be fixed in the repo now.