jszip
jszip copied to clipboard
Add support for ZipCrypto decryption
See the second commit message for explanation.
I think there was some discussion that this would have to be optional. Not sure how you want to control this option exactly, but it would be fairly trivial to add, I think.
Partially addresses #115
I'll back out and redo the dist update, so please ignore that for now. Also need to update tests.
Before I do all of that though, is jszip interested in this patch at all? I'm going to use this for a small project, so I would rather not have to maintain a separate fork.
Reading encrypted zip files sounds good ! I'm ok with not supporting PKWARE strong encryption since the specification says
The Strong Encryption technology defined in this specification is
covered under a pending patent application. The use or implementation
in a product of certain technological aspects set forth in the current
APPNOTE, including those with regard to strong encryption, patching,
or extended tape operations requires a license from PKWARE.
Wikipedia says that this ZipCrypto encryption has known weaknesses so I think that a lot of softwares use AES encryption : if we add one, the other will need to follow (or we will get a lot of bug reports).
Regarding the API, I wonder if checking the password/decrypting the content when decompressing is a good thing. JSZip doesn't decompress files until necessary so a developer can have a hard time predicting where this will be done. For example, if a user read a zip file (with encrypted content but without password callback), add an image and generate the result, the exception will be thrown in the generate()
method.
A (maybe naive and oversimplistic) way to avoid that could be to decrypt content when parsing the zip file. What do you think of it ?
Regarding the type of the compressed data, it depends of the reader :
- StringReader on IE 6-9 (the data will be a binary string)
- NodeBufferReader on nodejs (the data will be a nodejs' Buffer)
- Uint8ArrayReader on modern browsers
You can do the following to get the right type :
var dataType = support.uint8array ? "uint8array" : "array";
var data = utils.transformTo(dataType, encryptedData);
And you don't have to update the dist/*
files, they are updated before each release :-)
I'll look into adding AES support.
I'm looking into our options for delaying decryption as long as possible (until it's actually needed). If I'm reading the code right, the checkCRC32
load option always causes the decompression to happen immediately when the local header is read. Is this intentional? I'm not sure what the purpose of this check is. Would it not make sense to perform the check when a file contents are actually requested?
Yes, with the checkCRC32
option, everything is decompressed. The idea was to be able to check the validity of a zip file (and the crc32 values with this option) with only one try/catch (on load()
). With a lazy crc32 check, the load is faster but the user will need to guard each and every call to JSZip methods with try/catch.
The decompression also has the side effect of checking the decompressed file length. This check is usually lazy (because it depends of the decompression) and can throw errors in the getters or in the generate()
method (but only with corrupted files).
I clearly see the advantages of delaying decryption : speed, the ability to ask the user its password after showing the files list, etc.
What I fear is that it may become harder to use JSZip if we put too many things as lazy. I'm not against documenting load()
/ getters / generate()
as methods throwing exceptions in more situations, I just want to be sure that's the right thing to do :)
@Stuk what's your opinion ?
I guess I understand the point here. I do see that in prepareContent
there is another throw (throw new Error("Bug : uncompressed data size mismatch");
), so basically, any call that would trigger encryption errors, could (in theory) also trigger this error and would have to be handled in a try-catch. Being unable to decrypt is just as severe of a failure as data size mismatch.
I'm not sure what a better alternative to the current implementation would be though. Asking for a password on load (and not validating it?) would not make much sense, since you only need it to decrypt files right before decompression. There are a number of operations (adding, deleting files) that you could do without needing a password. Additionally, not all file sin the archive have to be encrypted, nor do they have to be encrypted using the same algorithm. (actually, they don't even have to use the same password, but I haven't coded that in, since I don't think that's common)
Having thought about this a bit, I would suggest introducing an onError
handler (or something similar). This handler would be passed any errors that are thrown and could either re-throw them (or perhaps just return
and let JSZip re-throw them), or handle them in whatever way (and return true;
so JSZip does not re-throw). All JSZip
methods would then need to be wrapped (internally) in try-catch and call the onError
handler in the catch
. This would probably offer the most flexibility and would not require any try-catching.
I'm working on AES implementation in the mean time.
I have yet another idea, though it feels crude. Could crc32 check just be prevented for the files that are known to be encrypted, but still remain for the other files to check their validity on load()?
I have yet another idea, though it feels crude. Could crc32 check just be prevented for the files that are known to be encrypted, but still remain for the other files to check their validity on load()?
I think that's reasonable. Another reason to ignore encrypted files is that with WinZip's AE-2 encryption, you don't even have a crc32, but instead you are given an HMAC-SHA1-80 hash of the encrypted data.
I'm trying to test AES decryption for which I was going to use node-forge package, but the way it require
s submodules is not really compatible with browserify (I think). I haven't worked with node (edit: and by extensions, browserify) before, so I would appreciate if someone could let me know whether there's a way to make forge work, or if I have to look for a different crypto library (maybe crypto.js).
I made some progress (using crypto-js), but I'm still having issues. I'd like to split this up into two pull requests: one for ZipCrypto now and the other one for AES a little bit later. ZipCrypto is working perfectly now with files generated by 7zip, WinZIP, WinRAR, and ZipInfo's zip (the one in unix/linux).
Any further thought on how to handle interaction with the user in regards to password retrieval and validation? Two things are necessary:
- we need to be able to retrieve the password (I think delaying until required and not crc32-checking encrypted files)
- we need to be able to notify the user that the supplied password is wrong (possibly retrying? I would leave the retrying up to the user though)
I'd like to split this up into two pull requests: one for ZipCrypto now and the other one for AES a little bit later.
Wise decision, especially for future references. Smaller pull requests are easier to understand. Always.
API suggestions:
- Add
.encrypted
property to ZipObject; if notundefined
, that property should mean that the file is encrypted; that property should actually contain encyption's name (string) such as'zipcrypto'
or'aes'
. - Add optional
options
parameter (object) to ZipObject's getters.asText()
,.asBinary()
,.asArrayBuffer()
,.asUint8Array()
,.asNodeBuffer()
so that providing a password becomes possible:.asText({password: 'examplePassword'})
,.asBinary({password: 'examplePassword'})
,.asArrayBuffer({password: 'examplePassword'})
,.asUint8Array({password: 'examplePassword'})
,.asNodeBuffer({password: 'examplePassword'})
- Also make this methods
throw
on CRC errors or password mismatches. Define a couple of constants (such asCRC_ERROR
andPASSWORD_ERROR
for example), and export them, and only throw errors based on them (such asthrow new Error(CRC_ERROR)
for example) because it should be possible to catch the exception later and compare with these constants to determine what happened.
This way “different passwords for different files” are possible. Interaction with the user (password input, retries, etc.) is left to the application over JSZip.
Add .encrypted property to ZipObject; if not undefined, that property should mean that the file is encrypted; that property should actually contain encyption's name (string) such as 'zipcrypto' or 'aes'
Agreed. (That's my current approach actually)
Add optional options parameter to ZipObject's getters .asText(), .asBinary(), .asArrayBuffer(), .asUint8Array(), .asNodeBuffer() so that providing a password becomes possible: .asText({password: 'examplePassword'}), .asBinary({password: 'examplePassword'}), .asArrayBuffer({password: 'examplePassword'}), .asUint8Array({password: 'examplePassword'}), .asNodeBuffer({password: 'examplePassword'})
This... idk. Seems like a huge hassle for the user to do if(zipObj.encrypted) // ask for password first
all the time. I rather like the callback idea (it's only called once per zip file, unless the password is wrong). Do you have something against this that I am overlooking?
Also make this methods throw on CRC errors or password mismatches. Define a couple of constants (such as CRC_ERROR and PASSWORD_ERROR for example), and export them, and only throw errors based on them (such as throw new Error(CRC_ERROR) for example) because it should be possible to catch the exception later and compare with these constants to determine what happened.
I actually added some new error classes that extend Error
and am throwing those, so you can do checks with instanceof
. But we can go the constant string route as well.
Classes extending Error
are even better than string constants, thanks.
Seems like a huge hassle for the user to do
if(zipObj.encrypted) // ask for password first
all the time. I rather like the callback idea (it's only called once per zip file, unless the password is wrong). Do you have something against this that I am overlooking?
A callback is a better idea than if(zipObj.encrypted)
but only if you have good answers for at least the following couple of questions:
- When is that callback called? (When an encrypted file is detected? When a
ZipObject
of that file appears? When a getter, such as.asText()
or.asBinary()
, is called? Or maybe we'd have an API method to choose between these three?) - What is that's callback's signature (i.e. its set of parameters)? Can it be asynchronous (e.g. when such callback renders a GUI dialog prompt asking for password and then calls some further callback — any unzipping has to be postponed until that)?
A zipObj.encrypted
and extended errors sound good !
For the onError
(5 days ago, I'm a bit late) : it may be too weird to have a synchonous method returns a null
of undefined
while giving the error to an other function.
I'm working on asynchonous methods (asTextAsync(callback)
, generateAsync(options, callback)
, etc. see #121) so if we use a sync callback to get the password, we could also use async callbacks with these methods (it makes sense when asking for a password). If having sync and async callbacks is too awkward, I'm not against restricting this feature (ZipCrypto) to only one type of functions and throwing an error in the others.
If we don't use a callback we should add a flag allowEncrypted
(default = false) to keep backward compatibility (we can use the callback itself if we use it) and throw an exception if the user's code isn't ready for encrypted zip files.
.asText({password: 'examplePassword'})
There is one use case covered by a callback and not by this method :
- load a encrypted zip file
- add files
- generate a zip file
The last part will try to decrypt the files and fail.
Hi, any updates to this? Password encryption for zip files would really be a great feature/option.
I second this, would be great if we can protect zip files.
Ya, this would be awesome. If it helps, I've started https://github.com/cryptocoinjs/aes The problem is, that I need to add aes-cbc
stream ciphering though, which is presumably what zip files use for encryption?
@jprichardson Zip uses AES-CTR (128, 192 or 256 bit).
http://www.winzip.com/aes_info.htm
any progress?
I add some Aes decryption code, please see at #696