DotNetZip.Semverd
DotNetZip.Semverd copied to clipboard
Plans to fix performance issues when zipping large directories with strong encryption using the same password?
I did some performance testing, and it seems like the performance when using aes256 to zip a large folder is pretty slow (took ~45 seconds to zip an 86.9MB folder with 2282 files as opposed to using PkzipWeak where the same folder only took ~7 seconds to zip). Looking at the code it seems that it is calculating the encryption key for each file (in ZipEntry.WriteSecurityMetadata):
but since we are using the same password for the entire folder can't we calculate it once in the ZipFile object and then each time we add a ZipEntry we dont need to call WinZipAesCrypto.Generate (like we do in the image above), and instead can just add the encryption key information to the header?
The reason for the "performance hit" is due to the fact that from a security perspective an independent random salt needs to be computed for each file that is being encrypted and so while the password is the same, the salt must be different. See more here: WinZip Encryption Salt
@AndrewBSzabo - In looking through the WinZipAesCrypto
source file, there are definitely some areas where the performance could still be really improved, even without the change to reuse the generated key. With that in mind, I am going to reopen this issue and mark it as up for grabs. If you would like to work on it, you are more than welcome to do that. One easy win would be just modifying the Generate
function to not create a new instance of Random
every time it is called. The class itself could just have one static instance and reuse it (and just as a side note, it might be a big win to look through the whole source code and see if that could be pulled out to use a thread safe static Random
as well. Bear in mind that Random
is not thread safe, so you would want to look at implementing something similar to this SO answer. I think that change alone would boost the performance in the WinZipAesCrypto.Generate
function. I believe there may be some other gains that can be made in that class as well.
Sounds good. I will start looking into this!