Problem reading archives containing Zip64 files
Steps to reproduce
- Please see the test located at https://github.com/MatthewSteeples/SharpZipLib/commit/c76de92b5f250f37d292112cb26639afc53d0d9b
Expected behavior
The files should extract normally, with the test reading 1 byte from each one (we're experiencing this problem even when reading to the end of the Stream; the 1-byte read is just for illustrative purposes)
Actual behavior
When seeking to the end of file 2 (the large file), the following exception is thrown:
ICSharpCode.SharpZipLib.Zip.ZipException : Data descriptor signature not found
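For context, here is a minimal sketch of the shape of that repro (the exact entry names, sizes, and the stream setup that makes ZipArchive produce the Zip64/descriptor combination are in the linked test and are only assumed here):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using ICSharpCode.SharpZipLib.Zip;

// Create an archive with System.IO.Compression that contains one large entry
// between two small ones (names/sizes assumed, not taken from the linked test).
string zipPath = Path.Combine(Path.GetTempPath(), "zip64-repro.zip");
using (var output = File.Create(zipPath))
using (var archive = new ZipArchive(output, ZipArchiveMode.Create))
{
    foreach (var (name, size) in new[] { ("small1.bin", 1024L),
                                         ("large.bin", 50L * 1024 * 1024),
                                         ("small2.bin", 1024L) })
    {
        using var entryStream = archive.CreateEntry(name).Open();
        var buffer = new byte[81920];
        for (long written = 0; written < size; written += buffer.Length)
            entryStream.Write(buffer, 0, (int)Math.Min(buffer.Length, size - written));
    }
}

// Read it back with SharpZipLib's forward-only reader; the exception above is
// thrown while advancing past the large (Zip64) entry.
using (var zis = new ZipInputStream(File.OpenRead(zipPath)))
{
    var single = new byte[1];
    while (zis.GetNextEntry() != null)
        zis.Read(single, 0, 1); // read 1 byte from each file, as in the issue
}
```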
Version of SharpZipLib
1.3.3 but also verified against master
Obtained from
- Compiled from source, commit: ff64d0a
- Package installed using NuGet (1.3.3)
I'm afraid I can't spot anything obvious about what the cause might be. 7Zip happily opens the generated file and marks the 2 small files as version 20, with the large file being version 45 and having a Zip64 descriptor (in Characteristics).
Hope that's enough information, but please let me know if there's anything else I can provide
Please note that this test will spit out 50 MB temp files that you'll need to clean up afterwards.
Could you upload the generated zip file to https://archivediag.azurewebsites.net? I could generate it from the test as well, but I don't think I'll have time to do that for a while; with a report I can take a look. Unfortunately, the page still says that the blob could not be found instead of "waiting for azure function to pick up the job", but it shouldn't take more than 5 minutes.
Hi @piksel,
It won't upload there as the file is too large (50 MB):
Failed to load resource: the server responded with a status of 413 (Request Entity Too Large)
Aight. I'll take a look.
Well, here is the report: https://pub.p1k.se/sharpziplib/archivediag/issue-698.zip.html
The local header for the large entry has bit 3 of the general purpose flags (the data descriptor bit, mask 0x0008) set, which means that the actual sizes and CRC should follow after the compressed data. But there is no such descriptor following it. Instead, the sizes and CRC are only written to the "Central Header" (which is like a look-up directory for the files in the archive). This means that the zip file is corrupt (or rather, out of spec) and cannot be read in a streaming manner. If it is accessed in a random-access way instead, it is technically possible to read it (which is why 7z, for example, can read it, since it only works with random-access files, not streams).
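For reference, the flag in question lives in the local file header's general purpose bit field; a rough sketch (not SharpZipLib internals, path assumed) of checking it for the first entry:

```csharp
using System;
using System.IO;

// Inspect the first local file header (PKWARE APPNOTE layout, signature
// 0x04034b50) and report whether the data descriptor flag (bit 3) is set.
Console.WriteLine(FirstEntryUsesDataDescriptor("issue-698.zip")
    ? "bit 3 set: sizes/CRC should follow the data in a descriptor"
    : "bit 3 clear: sizes/CRC are stored in the local header itself");

static bool FirstEntryUsesDataDescriptor(string zipPath)
{
    using var reader = new BinaryReader(File.OpenRead(zipPath));
    if (reader.ReadUInt32() != 0x04034b50)
        throw new InvalidDataException("No local file header at offset 0");

    reader.ReadUInt16();                // version needed to extract
    ushort flags = reader.ReadUInt16(); // general purpose bit flags
    return (flags & 0x0008) != 0;       // bit 3: descriptor follows the data
}
```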
Actually, 7zip does show the file as having an error:
and running "Test" fails.
I'm not sure exactly what System.IO.Compression.ZipArchive does here, but it seems like a bug on their end. In any case, if you use ICSharpCode.SharpZipLib.Zip.ZipFile instead of ZipInputStream, it will use the central headers instead of the local ones (and it managed to extract the file perfectly fine when testing just now).
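For example, a minimal sketch of reading the archive through ZipFile (path assumed):

```csharp
using System;
using ICSharpCode.SharpZipLib.Zip;

// ZipFile works from the central directory, so the missing data descriptor
// after the large entry's compressed data does not matter here.
using (var zipFile = new ZipFile("issue-698.zip"))
{
    foreach (ZipEntry entry in zipFile)
    {
        if (!entry.IsFile) continue;
        using var stream = zipFile.GetInputStream(entry);
        var buffer = new byte[1];
        stream.Read(buffer, 0, 1); // read 1 byte from each file, as in the test
        Console.WriteLine($"{entry.Name}: {entry.Size} bytes");
    }
}
```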
I altered your test code to use ZipOutputStream to generate the zip file, and it actually compressed it better (~25 MiB vs ~50 MiB), but more slowly (we are fully managed, after all).
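Roughly like this (entry names, source files and compression level are assumptions, not your original test code):

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

using (var zos = new ZipOutputStream(File.Create("issue-698-expected.zip")))
{
    zos.SetLevel(6); // Deflate compression level (assumed)

    foreach (var (entryName, sourceFile) in new[] { ("small1.bin", "small1.tmp"),
                                                    ("large.bin",  "large.tmp"),
                                                    ("small2.bin", "small2.tmp") })
    {
        zos.PutNextEntry(new ZipEntry(entryName));
        using (var input = File.OpenRead(sourceFile))
            input.CopyTo(zos);
        zos.CloseEntry();
    }

    zos.Finish(); // writes the central directory (Dispose would also do this)
}
```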
Now, the resulting file also turned out to be unreadable with ZipInputStream (the last test), so there might be a bug here in any case...
Here is the report for that file, which shows the descriptor sections that are missing from the ZipArchive version of the file: https://pub.p1k.se/sharpziplib/archivediag/issue-698-expected.zip.html
@piksel Thanks for taking the time to have a look. I can't get 7zip to show me the same screen that you've got there. The file has a CRC, none of the files (large or small) have "Local" in the Characteristics, and running "Test" in 7zip reports that there are no errors. If I can reproduce what you're seeing then I'll happily take it to Microsoft. Are you sure the file had been flushed by the time you loaded it?
@MatthewSteeples Hello, did you fix this problem? I'm having the same issue and I can't find the cause. I think the problem is with the size of the file I'm trying to compress...
@geracosta Could you upload a file that shows the problem to https://archivediag.piksel.se/ ?