zip4j
Zip multi-volume archive data sees no files
I got a zip file from a user (context) which the file utility on Linux describes as: "Zip multi-volume archive data, at least PKZIP v2.50 to extract".
Calling zipInputStream.getNextEntry() on a ZipInputStream created from that file immediately returns null, without reading a single entry.
Sadly, I cannot provide the file, as it contains private data, and I cannot provide an example file either, as I don't know how to create one. I was hoping that creating an issue was better than nothing (hopefully someone else can provide a test case file!)
FWIW, I made a test case: foo.zip.
The issue is that, for some unknown reason, these ZIP files start with a spanned archive marker (0x08074b50, little endian) right before the first local file header (0x04034b50). As a result, iterating over the file with ZipInputStream.getNextEntry() fails, because the first call immediately returns null.
I would not really consider that a bug given the limitations of using ZipInputStream, but detecting such a marker and skipping it would be fairly easy and a nice feature to have for these cases. (Arbitrary data between entries is of course allowed by the ZIP format and can't really be handled when treating the file as a stream; for that you have to read the central directory so you can jump to the local header offsets correctly.)
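To check whether a given file is affected, the first eight bytes are enough. The following is a minimal sketch using only the JDK; the class and method names (SpannedMarkerCheck, hasLeadingSpannedMarker) are hypothetical and not part of zip4j.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of zip4j.
final class SpannedMarkerCheck {
    // Signatures as written in the ZIP specification (stored little endian on disk).
    static final int SPANNED_MARKER = 0x08074b50;
    static final int LOCAL_FILE_HEADER = 0x04034b50;

    // Returns true if the data starts with the spanned archive marker
    // immediately followed by a local file header signature.
    static boolean hasLeadingSpannedMarker(byte[] head) {
        if (head.length < 8) {
            return false; // too short to hold both signatures
        }
        ByteBuffer buf = ByteBuffer.wrap(head).order(ByteOrder.LITTLE_ENDIAN);
        return buf.getInt(0) == SPANNED_MARKER && buf.getInt(4) == LOCAL_FILE_HEADER;
    }
}
```

The caller is assumed to pass at least the first eight bytes of the file; a regular ZIP file that starts directly with a local file header yields false.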
Thus, iterating over the entries using ZipFile.getFileHeaders() instead works fine as expected.
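To illustrate why the central directory route is not bothered by extra bytes in front of the first entry, here is a rough JDK-only sketch that finds the end-of-central-directory record and lists the local header offsets stored in the central directory. This is not how zip4j implements it; the class name CentralDirectoryOffsets is hypothetical, and the sketch assumes a small, well-formed, single-file archive without Zip64 and does no bounds validation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of zip4j. No Zip64 support, no validation.
final class CentralDirectoryOffsets {
    static final int EOCD_SIG = 0x06054b50; // end of central directory record
    static final int CEN_SIG = 0x02014b50;  // central directory file header

    static List<Integer> localHeaderOffsets(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);

        // The EOCD record is at least 22 bytes long; scan backwards for its signature.
        int eocd = -1;
        for (int i = zip.length - 22; i >= 0; i--) {
            if (buf.getInt(i) == EOCD_SIG) {
                eocd = i;
                break;
            }
        }
        if (eocd < 0) {
            throw new IllegalArgumentException("No end-of-central-directory record found");
        }

        int entries = buf.getShort(eocd + 10) & 0xFFFF; // total number of entries
        int pos = buf.getInt(eocd + 16);                // start of the central directory

        List<Integer> offsets = new ArrayList<>();
        for (int n = 0; n < entries; n++) {
            if (buf.getInt(pos) != CEN_SIG) {
                throw new IllegalArgumentException("Bad central directory header at " + pos);
            }
            offsets.add(buf.getInt(pos + 42)); // relative offset of the local header
            int nameLen = buf.getShort(pos + 28) & 0xFFFF;
            int extraLen = buf.getShort(pos + 30) & 0xFFFF;
            int commentLen = buf.getShort(pos + 32) & 0xFFFF;
            pos += 46 + nameLen + extraLen + commentLen; // fixed part plus variable fields
        }
        return offsets;
    }
}
```

Because each entry is located through its stored offset rather than by reading from byte 0, a leading marker does not matter here, presumably as long as the stored offsets account for it (which the ZipFile behaviour described above suggests).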
FWIW it's also possible to use this workaround to manually skip the marker (in case you really need to use ZipInputStream instead of ZipFile):
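InputStream input = new BufferedInputStream(...);
byte[] buf = new byte[4];
// Remember the start of the stream so we can rewind if there is no marker.
input.mark(4);
for (int i = 0; i < 4; ++i) {
    if (input.read(buf, i, 1) != 1) {
        throw new IOException("File is less than 4 bytes.");
    }
}
// The marker 0x08074b50 is stored little endian, so the bytes on disk are 50 4b 07 08.
if (new BigInteger(1, buf).intValue() != 0x504b0708) {
    // No marker: rewind so ZipInputStream sees the first local file header.
    input.reset();
}
ZipInputStream zipInputStream = new ZipInputStream(input, ...);
...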