libzip
"Zip archive inconsistent" error, although supported by other unpackers
Describe the Bug
Trying to open a specific zip file with zip_open_from_source results in the error "Zip archive inconsistent".
I'm using ZIP_RDONLY as the only flag, i.e. ZIP_CHECKCONS is not set.
The Test.zip is attached to this bug report.
Opening this archive works with all the usual desktop tools such as 7-Zip, the Windows built-in unzip, and GNOME File Roller.
The unzip -t command on Linux does complain about this file, though:
EF block length (12374 bytes) exceeds remaining EF data (16 bytes)
I'm not an expert on the Zip file format and don't know if it is really corrupt, but given that most prominent tools are able to handle this archive, I'm wondering if there is a way to read it gracefully with libzip too.
Personal dilemma: This file is hosted by a hardware device, so repacking or fixing the file is not in my scope :/.
libzip Version: libzip 1.7.3 from conan package manager
Operating System: Windows 10, Ubuntu 20.10
Test Files: Offending test file Test.zip
As libzip and unzip say, this is a bug in the zip archive. One extra field for the first entry claims that it contains 12374 bytes, but the whole area reserved for extra fields for this entry is 20 bytes. You should fix the file.
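The mismatch described here can be checked by hand. The following is a minimal Python sketch, not libzip code: `find_extra_overruns` is a hypothetical helper name, and the 20 bytes are copied from the extra-field area shown in the header dump further down this thread. It walks the area record by record (2-byte ID, 2-byte length, payload, all little-endian) and reports any record whose declared length exceeds the bytes that remain.

```python
import struct

def find_extra_overruns(extra: bytes):
    """Return (offset, id, declared_length, bytes_remaining) for each record
    whose declared length does not fit in the rest of the extra-field area."""
    problems = []
    pos = 0
    while pos + 4 <= len(extra):
        field_id, length = struct.unpack_from("<HH", extra, pos)
        remaining = len(extra) - (pos + 4)
        if length > remaining:
            problems.append((pos, field_id, length, remaining))
            break  # nothing after a bad length can be located reliably
        pos += 4 + length
    return problems

# Extra-field area of the first entry: ID 0x4347, length 0x3056, 16 payload bytes.
EXTRA = bytes.fromhex("47435630" "0100010000000000" "0100000000000000")

print(find_extra_overruns(EXTRA))  # [(0, 17223, 12374, 16)]
```

The declared length 0x3056 is 12374 in decimal, and only 16 of the 20 bytes are left after the 4-byte record header, which matches the message printed by unzip -t.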
If this is not an option, but you can change the source code reading the zip archive, you could load the file into a memory buffer and overwrite the wrong extra field data there. Example code for that is in https://github.com/nih-at/libzip/blob/master/examples/in-memory.c
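One way the in-memory overwrite could look is sketched below in Python. This is an illustration under assumptions, not the approach used in in-memory.c: the function names are hypothetical, the repair policy (clamping an overrunning length to the bytes that remain) is just one possible choice, and the sketch ignores zip64, multi-disk archives, and archive comments that happen to contain the end-of-central-directory signature.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # end of central directory
CEN_SIG = b"PK\x01\x02"   # central directory header
LOC_SIG = b"PK\x03\x04"   # local file header

def clamp_extra(buf: bytearray, start: int, size: int) -> int:
    """Clamp overrunning record lengths inside buf[start:start+size].
    Returns the number of length fields that were rewritten."""
    pos, end, fixed = start, start + size, 0
    while pos + 4 <= end:
        _field_id, length = struct.unpack_from("<HH", buf, pos)
        remaining = end - (pos + 4)
        if length > remaining:
            struct.pack_into("<H", buf, pos + 2, remaining)
            length = remaining
            fixed += 1
        pos += 4 + length
    return fixed

def repair_archive(buf: bytearray) -> int:
    """Patch extra-field lengths in every central and local header, in place."""
    eocd = buf.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    total, _cd_size, cd_offset = struct.unpack_from("<HII", buf, eocd + 10)
    fixed, pos = 0, cd_offset
    for _ in range(total):
        if buf[pos:pos + 4] != CEN_SIG:
            raise ValueError("bad central header signature")
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", buf, pos + 28)
        (local_offset,) = struct.unpack_from("<I", buf, pos + 42)
        fixed += clamp_extra(buf, pos + 46 + name_len, extra_len)
        if buf[local_offset:local_offset + 4] != LOC_SIG:
            raise ValueError("bad local header signature")
        l_name, l_extra = struct.unpack_from("<HH", buf, local_offset + 26)
        fixed += clamp_extra(buf, local_offset + 30 + l_name, l_extra)
        pos += 46 + name_len + extra_len + comment_len
    return fixed

# Usage sketch (the path is a placeholder):
# buf = bytearray(open("Test.zip", "rb").read())
# repair_archive(buf)
```

The patched buffer could then be handed to libzip through zip_source_buffer_create and zip_open_from_source instead of opening the file directly.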
OK, I think I have a better understanding of this now. I guess my real question was: there seem to be zip files out there in the wild with malformed extra fields (I found other issues like this in other projects too). Other tools seem to be able to recover from this, presumably by truncating or ignoring the extra field data. Couldn't the same be done with libzip? Perhaps with an opt-in zip_open flag for fault-tolerant parsing? Or a flag that just skips EF data altogether if you're not interested in it?
Fixing such files in memory would be possible, but to do this you'd have to implement a zip structure parser from scratch, right?
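To make the two recovery strategies mentioned in this comment concrete, here is a small Python sketch. It is not a libzip feature and does not describe how any of the other tools actually behave; `parse_extra_tolerant` and its `on_overrun` parameter are hypothetical names chosen for illustration.

```python
import struct

def parse_extra_tolerant(extra: bytes, on_overrun: str = "truncate"):
    """Split an extra-field area into (id, payload) records.

    on_overrun controls what happens when a record claims more bytes
    than are left:
      "truncate" - keep the record with whatever payload bytes exist
      "skip"     - drop the record
      "strict"   - raise ValueError
    Parsing stops at the first overrun in every mode, because the position
    of any following record can no longer be determined.
    """
    if on_overrun not in ("truncate", "skip", "strict"):
        raise ValueError("unknown on_overrun mode: %r" % on_overrun)
    records, pos = [], 0
    while pos + 4 <= len(extra):
        field_id, length = struct.unpack_from("<HH", extra, pos)
        payload = extra[pos + 4 : pos + 4 + length]
        if len(payload) < length:
            if on_overrun == "strict":
                raise ValueError("extra field 0x%04x overruns its area" % field_id)
            if on_overrun == "truncate":
                records.append((field_id, payload))
            break
        records.append((field_id, payload))
        pos += 4 + length
    return records

# A well-formed zip64-style record (ID 0x0001) followed by the malformed one.
BAD = bytes.fromhex("47435630") + bytes(16)
MIXED = struct.pack("<HHQ", 0x0001, 8, 5) + BAD
for field_id, payload in parse_extra_tolerant(MIXED, "skip"):
    print(hex(field_id), len(payload))
```

In this sketch, well-formed records that precede the malformed one are still returned, so only the bad record and anything after it are lost.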
I suggested fixing the zip archive only as a workaround for your immediate problem; I don't think it's a good solution in general. You would need a ZIP parser for that, yes.
libzip uses some extra fields for basic features like zip64 or UTF-8 support. When ignoring extra fields completely, that would suffer.
I'm not convinced that we should support incorrectly created ZIP archives.
Regarding how frequent this ZIP_ER_INCONS error could be, I have some data collected over the last few months, with the number of data points on the order of millions. Error percentages are fairly stable over time.
ZIP_ER_INCONS occurs for 0.5% of all the ZIP files. This is a significant proportion. But it is less than ZIP_ER_NOZIP, which occurs for 2.2% of all the ZIP files.
@pwuertz I know this issue is well over a year old, but do you remember where the Test.zip file originated or anything about what it is used for? Just trying to understand more about the misuse of the extra field.
The non-standard extra field looks deliberate because I see identical invalid extra fields in both the local and central header records in the zip file. That seems too much of a coincidence to mark down as corruption.
See the two ERROR lines below:
0000 0004 50 4B 03 04 LOCAL HEADER #1 04034B50
0004 0001 14 Extract Zip Spec 14 '2.0'
0005 0001 00 Extract OS 00 'MS-DOS'
0006 0002 00 00 General Purpose Flag 0000
[Bits 1-2] 0 'Normal Compression'
0008 0002 08 00 Compression Method 0008 'Deflated'
000A 0004 65 99 FA 4E Last Mod Time 4EFA9965 'Fri Jul 26 19:11:10 2019'
000E 0004 24 41 BA 99 CRC 99BA4124
0012 0004 3A CF 00 00 Compressed Length 0000CF3A
0016 0004 CC 1F 0A 00 Uncompressed Length 000A1FCC
001A 0002 27 00 Filename Length 0027
001C 0002 14 00 Extra Length 0014
001E 0027 42 61 73 6C Filename 'Basler_Ace_USB_99ba4124_Version_1_0.
65 72 5F 41 xml'
63 65 5F 55
53 42 5F 39
39 62 61 34
31 32 34 5F
56 65 72 73
69 6F 6E 5F
31 5F 30 2E
78 6D 6C
0045 0002 47 43 Extra ID #0001 4347
0047 0002 56 30 Length 3056
# ERROR: 'Length' field @ 0x47 in 'Extra ID' 0x4347 () invalid: value 0x3056 > 0x10 bytes remaining
0049 0010 01 00 01 00 Extra Payload ................
00 00 00 00
01 00 00 00
00 00 00 00
0059 CF3A ... PAYLOAD
CF93 0004 50 4B 01 02 CENTRAL HEADER #1 02014B50
CF97 0001 14 Created Zip Spec 14 '2.0'
CF98 0001 03 Created OS 03 'Unix'
CF99 0001 14 Extract Zip Spec 14 '2.0'
CF9A 0001 00 Extract OS 00 'MS-DOS'
CF9B 0002 00 00 General Purpose Flag 0000
[Bits 1-2] 0 'Normal Compression'
CF9D 0002 08 00 Compression Method 0008 'Deflated'
CF9F 0004 65 99 FA 4E Last Mod Time 4EFA9965 'Fri Jul 26 19:11:10 2019'
CFA3 0004 24 41 BA 99 CRC 99BA4124
CFA7 0004 3A CF 00 00 Compressed Length 0000CF3A
CFAB 0004 CC 1F 0A 00 Uncompressed Length 000A1FCC
CFAF 0002 27 00 Filename Length 0027
CFB1 0002 14 00 Extra Length 0014
CFB3 0002 00 00 Comment Length 0000
CFB5 0002 00 00 Disk Start 0000
CFB7 0002 00 00 Int File Attributes 0000
[Bit 0] 0 'Binary Data'
CFB9 0004 00 00 00 00 Ext File Attributes 00000000
CFBD 0004 00 00 00 00 Local Header Offset 00000000
CFC1 0027 42 61 73 6C Filename 'Basler_Ace_USB_99ba4124_Version_1_0.
65 72 5F 41 xml'
63 65 5F 55
53 42 5F 39
39 62 61 34
31 32 34 5F
56 65 72 73
69 6F 6E 5F
31 5F 30 2E
78 6D 6C
CFE8 0002 47 43 Extra ID #0001 4347
CFEA 0002 56 30 Length 3056
# ERROR: 'Length' field @ 0xCFEA in 'Extra ID' 0x4347 () invalid: value 0x3056 > 0x10 bytes remaining
CFEC 0010 01 00 01 00 Extra Payload ................
00 00 00 00
01 00 00 00
00 00 00 00
CFFC 0004 50 4B 05 06 END CENTRAL HEADER 06054B50
D000 0002 00 00 Number of this disk 0000
D002 0002 00 00 Central Dir Disk no 0000
D004 0002 01 00 Entries in this disk 0001
D006 0002 01 00 Total Entries 0001
D008 0004 69 00 00 00 Size of Central Dir 00000069
D00C 0004 93 CF 00 00 Offset to Central Dir 0000CF93
D010 0002 0E 00 Comment Length 000E
D012 000E 00 00 00 00 Comment ' '
00 00 00 00
00 00 00 00
00 00
Error Count: 2
Done
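The fixed part of the local header in the dump above can be decoded by hand. The Python sketch below is only an illustration (the function name and dictionary keys are made up here); the 30 bytes are copied from offsets 0x0000 to 0x001D of the dump, and the layout is the standard local file header with little-endian fields and an MS-DOS packed date and time.

```python
import struct
from datetime import datetime

# Bytes 0x0000-0x001D of the dump: signature through extra length.
LOCAL_HEADER = bytes.fromhex(
    "504b0304" "1400" "0000" "0800" "6599" "fa4e"
    "2441ba99" "3acf0000" "cc1f0a00" "2700" "1400"
)

def decode_local_header(raw: bytes) -> dict:
    """Decode the 30-byte fixed part of a ZIP local file header."""
    (sig, version, flags, method, dos_time, dos_date,
     crc, csize, usize, name_len, extra_len) = struct.unpack("<4s5H3I2H", raw[:30])
    if sig != b"PK\x03\x04":
        raise ValueError("not a local file header")
    modified = datetime(
        (dos_date >> 9) + 1980, (dos_date >> 5) & 0x0F, dos_date & 0x1F,
        dos_time >> 11, (dos_time >> 5) & 0x3F, (dos_time & 0x1F) * 2,
    )
    return {
        "version": version, "flags": flags, "method": method,
        "modified": modified, "crc": crc,
        "compressed_size": csize, "uncompressed_size": usize,
        "name_len": name_len, "extra_len": extra_len,
    }

hdr = decode_local_header(LOCAL_HEADER)
print(hdr["modified"])  # 2019-07-26 19:11:10
# The compressed payload starts right after the name and the extra area.
print(hex(30 + hdr["name_len"] + hdr["extra_len"]))  # 0x59
```

The extra length of 0x14 (20 bytes) decoded here is the size of the area that the 0x4347 record, with its declared length of 0x3056, fails to fit into.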
@pmqs
".. do you remember where the Test.zip file originated or anything about what it is used for?"
Yea sure. See that Basler_Ace_USB_99ba4124_Version_1_0.xml file in there? It's a GenICam device descriptor from a Basler ace series industrial camera.
".. looks deliberate because I see the identical invalid extra fields in both the local & central headers records in the zip file. That seems too much of a coincidence to mark down as corruption."
True, not "corruption" in the sense of random errors affecting the transfer or storage of data. But most probably a fault in the program that was used to create the zip file, i.e. an algorithm that deterministically creates invalid or "corrupt" archives.
Thanks @pwuertz
Interesting to note that the two-byte extra ID used just happens to be ASCII "GC"; that matches well with "GenICam".
It may be a deliberate non-standard use of the extra field that breaks the zip spec, or just vestigial data that ended up getting released.