Remove constraint that zip64 field values be greater than 0
Fixes #318
Changes proposed in this PR
In ZIP64 archives, certain standard values (like compressed and uncompressed file sizes, and offsets within the archive) are set to 0xFFFFFFFF, and actual values are provided in the ZIP64 extra fields. The code was requiring these values be > 0. If a value was 0, the non-ZIP64 value was returned (0xFFFFFFFF), and thus almost guaranteed to be incorrect.
This patch removes that test for compressedSize, uncompressedSize, and relativeOffsetOfLocalHeader. While it’s unlikely for a file to have zero length, it’s not impossible, and it’s very likely for a local header to be at offset 0.
Tests performed
I tested this by seeing if I could iterate over the entries of the file in the referenced bug. Before, the iterator would immediately return nothing, now it returns the expected three entries. I didn’t want to add the referenced file to the repo, and wasn’t sure of the best way to add this test.
Further info for the reviewer
None.
Open Issues
None.
Thanks for looking into this issue.
There seem to be archives that require those size > 0 safe guards though (e.g. testExtractCompressedZIP64Entries fails for me). I haven't looked into the details, but simply removing those constraints won't work.
There seem to be archives that require those
size > 0safe guards though (e.g.testExtractCompressedZIP64Entriesfails for me). I haven't looked into the details, but simply removing those constraints won't work.
How do you create testExtractCompressedZIP64Entries.zip? It doesn't look like a valid ZIP64 file. macOS can unzip it, but testing the archive reveals issues:
% unzip -vT testExtractCompressedZIP64Entries.zip
note: didn't find end-of-central-dir signature at end of central dir.
Looking at it in HexEdit, I see the EOCD (PKWare App Note section 4.3.16):
| Field | Value | Notes |
|---|---|---|
| Signature | 50 4B 05 06 | Correct |
| Number of this disk | 00 00 | Should be 0xFFFF for ZIP64 |
| Number of disk with start of CD | 00 00 | Should be 0xFFFF for ZIP64 |
| Total number of entries in CD on this disk | 01 00 | Should be 0xFFFF for ZIP64 |
| Total number of entries in CD | 01 00 | Should be 0xFFFF for ZIP64 |
| Size of CD | 77 00 00 00 | Should be 0xFFFFFFFF for ZIP64 |
| Offset of CD | FF FF FF FF | Correct |
| Comment length | 00 00 | Correct |
The Zip64 end of central directory locator (4.3.15) appears just before this:
| Field | Value | Notes |
|---|---|---|
| Signature | 50 4B 06 07 | Correct |
| Disk with start of EOCD64 | 00 00 00 00 | Correct |
| Start of EOCD64 | DA 01 00 00 00 00 00 00 (474) | Correct |
| Total number of disks | 01 00 00 00 | Correct |
The EOCDR64 (4.3.14) is:
| Field | Value | Notes |
|---|---|---|
| Signature | 50 4B 06 06 | Correct |
| Size of EOCDR64 | 2C 00 00 00 00 00 00 00 (44) | Correct |
| Version made by | 1E 03 | |
| Version required | 2D 00 | |
| Number of this disk | 00 00 00 00 | Correct |
| Number of disk with start of CD | 00 00 00 00 | Correct |
| Total number of entries this disk | 01 00 00 00 00 00 00 00 | Correct |
| Total number of entries | 01 00 00 00 00 00 00 00 | Correct |
| Size of the CD | 77 00 00 00 00 00 00 00 (119) | Correct |
| Offset to start of CD | 63 01 00 00 00 00 00 00 (355) | Correct |
| Extensible data | None |
The CD (4.3.12) is:
| Field | Value | Notes |
|---|---|---|
| Signature | 50 4B 01 02 | |
| Version made by | 1E 03 | |
| Version required | 2D 00 | |
| Flags | 00 00 | |
| Compression Method | 08 00 | |
| Last mod time | 10 75 | |
| Last mod date | A6 52 | |
| CRC-32 | B7 9B 29 DC | |
| Compressed size | F0 00 00 00 (240) | Should be 0xFFFFFFFF for ZIP64 |
| Uncompressed size | FF FF FF FF | Correct |
| File name length | 25 00 (37) | |
| Extra field length | 24 00 (36) | |
| File comment length | 00 00 | |
| Disk number start | 00 00 | |
| Internal file attrs | 00 00 | |
| External file attrs | 00 00 A4 81 | |
| Local header rel offset | 00 00 00 00 | |
| Filename | 74 65 73 74 45 78 74 72 61 63 74 43 6F 6D 70 72 65 73 73 65 64 5A 49 50 36 34 45 6E 74 72 69 65 73 2E 70 6E 67 | |
| Extra field | 55 54 05 00 03 4F 81 93 60 75 78 0B 00 01 04 F6 01 00 00 04 14 00 00 00 01 00 08 00 D0 03 00 00 00 00 00 00 | |
| File comment | None |
The extra field breaks down as:
| Field | Value | Notes |
|---|---|---|
| Header ID | 55 54 | Extended timestamp |
| Data Size | 05 00 (5) | |
| Data 1 | 03 4F 81 93 60 | |
| Header ID 2 | 75 78 | Unix UID/GID info |
| Data size 2 | 0B 00 (11) | |
| Data 2 | 01 04 F6 01 00 00 04 14 00 00 00 | |
| Header ID 3 | 01 00 | Zip64 extra info |
| Data Size 3 | 08 00 (8) | |
| Uncompressed size | D0 03 00 00 00 00 00 00 (976) | Correct. But there should be more fields for this extra field (compressed size, offset, disk number) |
I'm going to use this information to see if I can't refine how my patch gets the values.
How do you create testExtractCompressedZIP64Entries.zip? It doesn't look like a valid ZIP64 file. macOS can unzip it, but testing the archive reveals issues:
The ZIP64 support was provided by @Ckitakishi . It seems that the sheer existence of ZIP64 extended information is enough to treat the archive as ZIP64 (taking precedence over formally setting versionNeededToExtract).
@Ckitakishi:
- Was this relaxation needed for archives created by other libraries/apps? Is this also the reason for the additional size check?
- Do you recall how the test asset was created?
@JetForMe:
One thing I noticed with the archive you attached to the referenced issue:
When printing ZIP contents via zipinfo -v GO-M8010-6-P1.3mf, I get the following output for the last 2 entries:
There are an extra -8 bytes preceding this file. Very likely that we can't deal with that (and not sure if we should 🙈).
@weichsel
Was this relaxation needed for archives created by other libraries/apps? Is this also the reason for the additional size check?
To be honest, I don't remember the reason clearly :cry: Regarding the extra size checking, if I remember correctly, there seems to be a issue with the current compression or decompression implementation, which can cause crash.
Do you recall how the test asset was created?
I think I created it via ZIPFoundation(by modifying the max size limit), because it's hard to include a large zip64 file in repository
Since I'm on vacation right now, I'll take a look at it again after that.
Hello, recently I had a chance to investigate this PR again~
Original issue
First, let me explain why the file attached in https://github.com/weichsel/ZIPFoundation/issues/318 cannot be unarchived properly,
- The ZIP64 Extended Information Extra Field can exist in Local File Header (LFH) or Central Directory (CD). According to the spec 4.5.3, this rule implies CD doesn't need to include both sizes, which makes sense, as the CD serves primarily as a lookup directory, from what I’ve learned, historically, space-saving was an important consideration back in 1989.
4.5.3 ... This entry in the Local header MUST include BOTH original and compressed file size fields.
- Based on this design, the values in the Extended Information fields are always greater than 0 in the CD, so the current implementation should work if the ZIP files are created follow this rule.
Compatibility improvement
Secondly, I think the proposed fix in this PR is reasonable. However, we should first consider improving the compatibility of ZIPFoundation’s ZIP64 implementation:
- Currently,
nilvalues are actually treated as0, which is not elegant enough. A better approach might be to useOptionaltypes to representnil, rather than mixing it with0. - However, this improvement is not easy, because currently we're using
MemoryLayout<T>and similar approaches in multiple places to calculate type sizes. Changing a type toOptionalwill definitely break them.
About testExtractCompressedZIP64Entries.zip
To be honest, I don’t remember exactly how I created this file… I believe I generated it using ZIPFoundation by manually modifying the max size limit. But I think it conforms to the ZIP64 specification. Here are a few points:
| Field | Value | Notes | Comment |
|---|---|---|---|
| Signature | 50 4B 05 06 | Correct | |
| Number of this disk | 00 00 | Should be 0xFFFF for ZIP64 | should be correct according to 4.4.1.4 |
| Number of disk with start of CD | 00 00 | Should be 0xFFFF for ZIP64 | |
| Total number of entries in CD on this disk | 01 00 | Should be 0xFFFF for ZIP64 | |
| Total number of entries in CD | 01 00 | Should be 0xFFFF for ZIP64 | |
| Size of CD | 77 00 00 00 | Should be 0xFFFFFFFF for ZIP64 | |
| Offset of CD | FF FF FF FF | Correct | |
| Comment length | 00 00 | Correct |
-
End of Central Directory (EOCD), according to the spec, it's not required that all fields in EOCD be set to
-1(0xFFFFor0xFFFFFFFF) when an individual field exceeds its max value, and I couldn't find any other clues that imply we should set all values to max from the documentation.4.4.1.4 If one of the fields in the end of central directory record is too small to hold required data, the field SHOULD be set to -1 (0xFFFF or 0xFFFFFFFF) and the ZIP64 format record SHOULD be created.
-
Central Directory (CD), as mentioned earlier, the ZIP64 Extended Information field in the Central Directory does not need to include both compressed and uncompressed sizes.