camlzip
camlzip copied to clipboard
Fix bug of infinite loop on empty but marked as 'Inflated' entries.
Description
Hi, working on camlzip I identified a bug that makes the read_entry
function goes into an infinite loop for entries that are marked as 'Deflated' but are empty.
In that very specific case, the recursive function uncompr_finish
L116 goes infinite as inflate
always returns finished = false ; _
leading to an infinite loop.
This case seems rare and unlikely, yet it seems that on windows systems it's not impossible and can sometimes happen, reason why I believe camlzip should consider this case as a valid possibility.
Reproduction step
I created a little file that reproduces the bug minimally.
It contains a single entry of an empty file.
🦆 ~/workspace $ unzip -l demo.zip
Archive: demo.zip
Length Date Time Name
--------- ---------- ----- ----
0 07-29-2024 10:58 empty.txt
--------- -------
0 1 file
I then modified the bits for the compression algorithm in both local and central headers for the file to be considered a Inflated
.
The result is the following :
🦆 ~/workspace $ zipdetails demo.zip
0000 LOCAL HEADER #1 04034B50
0004 Extract Zip Spec 0A '1.0'
0005 Extract OS 00 'MS-DOS'
0006 General Purpose Flag 0000
[Bits 1-2] 0 'Normal Compression'
0008 Compression Method 0008 'Deflated'
000A Last Mod Time 58FD5760 'Mon Jul 29 10:59:00 2024'
000E CRC 00000000
0012 Compressed Length 00000000
0016 Uncompressed Length 00000000
001A Filename Length 0009
001C Extra Length 001C
001E Filename 'empty.txt'
0027 Extra ID #0001 5455 'UT: Extended Timestamp'
0029 Length 0009
002B Flags '03 mod access'
002C Mod Time 66A759D3 'Mon Jul 29 10:58:59 2024'
0030 Access Time 66A759D3 'Mon Jul 29 10:58:59 2024'
0034 Extra ID #0002 7875 'ux: Unix Extra Type 3'
0036 Length 000B
0038 Version 01
0039 UID Size 04
003A UID 000001F7
003E GID Size 04
003F GID 00000014
0043 CENTRAL HEADER #1 02014B50
0047 Created Zip Spec 1E '3.0'
0048 Created OS 03 'Unix'
0049 Extract Zip Spec 0A '1.0'
004A Extract OS 00 'MS-DOS'
004B General Purpose Flag 0000
[Bits 1-2] 0 'Normal Compression'
004D Compression Method 0008 'Deflated'
004F Last Mod Time 58FD5760 'Mon Jul 29 10:59:00 2024'
0053 CRC 00000000
0057 Compressed Length 00000000
005B Uncompressed Length 00000000
005F Filename Length 0009
0061 Extra Length 0018
0063 Comment Length 0000
0065 Disk Start 0000
0067 Int File Attributes 0000
[Bit 0] 0 'Binary Data'
0069 Ext File Attributes 81A40000
006D Local Header Offset 00000000
0071 Filename 'empty.txt'
007A Extra ID #0001 5455 'UT: Extended Timestamp'
007C Length 0005
007E Flags '03 mod access'
007F Mod Time 66A759D3 'Mon Jul 29 10:58:59 2024'
0083 Extra ID #0002 7875 'ux: Unix Extra Type 3'
0085 Length 000B
0087 Version 01
0088 UID Size 04
0089 UID 000001F7
008D GID Size 04
008E GID 00000014
0092 END CENTRAL HEADER 06054B50
0096 Number of this disk 0000
0098 Central Dir Disk no 0000
009A Entries in this disk 0001
009C Total Entries 0001
009E Size of Central Dir 0000004F
00A2 Offset to Central Dir 00000043
00A6 Comment Length 0000
Done
You can find attached this file here : demo.zip
Type of change
This PR is a bug fix.
When implementing I faced two options :
- Considering this case as an error case and raising an error.
- Considering that case as a specific behaviour, not attempting deflation and just returning at empty string.
Taking a look at what other zip libraries are doing, such as Unzip and 7zip, I believe option 2 was the correct one, ensuring service continuity and security, therefore it is the one I implemented.
Style of change
I tried to stick as much as possible to the coding style, and tried to only modify the read_entry
function to make the review easy.
If you believe a heavier refactor would be more suitable I'll be more than happy to do it 😄 .
How Has This Been Tested?
- [x] Ensured the issue happened on the attached file before my fix
- [x] Ensured the issue did not happen on the attached file after my fix.
If you believe adding a test over that would make sense I'll also do it with pleasure.