fix: extractor.py extract_file_zip hang when processing .dll file
Description
When running the latest cve-bin-tool against a package containing .dll files, cve-bin-tool hang and showed an ERROR message like below:
"Failed to extract xxxx using unzip. Trying 7z."
The latest release of cve-bin-tool does NOT have the problem
To reproduce
Steps to reproduce the behaviour:
Run cve-bin-tool against an .dll file (might be credential protected) Expected behaviour: Actual behaviour: cve-bin-tool hang
Version/platform info
Version of CVE-bin-tool( e.g. output of cve-bin-tool --version): the latest main branch
Anything else?
I did a manual test by replacing the extract_file_zip method using the one in the latest release as below. The problem is resolved. I am not sure if what I did is correct and thus post it here and seek correct fixes.
Code in master latest branch that caused hang when processing .dll file
async def extract_file_zip(filename, extraction_path, process_can_fail=True):
"""Extracts ZIP files using an invalid key to prevent
freezing during extraction if they are password protected.
Providing a key during extraction has no effect if the zip file is
not password protected and extraction will happen as normal."""
if await aio_inpath("unzip"):
result = await unzip_file(filename, extraction_path, process_can_fail)
if result == 0:
return result
LOGGER.debug(f"Failed to extract {filename} using unzip. Trying 7z.")
if await aio_inpath("7z"):
return await unzip_7z(filename, extraction_path, process_can_fail)
else:
with ErrorHandler(mode=ErrorMode.Ignore) as e:
await aio_unpack_archive(filename, extraction_path)
return e.exit_code
Code from the latest official release -- worked without any issues.
async def extract_file_zip(filename, extraction_path, process_can_fail=True):
"""Extracts ZIP files using an invalid key to prevent
freezing during extraction if they are password protected.
Providing a key during extraction has no effect if the zip file is
not password protected and extraction will happen as normal."""
is_exe = filename.endswith(".exe")
key = "StaticInvalidKey"
if await aio_inpath("unzip"):
stdout, stderr, _ = await aio_run_command(
["unzip", "-P", key, "-n", "-d", extraction_path, filename],
process_can_fail,
)
if stderr:
if "incorrect password" in stderr.decode():
LOGGER.error(
f"Failed to extract {filename}: The file is password protected"
)
return 0
if is_exe:
return 0 # not all .exe files are zipfiles, no need for error
return 1
elif await aio_inpath("7z"):
stdout, stderr, _ = await aio_run_command(
["7z", "x", f"-p{key}", filename], process_can_fail
)
if stderr or not stdout:
if "Wrong password" in stderr.decode():
LOGGER.error(
f"Failed to extract {filename}: The file is password protected"
)
return 0
if is_exe:
return 0 # not all .exe files are zipfiles, no need for error
return 1
else:
with ErrorHandler(mode=ErrorMode.Ignore) as e:
await aio_unpack_archive(filename, extraction_path)
return e.exit_code
return 0
@captainreality, you are the author of commit 2039f7ae792480d228778ff8d04e922a8498ba78, could you help fixing this issue (without breaking your use case)?
Hi @captainreality, do you have any thoughts about this issue ? Thanks.
I've taken a look and I can't see a cause. My fix was to handle extraction of MSI files in the case where both unzip and 7zip were installed. unzip always fails on MSI files, while 7zip succeeds. So what my fix does is try using unzip first, and if that fails, it tries to use 7zip.
Based on the logging, this particular case is clearly going through the flow where it tries unzip, that doesn't work, so then it's trying 7zip. We had a specific case where this was happening when scanning a Windows MSI file.
There's a few possibilities I can think of:
- 7zip is somehow freezing. That seems unlikely.
- 7zip isn't freezing and I missed a logic flow somewhere, although I can't see where. Perhaps it's something further along after the extract_file_zip function.
- I notice the 7zip flow doesn't use extraction_path. That's potentially something to look into, although the 7z extraction is working for our particular use case. Maybe something is going into the wrong spot
- The failed unzip run is somehow polluting the extraction path, or the 7z and unzip outputs are somehow getting smooshed together.
Unfortunately, without a specific reproduction example this is a bit tricky. Another complicating factor is the behavior can differ based on what is installed (i.e. unzip and 7zip, unzip only, or 7zip only). It would be good to know what is installed in the repro case.
If we had a case where this can be reproduced, I would add logging for the happy path and unhappy path of extract_file_zip, unzip_file, and unzip_7z. And also logging to print the file contents and directory path of the extracted data.
Another (somewhat orthogonal) option is refactoring it so that it only tries to use 7z on MSI files. That wasn't especially straight forward so I didn't do it.
So a repro example would be good, and combined with more logging, it should be pretty fixable. I'm happy to do it if we have a repro case.