fact_extractor
Extracting zip with generic carver produces weird results
Running fact_extractor with 0.zip gives me weird results. Here is the output of tree in the respective extraction directory:
.
├── files
│   └── 0.zip
├── input
│   └── 0.zip
└── reports
    └── meta.json
The report says that one file was extracted, and that file is the input file itself. Both even have the same hash.
What happened here?
Looking at the code I just noticed that this is a binwalk issue.
Actually I think this is our issue. Adding "--rm" to the binwalk invocation might be the solution (and works for my limited test cases).
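As a rough sketch of what adding the flag could look like: the helper below only builds the argument list. The function name and its parameters are made up for illustration, and the real invocation in fact_extractor may use different options. `--extract`, `--directory` and `--rm` are binwalk 2.x options; `--rm` deletes carved files after extraction.

```python
def build_binwalk_command(input_file, output_dir, remove_carved=True):
    """Hypothetical helper: assemble a binwalk call, optionally with --rm."""
    command = ["binwalk", "--extract", "--directory", str(output_dir)]
    if remove_carved:
        # --rm asks binwalk to delete carved files after extraction
        command.append("--rm")
    command.append(str(input_file))
    return command
```

The resulting list could be passed to `subprocess.run(command)`, which keeps the flag easy to toggle while testing against more samples.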
It seems to me that this 0.zip being unpacked by the generic_carver (or rather not being detected as MIME type ZIP) is a bug in itself. As far as I can tell, the header starts with the usual magic string PK\x03\x04, but for whatever reason file detects it as application/octet-stream.
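For anyone who wants to reproduce the header check, here is a minimal sketch that only compares the first four bytes against the ZIP local file header signature. The function name is made up for illustration and it is not part of fact_extractor.

```python
ZIP_LOCAL_HEADER_MAGIC = b"PK\x03\x04"


def starts_with_zip_magic(path):
    """Return True if the file begins with the ZIP local file header signature."""
    with open(path, "rb") as handle:
        return handle.read(4) == ZIP_LOCAL_HEADER_MAGIC
```

This says nothing about whether the rest of the archive is valid. It only confirms that the magic string is at offset 0.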
Actually I think this is our issue. Adding "--rm" to the binwalk invocation might be the solution (and works for my limited test cases).
I don't think this is (entirely) our fault. Unpacking the same file from itself is the fault of binwalk IMHO. Adding --rm works for this file, but I tried it with a different file (which previously was unpacked successfully with binwalk) and there it causes the file to not be unpacked at all. The problem is probably that binwalk also does not recognize the file as zip and simply tries to carve files from it, and it finds a zip file at offset 0 (the file itself).
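One possible workaround on our side, independent of --rm, would be to discard carved files whose hash equals that of the input. The sketch below assumes we have the input path and a list of extracted paths. Both function names are hypothetical and not taken from the existing code.

```python
import hashlib


def sha256_of(path):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def drop_self_extractions(input_file, extracted_files):
    """Keep only extracted files whose content differs from the input file."""
    input_hash = sha256_of(input_file)
    return [path for path in extracted_files if sha256_of(path) != input_hash]
```

This would hide the symptom (the file "extracting" itself) without explaining why the file is not detected as zip in the first place.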
We could also try to handle this specific case in "fact_helper_file" and force the file to be detected as application/zip (the default application/zip unpacker has no problem unpacking the file). The file actually seems to be an OOXML file, but that type does not come with a MIME definition in the standard file magic.
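A minimal sketch of what such an override could look like, using only the Python standard library. Both function names are hypothetical and this is not how fact_helper_file is currently structured. It assumes the archive is valid enough for `zipfile` to open, and it uses the presence of `[Content_Types].xml` at the archive root as a hint for OOXML.

```python
import zipfile


def refine_mime(path, detected_mime):
    """Hypothetical override: report application/zip for valid ZIP files
    that were only detected as application/octet-stream."""
    if detected_mime != "application/octet-stream":
        return detected_mime
    if zipfile.is_zipfile(path):
        return "application/zip"
    return detected_mime


def looks_like_ooxml(path):
    """Return True if the ZIP contains [Content_Types].xml at its root."""
    try:
        with zipfile.ZipFile(path) as archive:
            return "[Content_Types].xml" in archive.namelist()
    except zipfile.BadZipFile:
        return False
```

Only touching files that fall back to application/octet-stream keeps the override from interfering with types that are already detected correctly.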
But is this a general problem with binwalk or is this a special case? Does this only affect zip files that are not detected as zip or also other files?