FACT_core
why are there missing files
For some firmware images, the analysis never finishes, and there are missing files in the Admin/Find Missing Analysis tab.
Is this expected? If not, how can I find out what went wrong so that I can fix it? There are no messages on the Logs tab, and I am using FACT_docker.
Hi,
> For some firmware images, the analysis never finishes
Do you mean that entries in "Currently analyzed firmware" (/system_health) never complete? That could happen if there are errors during analysis or unpacking and the file gets lost during scheduling (but it should obviously not happen). Since there don't seem to be any helpful log messages, it could be complicated to debug the problem. Did you maybe see any errors or stack traces in the terminal output (docker logs could help here since you are using FACT_docker)? There could be unexpected errors that don't result in log messages.
> Do you mean that entries in "Currently analyzed firmware" (/system_health) never complete?
Correct.
I reran the test and reproduced the error. I now see that there is a timeout exception that does not seem to be handled correctly, maybe because other exceptions were raised while it was being handled. The exception log is attached: fact_log.txt
According to ps, the extractor that runs for a long time before the exception occurs is binwalk.
The error is indeed not handled correctly. Nevertheless, it is also not clear what caused the error in the first place. Was it a particularly large or in some other way unusual file? Running binwalk usually takes some time for large files (which may be the cause of the timeout). You could also try to run the extractor manually on the file as documented here to maybe see what causes the error.
I think there are two issues here: binwalk does indeed take too long for some files, and FACT_core does not correctly handle timeouts in FACT_extractor. In some cases binwalk extracts bogus data, and because FACT_extractor is called recursively, a very large file can then be passed to binwalk for further extraction.
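To illustrate the kind of timeout handling I mean, here is a minimal sketch. It is not the actual FACT_core or FACT_extractor code: the function name `run_with_timeout` is hypothetical, and a generic subprocess stands in for the extractor call. The idea is to turn the timeout into a regular return value so that the caller can mark the file as failed instead of losing it during scheduling.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a command and return (finished, stdout) instead of raising on timeout.

    Hypothetical helper for illustration only; not part of FACT.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired as error:
        # subprocess.run kills the child process before re-raising,
        # so no stray process is left behind here.
        partial = error.stdout or ''
        if isinstance(partial, bytes):
            # The captured output may still be bytes even with text=True.
            partial = partial.decode(errors='replace')
        return False, partial
    return True, result.stdout


if __name__ == '__main__':
    # A sleeping Python process stands in for a slow extractor.
    slow_command = [sys.executable, '-c', 'import time; time.sleep(30)']
    finished, _ = run_with_timeout(slow_command, timeout_seconds=1)
    print('finished:', finished)
```

With this shape, a timeout is just one more outcome that the scheduler can record, rather than an exception that can get lost when other exceptions are raised during handling.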
Feel free to assign this to me.
We are always happy to receive external contributions and will try to support you, so feel free to try to improve this. Some things to note:
- binwalk is only used in two cases for extraction:
  - when the file format is not known (e.g. the file is a binary blob without headers)
  - when the extraction for the file's format with the designated plugin fails (as a fall-back option)
- the output of binwalk is already (partly) filtered: we try to sort out bogus archives by verifying if the output is really an archive of the type
- the extractor runs as a docker container and is called from unpack_base.py
I submitted a merge request to fix timeout handling for fact_extractor. I tested this on v3.3. Unfortunately, I could not test it on main, but I think it should still work.
I also submitted a patch to fact_extractor that tries to return partial results in case binwalk does not finish.
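As a rough illustration of the partial-results idea (this is a sketch under my own assumptions, not the code from the patch): if the extraction tool writes files into an output directory as it goes, whatever exists in that directory when the timeout hits can still be collected. The function name `extract_with_partial_results` and the stand-in command are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def extract_with_partial_results(command, output_dir, timeout_seconds):
    """Run an extraction command and return (completed, files_found).

    Hypothetical helper for illustration only. Files already written to
    output_dir are returned even if the command hit the timeout.
    """
    try:
        subprocess.run(command, cwd=output_dir, timeout=timeout_seconds, check=False)
        completed = True
    except subprocess.TimeoutExpired:
        # The child has been killed at this point; keep what it produced so far.
        completed = False
    files = sorted(path for path in Path(output_dir).rglob('*') if path.is_file())
    return completed, files


if __name__ == '__main__':
    # Stand-in for a slow extractor: writes one file, then hangs.
    script = "open('part.bin', 'wb').write(b'x'); import time; time.sleep(60)"
    with tempfile.TemporaryDirectory() as tmp_dir:
        completed, files = extract_with_partial_results(
            [sys.executable, '-c', script], tmp_dir, timeout_seconds=5
        )
        print(completed, [path.name for path in files])
```

Files collected this way may be truncated, so they would still need the same sanity checks as any other extraction output.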
I think it's worth mentioning the PR: https://github.com/fkie-cad/FACT_core/pull/852
And the other PR is https://github.com/fkie-cad/fact_extractor/pull/94
Hello!
First of all: thank you guys so much for the contributions here.
Unfortunately, our lead developer @jstucke and his right hand @maringuu are pretty busy this week, which is why probably nothing will happen until the 19th.
Just giving you a heads up - normally both PRs would've already been considered :-)