oletools
How to deal with test data that might trigger antivirus engines
Following up on the question initially raised in #201, I'd like to discuss it further.
I checked my sample RTF against VirusTotal and, although it is harmless, it triggers around 8 engines (due to heuristic checks). I've tried modifying it a bit, but the result was the same. I also expect that we will eventually face similar situations, since we will need to simulate real malware in order to write better unit tests.
I've come up with three possible solutions (after talking to people from Intra2net):
- Encrypt or base64-encode the test data and decrypt/decode it when running each test. A utils.py file in the test folder could help here.
- Move the code to a secondary repository which contains only the unit tests and reference it as a submodule so Travis can clone it when checking PRs and commits.
- Each test creates its own test data before running, but this might get complicated and hard to maintain when dealing with complex cases.
Any other suggestion?
I think the best approach is to zip the test files, encrypted with a known password such as "infected". They can then easily be decrypted in memory from the test scripts with Python's zipfile module. This is better than Base64 or similar encodings, which antivirus engines may decode automatically. And if some AV engines also try the "infected" password automatically, we can use a different one such as "infected-test".
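A minimal sketch of the in-memory approach, using only the standard zipfile module. The helper names and the password constant are hypothetical. Note that zipfile can decrypt but not create encrypted archives, and it only supports the legacy ZipCrypto scheme, so the zips would need to be created with a tool such as Info-ZIP's zip -P rather than with AES encryption.

```python
import io
import zipfile

# Password taken from this discussion; an assumption, not a fixed convention.
TEST_PASSWORD = b"infected-test"


def read_zipped_sample(zip_path, member, password=TEST_PASSWORD):
    """Return the decrypted bytes of `member` without writing anything to disk.

    zipfile can only decrypt legacy ZipCrypto archives, not AES-encrypted ones.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(member, pwd=password)


def open_zipped_sample(zip_path, member, password=TEST_PASSWORD):
    """Wrap the decrypted bytes in a seekable file-like object for parsers."""
    return io.BytesIO(read_zipped_sample(zip_path, member, password))
```

Because the decrypted data only ever lives in a BytesIO object, nothing recognizable as a sample is written to disk during the test run.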
I think the submodule approach is best, or any other method that prevents a basic pip install from triggering this issue.
At a minimum, is it possible to avoid distributing the test data with "pip install oletools"? In other words, can the issue of "how to distribute the test data without triggering scanners" be separated from the question "can we avoid distributing the test data entirely for those who don't need it"?
I nearly caused an "incident" at work by triggering virus scanners, when all I wanted was to play with olevba for its VBA manipulation abilities...
Now that PR #217 is merged, we need to check which test files trigger antivirus detection, zip them with the agreed password 'infected-test', and update the corresponding test scripts.
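A possible shape for the updated test scripts, as a sketch only: a hypothetical base class loads the sample from its password-protected zip in memory and skips the test when the archive is missing. The class names, the ".zip" naming scheme, and the member name are assumptions for illustration.

```python
import os
import unittest
import zipfile


class ZippedSampleTestCase(unittest.TestCase):
    """Hypothetical base class for tests whose sample lives in an encrypted zip."""

    zip_path = None              # set by subclasses
    member = None                # file name inside the zip
    password = b"infected-test"  # password agreed on in this thread

    def load_sample(self):
        # Skip instead of failing when the zipped sample is not available.
        if self.zip_path is None or not os.path.exists(self.zip_path):
            self.skipTest("zipped sample not available: %r" % self.zip_path)
        with zipfile.ZipFile(self.zip_path) as zf:
            return zf.read(self.member, pwd=self.password)


class DdeDocxTest(ZippedSampleTestCase):
    # Assumed layout: the original sample stored as "<name>.zip" in the same folder.
    zip_path = os.path.join("tests", "test-data", "msodde", "dde-test.docx.zip")
    member = "dde-test.docx"

    def test_sample_is_ooxml(self):
        data = self.load_sample()
        # A .docx is itself a zip archive, so it starts with the "PK" signature.
        self.assertTrue(data.startswith(b"PK"))
```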
At least the following files are detected by Windows Defender (as Exploit:O97M/DDEDownloader!rfn):
- tests\test-data\msodde\dde-in-word2003.xml
- tests\test-data\msodde\dde-test-from-office2016.doc
- tests\test-data\ooxml\dde-in-word2003.xml
This one is detected as Exploit:O97M/DDEDownloader.C:
- tests\test-data\msodde\dde-in-word2007.xml
Yesterday we wanted to start using oletools. Unfortunately, we also ran into AV incidents while building Docker images from a Jenkins pipeline. Our web proxy denied access to the pip repository with the following content-scanner finding:
oletools-0.55.1\tests\test-data\msodde\dde-test.docx
--> word/document.xml <<< Contains HEUR/Downloader.DDE suspicious code
After that, I downloaded the zip package on a private system and uploaded it to virustotal.com, with dramatic results:
https://www.virustotal.com/gui/file/edea57914c4040e7d0d64cfd88c84355d4305548d761d476fbac21ee26b25d8d/detection
We would greatly appreciate it if you could fix this ASAP. As long as this situation persists, we won't be able to use oletools in our environment. Encrypting the test files inside the zip package or moving them to a submodule both seem like good compromises.
Thanks in advance!
As has happened several times in the past, users are reporting errors and antivirus alerts when installing oletools. This is because some test files are wrongly detected as malicious by some antivirus engines. For example, the test file "dde-test-encrypt-standardpassword.xls" is now detected by Comodo AV. Potential solutions:
- remove tests from the oletools package on PyPI (reverting PR #371)
- encrypt all the test files (which requires tricky changes in several tools to support in-memory scanning, in order to leverage PR #217)
Nice, thanks for continuing to work on it ;)
First of all: I am so sorry to be partially responsible for these problems. I created most of these "malicious" files and added them, not knowing that they would be distributed via pip. My understanding was that test files are not distributed.
My thoughts on this:
- Reverting stuff that has been in the package for a while tends to get nastier than expected
- We could adapt the build process to publish release packages rather than source packages with pip, which would then not include the test data (thanks a lot @samiraguiar for finding that out)
- Why would we need in-memory scanning if we encrypt the files in some sort of container (like zip)? Could we not unzip them to tempfile.gettempdir() and scan the files there?
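For comparison, the temporary-directory variant suggested here could look like the following sketch. scan is a hypothetical callback standing in for whichever oletools function a test would call with a file path; the helper name is also an assumption.

```python
import tempfile
import zipfile


def scan_zipped_sample_via_tempdir(zip_path, member, password, scan):
    """Extract one member to a temporary directory, run scan(path), then clean up.

    `scan` is a hypothetical callback taking a file path.
    """
    # TemporaryDirectory is created under tempfile.gettempdir() by default
    # and is deleted, with its contents, when the block exits.
    with tempfile.TemporaryDirectory() as tmpdir:
        with zipfile.ZipFile(zip_path) as zf:
            extracted = zf.extract(member, path=tmpdir, pwd=password)
        return scan(extracted)
```

The trade-off is that the decrypted sample exists on disk while the callback runs, which is exactly the moment an on-access scanner can pick it up.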
I agree that if the packages released on PyPI did not include the test files, that would be the simplest solution. Indeed, it looks like one way to do this is to generate wheel distributions instead of source distributions (same for olefile: https://github.com/decalage2/olefile/issues/140). So I will change that in the next release.
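To check that a built wheel really leaves the test data out, its contents can be listed with zipfile, since a wheel is a zip archive. A sketch, assuming the samples live under a tests or test-data directory as in the paths listed earlier in this thread; the function name is hypothetical.

```python
import zipfile


def find_test_data_in_wheel(wheel_path, markers=("tests", "test-data")):
    """Return archive members whose path contains a test-data directory name.

    The directory names in `markers` are assumptions based on the repository layout.
    """
    with zipfile.ZipFile(wheel_path) as whl:
        return [
            name
            for name in whl.namelist()
            if any(part in markers for part in name.split("/"))
        ]
```

For example, running it on the .whl file produced by python -m build --wheel should return an empty list if no test data was packaged.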
About in-memory scanning, I think it's better than using temporary files on disk, because those often trigger antivirus engines when using oletools on Windows. I would like to be able to scan a file stored in a zip with the password "infected" without creating a temporary file on disk. Some time ago I started developing a module to handle files on disk or in memory transparently; I will release it to make this easier in oletools.
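The idea of handling files on disk or in memory transparently could be sketched as follows. This is not the module mentioned above, only a hypothetical illustration: iter_samples accepts a path or raw bytes and, only when a password is supplied, expands a zip container in memory.

```python
import io
import zipfile


def iter_samples(source, password=None):
    """Yield (name, stream) pairs for `source`, keeping all data in memory.

    `source` may be a file path (str) or raw bytes. If a password is given and
    the source is a zip archive, each member is decrypted in memory. Without a
    password a zip is passed through unchanged, because OOXML files such as
    .docx are themselves zip archives.
    """
    if isinstance(source, (bytes, bytearray)):
        name, data = "<memory>", bytes(source)
    else:
        name = source
        with open(source, "rb") as f:
            data = f.read()
    stream = io.BytesIO(data)
    if password is not None and zipfile.is_zipfile(stream):
        with zipfile.ZipFile(stream) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield info.filename, io.BytesIO(zf.read(info, pwd=password))
    else:
        stream.seek(0)  # is_zipfile may have moved the position
        yield name, stream
```

Requiring an explicit password to open the container avoids mistaking a .docx or .xlsm sample for a wrapper archive.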
I just stumbled over quite a few unit tests that are still disabled, and therefore useless, because of this issue. Are there any plans to address this?
Apologies for the necro, but I wanted to note that a number of other test files are also triggering antivirus engines (in my case, AWS GuardDuty scans flagged the pip cache for oletools). In particular, running several of the xls* files through VirusTotal shows detections, e.g. autostart-encrypt-standardpassword.xls, encrypted.xls, and excel4_sample_macro.xls.