oletools
String parsing slow with ~100,000+ strings
Affected tool: olevba
Describe the bug
When an input file contains on the order of 100,000 or more strings, analyze_macros()
becomes very slow. Profiling reveals that most of the time is spent join()ing
strings, rather than on the actual regex matching.
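To illustrate the kind of cost described above, here is a minimal sketch. It is not olevba's actual code: the function names and the keyword list are hypothetical, and it only assumes that the joined text is rebuilt more often than necessary. Both functions return the same counts, but the first re-joins all strings once per keyword, while the second joins once and reuses the result.

```python
import re

# Hypothetical keyword list, for illustration only (not olevba's real list).
KEYWORDS = ("Shell", "CreateObject", "AutoOpen")


def count_hits_join_per_keyword(strings, keywords=KEYWORDS):
    """Slow variant: the full text is re-joined for every keyword."""
    hits = {}
    for kw in keywords:
        text = "\n".join(strings)  # repeated O(total length) work
        hits[kw] = len(re.findall(re.escape(kw), text))
    return hits


def count_hits_join_once(strings, keywords=KEYWORDS):
    """Faster variant: join a single time, then run every regex on the result."""
    text = "\n".join(strings)
    return {kw: len(re.findall(re.escape(kw), text)) for kw in keywords}


if __name__ == "__main__":
    sample = ["Shell x", "CreateObject", "nothing", "Shell"]
    print(count_hits_join_per_keyword(sample))
    print(count_hits_join_once(sample))
```

With ~100,000 strings and many patterns, the difference between the two variants grows with the number of times the join is repeated.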
File/Malware sample to reproduce the bug
4a87ee5ecd46a3fab735656b77d0e4fea8d3d72f3a6e0fb791999a2dfe8d59d2
Attached zip file, password is "infected"
How To Reproduce the bug
Run python3 ./olevba.py malware.bin and observe that it takes a long time, even when redirecting stdout (of which there is a lot for this file).
Expected behavior
The results table is emitted in a timely fashion, regardless of the number of strings in the input.
Console output / Screenshots
Lots of "DEBUG Printable string found in form:" lines in the debug output. So many strings!
time output with this file:
python3 ./oletools/olevba.py > /dev/null 929.50s user 1896.56s system 58% cpu 1:20:32.49 total
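The time output only gives totals. To see where the time goes, as with the profiling mentioned in the bug description, something like the following sketch can be used. It relies on the standard library's cProfile and pstats modules; the workload function is a hypothetical stand-in for the real scan, not olevba code.

```python
import cProfile
import io
import pstats


def workload(n=2000, repeats=50):
    """Hypothetical stand-in for the scan: joins many strings repeatedly."""
    parts = [f"string{i}" for i in range(n)]
    total = 0
    for _ in range(repeats):
        total += len("\n".join(parts))
    return total


def profile_top(func, limit=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_top(workload))
```

In the report, calls to the join method of str objects show up as their own row, which makes it easy to compare their cost against the regex calls.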
Version information:
- OS: Mac
- OS version: 12.2.1 arm64
- Python version: 3.9.10 64-bit
- oletools version: commit dfbcabb957644769d17dfbb367eb3a52167c0506
Additional context
This is probably slowing down any automated bulk use of olevba :(
Attached change improves performance, hope it's helpful!