
String parsing slow with ~100,000+ strings

Open · Blacksyke opened this issue 2 years ago · 1 comment

Affected tool: olevba

Describe the bug When an input file contains on the order of hundreds of thousands of strings, analyze_macros() becomes very slow. Profiling shows that most of the time is spent in join() calls on strings rather than in the actual regex matching.
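To illustrate how join() can come to dominate the runtime, here is a minimal sketch. It is not olevba's actual code: it assumes a hypothetical pattern in which the accumulated text is rebuilt with join() every time a new string is found, and contrasts that with collecting the strings and joining once. Both function names are made up for this example.

```python
def join_every_iteration(strings):
    """Hypothetical slow pattern: rebuild the whole text after each new string."""
    collected = []
    text = ""
    for s in strings:
        collected.append(s)
        # Each pass copies everything collected so far, so total work grows
        # quadratically with the number of strings.
        text = "\n".join(collected)
    return text


def join_once(strings):
    """Same result, but every string is copied only once, at the end."""
    return "\n".join(strings)


if __name__ == "__main__":
    import timeit

    sample = ["string_%d" % i for i in range(2000)]
    assert join_every_iteration(sample) == join_once(sample)
    print("per-iteration join:", timeit.timeit(lambda: join_every_iteration(sample), number=1))
    print("single join:       ", timeit.timeit(lambda: join_once(sample), number=1))
```

With a few thousand strings the difference is already measurable; at ~100,000 strings the quadratic variant would be expected to dwarf the cost of the regex matching.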

File/Malware sample to reproduce the bug 4a87ee5ecd46a3fab735656b77d0e4fea8d3d72f3a6e0fb791999a2dfe8d59d2 Attached zip file, password is "infected"

How To Reproduce the bug Run python3 ./olevba.py malware.bin and observe that it takes a long time, even when stdout (of which there is a lot for this file) is redirected.
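One way to see where the time goes is Python's built-in cProfile module. The sketch below is an assumption about how such a profile could be collected, not a record of how the numbers in this issue were obtained; stand_in_workload and profile_call are hypothetical names, and the workload is a placeholder rather than olevba code.

```python
import cProfile
import io
import pstats


def stand_in_workload(strings):
    # Hypothetical placeholder for the slow call; not olevba code.
    return "\n".join(strings)


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(stand_in_workload, ["a"] * 100000)
    print(report)
```

For a whole script, running it as python3 -m cProfile -s cumtime followed by the script and its arguments gives a similar report without code changes.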

Expected behavior Results table is emitted in a timely fashion, regardless of the number of strings in the input.

Console output / Screenshots The debug output contains many "DEBUG Printable string found in form:" lines. So many strings! Output of time for this file:

python3 ./oletools/olevba.py > /dev/null  929.50s user 1896.56s system 58% cpu 1:20:32.49 total

Version information:

Additional context This is probably slowing down any automated bulk use of olevba :(

Blacksyke · Mar 01 '22 21:03

The attached change improves performance; hope it's helpful!

Blacksyke · Mar 01 '22 21:03