oletools
High RAM usage during find_external_relationships execution
Affected tool: ooxml, oletools
Describe the bug
While running this piece of code against an xlsm file (4.4 MB in size):
from oletools import ooxml, oleobj
xml_parser = ooxml.XmlParser(filepath)
for relationship, target in oleobj.find_external_relationships(xml_parser):
    pass  # <do stuff>
I noticed that the execution was stuck on the find_external_relationships call, while my RAM usage was increasing continuously. I had to kill the Python process after RAM usage had grown by 15 GB, because the machine was starting to swap.
After a bit of inspection I noticed that something was happening during the parsing of the subfile xl/pivotCache/pivotCacheRecords33.xml in the iter_xml call of the XmlParser. That subfile is indeed very large when unzipped.
$ ll sample.xlsm
-rw-rw-r-- 1 user user 4,4M nov 24 16:51 sample.xlsm
$ du -sh unzippedsample
300M unzippedsample
$ ll unzippedsample/xl/pivotCache/pivotCacheRecords33.xml
-rw-rw-r-- 1 user user 167M gen 1 1980 unzippedsample/xl/pivotCache/pivotCacheRecords33.xml
I guess the elements produced by this parsing are being kept in memory somewhere, and that this is what causes the high RAM usage. It's just a guess, though: unfortunately I couldn't spend more time debugging the issue. I'll update the thread if I discover anything more.
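To illustrate the kind of accumulation guessed at above, here is a minimal sketch using only the standard library. It is not taken from the oletools source, and whether iter_xml behaves this way has not been confirmed; the function name count_elements_streaming is hypothetical. It shows that xml.etree.ElementTree.iterparse keeps building the whole tree in memory unless each element is cleared once it has been handled.

```python
import io
import xml.etree.ElementTree as ET


def count_elements_streaming(xml_stream, clear=True):
    """Count closed elements in a stream (hypothetical helper, not oletools code).

    With clear=False, iterparse keeps every parsed element attached to the
    tree, so memory grows with the size of the document. With clear=True,
    each element's children, text and attributes are released right after
    it has been handled.
    """
    count = 0
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        count += 1
        if clear:
            # Frees the element's content. Empty element shells still stay
            # attached to their parent until the parent itself is cleared.
            elem.clear()
    return count


# Small in-memory document: 1 root, 1000 <r> elements, 1000 <x> elements.
sample = io.BytesIO(b"<records>" + b"<r><x v='1'/></r>" * 1000 + b"</records>")
total = count_elements_streaming(sample)
```

On a 167 MB subfile the difference between the two modes could plausibly be several gigabytes, since each parsed element costs far more memory than its serialized form.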
File/Malware sample to reproduce the bug / How To Reproduce the bug
The sample that caused the issue comes from a customer, so I can't share it with you. But I think the problem could be reproduced by building a workbook that contains a very large amount of data in one of its subfiles.
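As a starting point for such a reproduction, the sketch below writes a zip archive containing one large, highly compressible XML member, mimicking the size ratio seen in the sample. Everything here is an assumption made for illustration: the helper name build_large_member_zip, the member name and the record layout are hypothetical, and the result is only a zip with a big XML member, not a valid xlsm. The member would still have to be placed inside a real workbook before oletools could be expected to process it.

```python
import os
import tempfile
import zipfile


def build_large_member_zip(zip_path, member_name, n_records):
    """Write a zip with one large, repetitive XML member (hypothetical helper).

    Returns (uncompressed_size, compressed_size) of that member in bytes.
    """
    record = b'<r><n v="1"/><s v="some repeated text"/></r>\n'
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Stream the member so the generator itself never holds it all in RAM.
        with zf.open(member_name, "w") as member:
            member.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
            member.write(b"<pivotCacheRecords>\n")
            for _ in range(n_records):
                member.write(record)
            member.write(b"</pivotCacheRecords>\n")
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(member_name)
        return info.file_size, info.compress_size


tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "heavy.zip")
# Assumed member name, chosen only to resemble the path from the report.
raw_size, packed_size = build_large_member_zip(
    zip_path, "xl/pivotCache/pivotCacheRecords1.xml", 20000
)
```

Raising n_records to a few million should give a member in the same range as the 167 MB file above, while the archive itself stays small.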
Version information:
- OS: Ubuntu 18.04.5 LTS (Bionic Beaver)
- Python version: 3.6.9
- oletools version: 0.56