mso-dumper
xls-dump.py consumes a lot of memory on some files
Hello,
I'm using xls-dump.py through the indexer "recoll".
It turns out that the indexer was running out of memory and eventually freezing the machine because it was choking on a specific file named fat-loop.xls. This file ships in the MediaWiki source tree (at least in versions 1.33.4, 1.34.4, 1.35.0 and 1.35.1).
To reproduce (adapt path as necessary):
python3 xls-dump.py --dump-mode=canonical-xml --utf-8 --catch /home/data/www/html/mw1.35.1/tests/phpunit/data/MSCompoundFileReader/fat-loop.xls
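Because the runaway allocation can freeze the whole machine, it may help to reproduce the problem under a memory cap. The following is a minimal sketch, assuming a Linux/POSIX system (it relies on the standard `resource` module and `RLIMIT_AS`) and that it is run from the mso-dumper checkout directory. The helper names `limit_address_space` and `run_capped`, the 1 GiB cap and the 300 s timeout are hypothetical choices for illustration; they are not part of mso-dumper or recoll.

```python
import resource
import subprocess
import sys

# Assumed values for illustration only: 1 GiB address-space cap, 300 s timeout.
DEFAULT_LIMIT_BYTES = 1 << 30
DEFAULT_TIMEOUT_S = 300


def limit_address_space(limit_bytes):
    """Lower the soft RLIMIT_AS of the current process (runs in the child)."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot set its soft limit above the hard one.
        limit_bytes = min(limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))


def run_capped(cmd, limit_bytes=DEFAULT_LIMIT_BYTES, timeout=DEFAULT_TIMEOUT_S):
    """Run cmd under an address-space cap and return (returncode, stderr).

    Allocations beyond the cap fail inside the child (Python raises
    MemoryError) instead of pushing the whole machine into swap.
    stdout is discarded so a large XML dump is not buffered in this process.
    """
    proc = subprocess.run(
        cmd,
        preexec_fn=lambda: limit_address_space(limit_bytes),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def main(path):
    # Same invocation as the reproduction command above.
    cmd = [sys.executable, "xls-dump.py", "--dump-mode=canonical-xml",
           "--utf-8", "--catch", path]
    try:
        code, err = run_capped(cmd)
    except subprocess.TimeoutExpired:
        print("xls-dump.py did not finish within the timeout")
        return
    print("exit code:", code)
    if "MemoryError" in err:
        print("xls-dump.py appears to have hit the memory cap")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved as, say, `capped_dump.py`, it would be invoked with the path to fat-loop.xls as its only argument. Capping the address space rather than watching resident memory keeps the sketch to the standard library; note that `RLIMIT_AS` counts virtual memory, so the cap may need raising for files that dump normally.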
I tried xls-dump.py from commit db25622 and can confirm that the issue is still present.