xls-dump.py consumes a lot of memory on some files

Open xeyownt opened this issue 3 years ago • 1 comments

Hello,

I'm using xls-dump.py through the indexer "recoll". It turns out that indexing was exhausting memory and eventually freezing the machine, because xls-dump.py was choking on a specific file named fat-loop.xls. This file ships in the MediaWiki source tree (at least versions 1.33.4, 1.34.4, 1.35.0 and 1.35.1).

To reproduce (adapt path as necessary):

python3 xls-dump.py --dump-mode=canonical-xml --utf-8 --catch /home/data/www/html/mw1.35.1/tests/phpunit/data/MSCompoundFileReader/fat-loop.xls

I tried with xls-dump.py from commit db25622 and could confirm the issue is still present.
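
Until the root cause is found, one way to reproduce this without freezing the machine is to run the command above in a child process with a capped address space. The following is only a sketch under stated assumptions: the helper names `xls_dump_command` and `run_with_memory_limit`, the 1 GiB cap and the timeout are illustrative choices, not part of mso-dumper or recoll, and the `resource` module is Unix-only.

```python
import resource
import subprocess
import sys


def xls_dump_command(xls_path, script="xls-dump.py"):
    """Build the command line from the reproduction step (hypothetical helper)."""
    return [sys.executable, script,
            "--dump-mode=canonical-xml", "--utf-8", "--catch", xls_path]


def run_with_memory_limit(cmd, max_bytes, timeout=None):
    """Run cmd in a child process whose address space is capped at max_bytes.

    The child fails with a MemoryError (or is killed on timeout) instead of
    exhausting the machine's memory.
    """
    def limit_memory():
        # Runs in the child just before exec. Only the soft limit is lowered,
        # and never above an existing hard limit.
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        new_soft = max_bytes
        if hard != resource.RLIM_INFINITY:
            new_soft = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))

    return subprocess.run(cmd, preexec_fn=limit_memory,
                          capture_output=True, text=True, timeout=timeout)


# Example (assumed values: 1 GiB cap, 120 s timeout; adapt the path):
# result = run_with_memory_limit(
#     xls_dump_command("tests/phpunit/data/MSCompoundFileReader/fat-loop.xls"),
#     max_bytes=1024 ** 3, timeout=120)
# print(result.returncode)
```

A non-zero return code (or a `subprocess.TimeoutExpired` exception) would then indicate that the dump hit the cap rather than completing.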

xeyownt, Sep 15 '21 11:09

fat-loop.xls

xeyownt, Sep 15 '21 11:09