
Large excel files are slow to open

Open bauerjon opened this issue 2 years ago • 2 comments

Hello!

First, I want to make clear that we'd be open to paying contributors to help optimize this for us. Please let me know if this is of interest. If not, we are looking for direction or suggestions on how to fix it ourselves. Thank you! 🙏

What actually happened

When we try to open a large spreadsheet (20+ MB) using pycel, it can take up to 8 minutes to start up. Many of the large sheets are static lookup tables.

What was expected to happen

Much like loading a static CSV into memory and doing a lookup, I would expect a file this large with static data to open much faster, ideally in under 10 seconds, so that we can iterate faster during development.

Problem description

When a large file like this takes this long to open, iterating, making changes, and debugging become extremely painful for a developer.

Code Sample

https://github.com/bauerjon/slow-pycel-example

Environment

pycel==1.0b30, Python 3.9.5, macOS

bauerjon, Aug 10 '22 15:08

I did find that at least half of the slowness comes from the load_workbook calls:

https://github.com/dgorissen/pycel/blob/f4fd7e5e9feb77e5affe9fd3b1881ef47861102c/src/pycel/excelwrapper.py#L243-L245
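For anyone who wants to check a breakdown like this on their own workbook, here is a minimal sketch using only the standard-library profiler. It is not how the measurement above was taken. `profile_call` and `slow_loader` are hypothetical names made up for illustration, and `slow_loader` is only a stand-in for the real load step.

```python
import cProfile
import io
import pstats
import time


def profile_call(fn, *args, top=10, **kwargs):
    """Run fn under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time, so the
    most expensive call chains appear first.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def slow_loader(path):
    # Hypothetical stand-in for the real workbook load; replace it with
    # the actual call you want to measure.
    time.sleep(0.05)
    return {"path": path}


if __name__ == "__main__":
    _, report = profile_call(slow_loader, "big.xlsx")
    print(report)
```

Passing the real loading function in place of `slow_loader` should show in the cumulative-time column how much of the total is spent inside `load_workbook` compared with the rest of the startup.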

bauerjon, Aug 11 '22 18:08

We added a fork/solution that seems to be working for our use case here

bauerjon, Aug 18 '22 23:08