Improve performance with large worksheets

Open willtrnr opened this issue 5 years ago • 5 comments

This will serve as an umbrella issue for performance improvement.

Currently, BIFF record reading does a fair bit of copying that could potentially be avoided. There's also the possibility of using a C extension (or Cython).
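To illustrate the kind of copying in question, here is a minimal sketch, not pyxlsb's actual reader. The record framing (2-byte id, 2-byte length, payload) and the function names are assumptions made up for the example and do not match the real BIFF12 layout. Slicing a `bytes` object copies each payload, while slicing a `memoryview` yields views that share the original buffer.

```python
import struct

# Hypothetical framing for illustration only (NOT the real BIFF12 layout):
# 2-byte little-endian record id, 2-byte little-endian payload length, payload.
HEADER = struct.Struct("<HH")


def iter_records_copying(data):
    """Slice the bytes object directly; every payload is a fresh copy."""
    pos = 0
    while pos + HEADER.size <= len(data):
        rec_id, length = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        yield rec_id, data[pos:pos + length]
        pos += length


def iter_records_views(data):
    """Slice a memoryview instead; payloads share the original buffer."""
    view = memoryview(data)
    pos = 0
    while pos + HEADER.size <= len(view):
        rec_id, length = HEADER.unpack_from(view, pos)
        pos += HEADER.size
        yield rec_id, view[pos:pos + length]
        pos += length


# Two made-up records: id 1 with payload b"abc", id 2 with payload b"hi".
sample = HEADER.pack(1, 3) + b"abc" + HEADER.pack(2, 2) + b"hi"

copied = list(iter_records_copying(sample))
viewed = list(iter_records_views(sample))
```

Both iterators yield the same ids and payload contents; the second only materializes bytes when a caller explicitly asks for them, e.g. with `bytes(payload)`.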

willtrnr avatar Apr 06 '19 02:04 willtrnr

Hi, not sure if you are still developing this lib. I wrote a Rust extension to parse XLSB using Calamine. I'm currently working on CFFI bindings and using milksnake to interoperate between the two. Would you be interested in integrating it into your module? Performance-wise, as a pure binary CLI it takes about 10-20 seconds to copy ~3 million cells to CSV.

The CLI should work on any 32-bit Windows environment. Not sure what your official compatibility requirements are, but compiled binaries usually work on whatever platforms they were compiled on (given you had the correct toolchain). So if you have access to Linux, Mac, and Windows environments, you can theoretically embed three different binary files to do the parsing.

hpca01 avatar Aug 02 '19 19:08 hpca01

@hpca01 that sounds interesting, however I've been putting off native modules to avoid having to compile and distribute binaries (which in the case of OSX, I won't even be able to test).

Gotta say I think it's an interesting feature to have a pure Python implementation, since non-CPython interpreters just work (e.g. Jython and IronPython).

I think the majority of the process should stay like that, but we could have optional compiled modules for certain parts (think cPickle vs pickle).

willtrnr avatar Aug 02 '19 21:08 willtrnr

@wwwiiilll yeah, I figured the compiled binary wouldn't be ideal. However, dylibs shouldn't require compilation across different OSes, the only difference being 32-bit vs 64-bit Python versions. When I get the CFFI bits working as expected, it will behave like a module in Python. I'll shoot you a message when I figure this C-ABI stuff out with Rust, and you can decide to incorporate it if you want.

hpca01 avatar Aug 02 '19 21:08 hpca01

@wwwiiilll, I think I can put myself forward to test on OSX, but I am a dummy at compiling it. So if @hpca01 and @wwwiiilll would like to offer some help, I can join in.

chfw avatar Aug 02 '19 22:08 chfw

I really appreciate the sentiment guys, but I'm a little hesitant about requiring potential developers to have the Rust toolchain and, hell, Rust knowledge on hand to work on this (though I can't say this has happened so far).

But, I guess, if you can come up with a module that is optional and can be built with the usual setup.py, I'd gladly try to incorporate it. I believe the pain point that could be addressed with a native module is record parsing; there's quite a bit of byte copying going on there.

willtrnr avatar Aug 03 '19 02:08 willtrnr