python-calamine

Ability to iterate over rows?

Open · maciej-jaworski opened this issue 2 years ago · 1 comment

(I'm not familiar with how the underlying Rust code works, so maybe this isn't feasible.)

Would it be possible to add support for iterating over the rows of a sheet without loading all of them into memory (similar to openpyxl's `iter_rows`)?

I'm dealing with some larger files, and while the compute performance is amazing, I end up allocating a lot of memory: 400+ MB for an 80 MB file. Using openpyxl's `iter_rows` brings this down to 40 MB, but it takes 5-6x longer, so obviously I'd prefer to use this package.
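To illustrate why row-by-row iteration lowers peak memory, here is a minimal pure-Python sketch using the standard library's `tracemalloc`. It does not use python-calamine or openpyxl. `make_row` is a hypothetical stand-in for decoding one spreadsheet row into Python objects. The printed byte counts depend on the interpreter and are unrelated to the numbers quoted above.

```python
import tracemalloc
from collections.abc import Callable, Iterator

N_ROWS = 20_000
N_COLS = 10


def make_row(i: int) -> list[str]:
    # Hypothetical stand-in for turning one sheet row into Python objects.
    return [f"r{i}c{j}" for j in range(N_COLS)]


def all_rows() -> list[list[str]]:
    # Eager: every row exists as Python objects at the same time.
    return [make_row(i) for i in range(N_ROWS)]


def iter_rows() -> Iterator[list[str]]:
    # Lazy: only the row currently being consumed is alive.
    for i in range(N_ROWS):
        yield make_row(i)


def total_len_eager() -> int:
    return sum(len(cell) for row in all_rows() for cell in row)


def total_len_lazy() -> int:
    return sum(len(cell) for row in iter_rows() for cell in row)


def peak_bytes(consume: Callable[[], int]) -> int:
    # Measure the peak traced allocation while `consume` runs.
    tracemalloc.start()
    consume()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


if __name__ == "__main__":
    eager = peak_bytes(total_len_eager)
    lazy = peak_bytes(total_len_lazy)
    print(f"eager peak: {eager:,} bytes, lazy peak: {lazy:,} bytes")
```

Both functions compute the same result. The difference is that the generator never holds more than one decoded row at once, which is the property being requested here.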

maciej-jaworski · Nov 28 '23 14:11

Hi! I created a PoC, but:

  1. Calamine doesn't support lazy loading (https://github.com/tafia/calamine/pull/370), and I'd prefer to wait until that PR is merged.
  2. Due to limitations in pyo3 and calamine, iterating over the Rust structure would require `unsafe` code, which is risky. I need some time to research it.

dimastbk · Nov 29 '23 08:11

Due to a limitation of pyo3, we can't expose true iteration over a Rust iterator (see PyO3/pyo3#1085). Instead, I added iteration over the Rust range in #43 (after calamine has read the whole sheet into memory). This can reduce memory allocation in some cases (see benchmark).
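The approach described above can be modeled with a small pure-Python toy. It assumes the whole sheet is already resident in a compact native form (simulated here with a flat `array("d")`) and compares converting every row to Python objects at once with converting rows on demand. `ToySheet` and its methods are hypothetical and written for illustration only. This is not python-calamine's code, so check the library's documentation for the real sheet API.

```python
from array import array
from collections.abc import Iterator


class ToySheet:
    """Hypothetical sheet whose cells are fully loaded up front,
    stored compactly as a flat array of floats (not Python objects)."""

    def __init__(self, n_rows: int, n_cols: int) -> None:
        self.n_rows = n_rows
        self.n_cols = n_cols
        # The whole "range" is resident in memory before any iteration starts.
        self._cells = array("d", (float(i) for i in range(n_rows * n_cols)))

    def _row(self, r: int) -> list[float]:
        # Python float/list objects are created here, one row at a time.
        start = r * self.n_cols
        return list(self._cells[start:start + self.n_cols])

    def to_python(self) -> list[list[float]]:
        # Eager: every row is converted to Python objects at once.
        return [self._row(r) for r in range(self.n_rows)]

    def iter_rows(self) -> Iterator[list[float]]:
        # Lazy: rows are converted from the already-loaded range on demand.
        for r in range(self.n_rows):
            yield self._row(r)


sheet = ToySheet(n_rows=3, n_cols=2)
for row in sheet.iter_rows():
    print(row)
```

The savings come only from not materializing every row as Python objects simultaneously. The native range itself stays fully in memory, which is consistent with this helping "in some cases" rather than matching a truly streaming reader.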

dimastbk · Jul 15 '24 10:07