calamine icon indicating copy to clipboard operation
calamine copied to clipboard

Feat: check for memory errors on reading xlsx files and proper formatting of csv sample code

Open SgtPepper47 opened this issue 1 year ago • 2 comments

Updates to the Excel range creation to check the machine memory against the memory required for the range. Updates to the sample Excel to csv writer to identify the memory error and to write out a properly formatted csv file with commas instead of semicolons. Resolves #433

SgtPepper47 avatar May 08 '24 21:05 SgtPepper47

Could you add a test showing how this can fail?

tafia avatar May 11 '24 07:05 tafia

tafia, Here is an example of a file with a sheet that fails to load into memory. Essentially, it will depend on your machine, but an Excel sheet has a finite range of about 1 million rows to 16k columns. I just run the example script to convert 'Sheet1' to a csv file. Large_Sheet.xlsx Thanks.

SgtPepper47 avatar May 14 '24 20:05 SgtPepper47

tests are failing.

tafia avatar May 15 '24 05:05 tafia

May be we should check cells.len() == isize::MAX in loop, when add cell to cells? And this should be backported to other formats too.

dimastbk avatar May 15 '24 06:05 dimastbk

First, I should have opened this by saying I think calamine is a great piece of work, and I'm impressed with its performance. I see in the build that I mistakenly missed a closing parentheses. I can do another pull request to fix that if you like. To dimastbk's comment. I think it would be cleaner to check in the cells loop to see when it has exceeded the limit for the machine. However, I think cell.len() would return a usize where it wouldn't be able to check against isize. Of course, all of this is just checking for an error. I haven't come up with a solution to be able to load the Excel sheet without an error.

SgtPepper47 avatar May 15 '24 15:05 SgtPepper47

@SgtPepper47, sorry, cells.len() == usize::MAX

dimastbk avatar May 16 '24 04:05 dimastbk

I think this PR makes things worse as it allocates a potentially very large Vec for nothing (this is not the right Vec to reserve capacity).

tafia avatar May 24 '24 18:05 tafia

Closing it for now, feel free to reopen and come up with an actual improvement. Thanks

tafia avatar Jun 04 '24 01:06 tafia