calamine

Provide a lazy iterator over the rows

Open tafia opened this issue 9 years ago • 6 comments

Today the entire range is first saved in memory. Depending on the need we could provide for both xlsx and xlsm a Row or even a Cell lazy iterator ... and collect them into a Range if needed.
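To make the idea concrete, here is a minimal sketch of a lazy row iterator using only the standard library. Everything in it is hypothetical and not calamine's API: the `Value` enum stands in for the crate's cell type, and `Rows` assumes the parser can already hand out `(row, col, value)` cells sorted by row and then by column.

```rust
use std::iter::Peekable;

/// Hypothetical stand-in for the crate's cell value type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Empty,
    Int(i64),
    Text(String),
}

/// Lazily groups a stream of `(row, col, value)` cells into rows.
/// Assumption: cells arrive sorted by row, then by column.
pub struct Rows<I: Iterator<Item = (u32, u32, Value)>> {
    cells: Peekable<I>,
}

impl<I: Iterator<Item = (u32, u32, Value)>> Rows<I> {
    pub fn new(cells: I) -> Self {
        Rows { cells: cells.peekable() }
    }
}

impl<I: Iterator<Item = (u32, u32, Value)>> Iterator for Rows<I> {
    /// The row index and its cells, with gaps filled by `Value::Empty`.
    type Item = (u32, Vec<Value>);

    fn next(&mut self) -> Option<Self::Item> {
        // Pull the first cell of the next row; stop when the stream ends.
        let (row, col, first) = self.cells.next()?;
        let mut values = vec![Value::Empty; col as usize];
        values.push(first);
        // Consume only the cells that belong to the same row.
        while let Some((_, c, v)) = self.cells.next_if(|cell| cell.0 == row) {
            values.resize(c as usize, Value::Empty); // pad skipped columns
            values.push(v);
        }
        Some((row, values))
    }
}
```

Calling `next()` once consumes only the cells of the first row, and `collect::<Vec<_>>()` on the iterator gives back the fully materialized form when a whole range is needed.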

tafia, Oct 27 '16 07:10

Do you think it would make sense to also have a non-reference iterator? I mean an iterator that would allow us to take ownership of the DataType without having to clone it. Maybe something like Vec::drain.
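As a rough illustration of what taking ownership could look like, the sketch below drains a row and moves the strings out without cloning. The `DataType` enum here is a simplified, hypothetical stand-in for the crate's type (it deliberately does not derive `Clone`), and `take_strings` is an illustrative helper, not an existing function.

```rust
/// Simplified, hypothetical stand-in for calamine's `DataType`.
#[derive(Debug, PartialEq)]
pub enum DataType {
    Int(i64),
    String(String),
    Empty,
}

/// Moves every string out of the row without cloning.
/// Non-string cells are dropped, and the row is left empty afterwards.
pub fn take_strings(row: &mut Vec<DataType>) -> Vec<String> {
    row.drain(..)
        .filter_map(|cell| match cell {
            DataType::String(s) => Some(s), // `s` is moved, not copied
            _ => None,
        })
        .collect()
}
```

`Vec::drain` keeps the vector's allocation for reuse; if the vector itself is no longer needed, `into_iter()` gives the same owned items.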

There was also a thread on the forum: "Idiomatic way to take ownership of all items in a Vec<String>?".

EDIT: I realized before going to bed that it wouldn't make sense with the xml shared string concept.

bbigras, Nov 02 '16 20:11

> I realized before going to bed that it wouldn't make sense with the xml shared string concept.

Well, for this we could probably return a Cow with only strings being Cow::Borrowed ... even if it probably makes little sense to borrow the other primitives (pointer size anyway).
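A minimal sketch of that shape, assuming the shared strings live in a table that outlives the cells. `CellRef`, `shared_string`, and `into_owned_string` are hypothetical names for illustration, not part of the crate.

```rust
use std::borrow::Cow;

/// Hypothetical cell type: only the string variant borrows.
/// The primitives are small enough to be stored by value.
#[derive(Debug, PartialEq)]
pub enum CellRef<'a> {
    Int(i64),
    Float(f64),
    Bool(bool),
    String(Cow<'a, str>),
}

/// Resolves a shared-string index to a cell that borrows from the table,
/// so the text itself is not copied.
pub fn shared_string<'a>(table: &'a [String], index: usize) -> Option<CellRef<'a>> {
    table
        .get(index)
        .map(|s| CellRef::String(Cow::Borrowed(s.as_str())))
}

/// Takes ownership of the text; this allocates only if it was borrowed.
pub fn into_owned_string(cell: CellRef<'_>) -> Option<String> {
    match cell {
        CellRef::String(s) => Some(s.into_owned()),
        _ => None,
    }
}
```

With this layout a caller who only inspects strings pays no allocation, while a caller who needs a `String` pays for exactly one copy via `into_owned`.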

tafia, Nov 07 '16 08:11

Just wanted to say that a lazy iterator would have been super useful for my use case.

I just needed to compare the headers (first row) for 13 sheets, each with (1,000,000 x 13) cells. I ended up having to load each complete range just to get the first row.

Fortunately, Rust and this library are pretty fast: it took about 2 minutes for what appears to be a 1 GB file.

Let me know if I missed something; perhaps there was a way for me to get the info I needed without loading everything.

hwchen, May 08 '20 18:05

The implementation might depend on the actual file type (xlsx, xlsb, ods, etc.). Which one was yours, just for info?

tafia, May 17 '20 10:05

Mine was xlsx.

hwchen, May 21 '20 00:05