calamine icon indicating copy to clipboard operation
calamine copied to clipboard

Usability of finding a header row and skipping rows

Open TimoFreiberg opened this issue 5 years ago • 4 comments

I have to read excel sheets with headers where the header row is not the first row. I currently try to solve this by first using range.rows().enumerate().find(...) to get a row that contains the header names I expect and then using the index of that row to get a subrange of my original range.

I introduced a bug by using the index I got from that process as the new start and therefore started at e.g. row 3 instead of 26. I fixed that by adding the found header index to the current range start. I guess this is kind of hard to use :/

TimoFreiberg avatar Mar 31 '19 15:03 TimoFreiberg

Sorry I probably do not understand your error.

Are you using Range::range method? Are you then deserializing with this subrange?

tafia avatar Apr 08 '19 05:04 tafia

Yeah. This is about usability/ergonomics.

Since my header row wasn't the first row, I had to first find the correct row and then skip n rows, where n was the index of the header row when iterating over Range.rows().enumerate().

Having to find the header row manually was a bit annoying, but I'm not sure how the library would offer this functionality in a natural way.

Then I had to skip n rows, but I didn't notice that only nonempty rows would be returned by Range.rows(), so I called my_range.range((n, 0), my_range.end().unwrap()), although instead of n I should have used my_range.start().unwrap().0 + n (i.e. skip n rows).

I might not have made the second mistake if Range had a skipRows(self, n: u32) -> Range method because I wouldn't have noticed the error in my mental model of the Rows iterator. I also wouldn't have had to write this helper ;)

fn skip_rows(range: Range<DataType>, n: u32) -> Result<Range<DataType>, Error> {
    if range.is_empty() {
        return Err(Error::new("Range was empty"));
    }
    let start = range.start().unwrap();
    let end = range.end().unwrap();
    Ok(range.range((start.0 + n, start.1), end))
}

TimoFreiberg avatar Apr 09 '19 07:04 TimoFreiberg

but I didn't notice that only nonempty rows would be returned by Range.rows().

It is not the case. Do you have an example?

I think what you're confused about is probably relative (usize)/absolute (u32) position. Range::range uses absolute index so indeed you should build that from my_range.start.

What we could do is probably improve doc first. Then we could have a new Index<(Range<usize>, Range<usize>)> impl which would then use relative positions.

tafia avatar Apr 11 '19 10:04 tafia

Sorry for the long silence

It is not the case. Do you have an example?

I was mistaken, it seems like Range.rows() skips empty rows until a nonempty row is found but then also returns empty rows.

Anyway, I wrote helper functions as in https://gist.github.com/TimoFreiberg/3cfc0163d5b1e5b3e4fab11f948b5299 and I thought maybe something like the skip_rows function could make sense as a Range method

TimoFreiberg avatar May 15 '19 16:05 TimoFreiberg