calamine
calamine copied to clipboard
Usability of finding a header row and skipping rows
I have to read excel sheets with headers where the header row is not the first row.
I currently try to solve this by first using range.rows().enumerate().find(...)
to get a row that contains the header names I expect and then using the index of that row to get a subrange of my original range.
I introduced a bug by using the index I got from that process as the new start and therefore started at e.g. row 3 instead of 26. I fixed that by adding the found header index to the current range start. I guess this is kind of hard to use :/
Sorry I probably do not understand your error.
Are you using Range::range method? Are you then deserializing with this subrange?
Yeah. This is about usability/ergonomics.
Since my header row wasn't the first row, I had to first find the correct row and then skip n
rows, where n
was the index of the header row when iterating over Range.rows().enumerate()
.
Having to find the header row manually was a bit annoying, but I'm not sure how the library would offer this functionality in a natural way.
Then I had to skip n
rows, but I didn't notice that only nonempty rows would be returned by Range.rows()
, so I called my_range.range((n, 0), my_range.end().unwrap())
, although instead of n
I should have used my_range.start().unwrap().0 + n
(i.e. skip n rows
).
I might not have made the second mistake if Range
had a skipRows(self, n: u32) -> Range
method because I wouldn't have noticed the error in my mental model of the Rows
iterator.
I also wouldn't have had to write this helper ;)
fn skip_rows(range: Range<DataType>, n: u32) -> Result<Range<DataType>, Error> {
if range.is_empty() {
return Err(Error::new("Range was empty"));
}
let start = range.start().unwrap();
let end = range.end().unwrap();
Ok(range.range((start.0 + n, start.1), end))
}
but I didn't notice that only nonempty rows would be returned by
Range.rows()
.
It is not the case. Do you have an example?
I think what you're confused about is probably relative (usize
)/absolute (u32
) position. Range::range
uses absolute index so indeed you should build that from my_range.start
.
What we could do is probably improve doc first. Then we could have a new Index<(Range<usize>, Range<usize>)>
impl which would then use relative positions.
Sorry for the long silence
It is not the case. Do you have an example?
I was mistaken, it seems like Range.rows()
skips empty rows until a nonempty row is found but then also returns empty rows.
Anyway, I wrote helper functions as in https://gist.github.com/TimoFreiberg/3cfc0163d5b1e5b3e4fab11f948b5299 and I thought maybe something like the skip_rows
function could make sense as a Range
method