Parser fails on an edge case: sheet with namespaced tags, an incorrect dimension, and row/cell positions missing from the attributes

Open rockcavera opened this issue 10 months ago • 2 comments

I was trying to parse this sheet, but the package currently doesn't support this edge case. In the sheet1.xml file the tags carry namespace prefixes, and the dimension tag doesn't report the sheet's actual size. On top of that, only the first row tag (the header) has attributes giving its position, and the same goes for the c tags. Excel nevertheless parses and displays the file correctly, so I'm reporting it here.
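To make the edge case concrete, here is a minimal sketch in Python, using the standard-library `xml.etree.ElementTree` rather than the package's own parser, of how such a sheet could still be read. The sample XML and the helpers `local_name`, `col_index` and `resolve_positions` are hypothetical. The sketch assumes, without confirmation from the spec, that a row without an `r` attribute directly follows the previous row and that a cell without `r` directly follows the previous cell in its row, which is consistent with how Excel displays this file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sheet1.xml fragment: every tag carries the "x:" prefix, the
# dimension ref is wrong, and only the first row and its cells declare `r`.
SHEET_XML = """<x:worksheet xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:dimension ref="A1"/>
  <x:sheetData>
    <x:row r="1"><x:c r="A1"><x:v>1</x:v></x:c><x:c r="B1"><x:v>2</x:v></x:c></x:row>
    <x:row><x:c><x:v>3</x:v></x:c><x:c><x:v>4</x:v></x:c><x:c><x:v>5</x:v></x:c></x:row>
  </x:sheetData>
</x:worksheet>"""


def local_name(tag: str) -> str:
    """ElementTree expands "x:row" to "{namespace-uri}row"; keep only "row"."""
    return tag.rsplit("}", 1)[-1]


def col_index(ref: str) -> int:
    """1-based column number of an A1-style reference, e.g. "AB12" -> 28."""
    n = 0
    for ch in ref.rstrip("0123456789"):
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def resolve_positions(xml_text: str) -> dict:
    """Map (row, col) -> cell value text, inferring positions when `r` is absent."""
    cells = {}
    row_num = 0
    for row in ET.fromstring(xml_text).iter():
        if local_name(row.tag) != "row":
            continue
        # Assumption: a row without `r` comes right after the previous row.
        row_num = int(row.get("r", row_num + 1))
        col_num = 0
        for c in row:
            if local_name(c.tag) != "c":
                continue
            ref = c.get("r")
            # Assumption: a cell without `r` comes right after the previous cell.
            col_num = col_index(ref) if ref else col_num + 1
            cells[(row_num, col_num)] = next(
                (v.text for v in c if local_name(v.tag) == "v"), None
            )
    return cells


print(resolve_positions(SHEET_XML))
# {(1, 1): '1', (1, 2): '2', (2, 1): '3', (2, 2): '4', (2, 3): '5'}
```

Matching on the local name is what makes the `x:` prefix irrelevant, and the bogus `ref="A1"` in `<x:dimension>` is simply never consulted.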

I've already looked through the package's code, which is how I found the bug fixed in #22. I also saw that it already has a namespace parser that could be reused. So I'm willing to submit a PR to fix this, if you think this use case is worth supporting.

I didn't find anything specific about the missing attributes in the row and c tags or about the incorrect dimension tag, but I did find this documentation on namespaces.

rockcavera, Aug 17 '23 15:08

After some research, it looks like the dimension tag is optional (page 1622 of ECMA-376 "Part 1 “Fundamentals And Markup Language Reference”, 5th edition, December 2016").
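Since the tag is optional, a reader can only treat it as a hint. A minimal Python sketch of reading it when present; `split_ref` and `dimension_extent` are hypothetical names, and the sketch assumes (this is not confirmed for the package) that the grid is sized from A1 to the bottom-right cell of the `ref` range.

```python
from typing import Optional, Tuple


def split_ref(ref: str) -> Tuple[int, int]:
    """Split an A1-style reference such as "AB12" into (row, col), both 1-based."""
    letters = ref.rstrip("0123456789")
    col = 0
    for ch in letters:
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(ref[len(letters):]), col


def dimension_extent(ref: Optional[str]) -> Optional[Tuple[int, int]]:
    """(last row, last col) from <dimension ref="...">, or None when the tag is absent.

    Assumes the grid starts at A1, so only the bottom-right cell of the range matters.
    """
    if not ref:
        return None
    return split_ref(ref.split(":")[-1])


print(dimension_extent("A1:D5"))  # (5, 4)
print(dimension_extent(None))     # None
```

Returning `None` for a missing tag forces the caller to fall back to counting, and, as this sheet shows, even a `ref` that is present can be wrong.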

So, in my opinion, there are two ways to make the parser work without it:

  1. For each row tag parsed, check whether the number of rows parsed so far is greater than SheetInfo.rows and, if so, call setLen() to grow it by one row. Disadvantages: a) repeated allocations and reallocations; and b) if a row with more columns than SheetInfo.cols appears, the alignment of the seq in Sheet.data is lost, and avoiding that requires manually copying and re-laying-out everything already parsed;

  2. Do a shallow pass over the sheet that only counts rows and columns, then create Sheet.data at its final size. After that, do the full parse and collect the data. Advantage: a single allocation for the seq. Disadvantage: the sheet has to be parsed twice.
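A rough Python sketch of approach 1, to show where disadvantage b) comes from. It assumes, without confirmation from the package, that Sheet.data is a flat, row-major seq; a Python list stands in for it, the `cols` argument stands in for SheetInfo.cols, and `grow_per_row` is a hypothetical name.

```python
def grow_per_row(rows, cols):
    """Approach 1: append rows one at a time to a flat, row-major list.

    `rows` is a list of rows (lists of values); `cols` is the width believed
    up front, e.g. from a possibly wrong dimension tag.
    Returns (data, cols, relayouts).
    """
    data = []
    relayouts = 0
    for n, row in enumerate(rows):
        if len(row) > cols:
            # Index r * cols + c changes for every row already stored,
            # so all existing values must be copied into a wider layout.
            new_cols = len(row)
            new_data = [None] * (n * new_cols)
            for r in range(n):
                for c in range(cols):
                    new_data[r * new_cols + c] = data[r * cols + c]
            data, cols = new_data, new_cols
            relayouts += 1
        # Grow by one row, the counterpart of a setLen() call.
        data.extend([None] * cols)
        for c, value in enumerate(row):
            data[n * cols + c] = value
    return data, cols, relayouts


print(grow_per_row([[1, 2], [3, 4, 5]], cols=2))  # ([1, 2, None, 3, 4, 5], 3, 1)
```

With an underestimated `cols`, every wider row triggers a full copy of everything parsed so far, on top of the per-row growth.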

In my opinion, approach 2 is the better of the two.
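A minimal sketch of approach 2 in Python. It assumes a flat, row-major list standing in for Sheet.data, and infers positions when `r` is missing by treating a row or cell without `r` as following the previous one (an assumption, not something the spec states). `walk_cells` and `two_pass` are hypothetical names. Both passes stream the XML with `ET.iterparse`; the first only tracks the largest row and column, so the list is allocated exactly once.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical namespaced sheet with no <dimension> element at all.
SHEET_XML = """<x:worksheet xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:sheetData>
    <x:row r="1"><x:c r="A1"><x:v>1</x:v></x:c><x:c r="B1"><x:v>2</x:v></x:c></x:row>
    <x:row><x:c><x:v>3</x:v></x:c><x:c><x:v>4</x:v></x:c><x:c><x:v>5</x:v></x:c></x:row>
  </x:sheetData>
</x:worksheet>"""


def local_name(tag):
    return tag.rsplit("}", 1)[-1]  # drop the "{namespace-uri}" part


def col_number(ref):
    n = 0
    for ch in ref.rstrip("0123456789"):
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def walk_cells(xml_text):
    """Yield (row, col, value) for each <c>, inferring positions when `r` is absent."""
    row_num = col_num = 0
    for event, elem in ET.iterparse(io.StringIO(xml_text), events=("start", "end")):
        name = local_name(elem.tag)
        if event == "start" and name == "row":
            row_num = int(elem.get("r", row_num + 1))
            col_num = 0
        elif event == "start" and name == "c":
            ref = elem.get("r")
            col_num = col_number(ref) if ref else col_num + 1
        elif event == "end" and name == "c":
            # Children (<v>) are only guaranteed to be parsed at the "end" event.
            value = next((v.text for v in elem if local_name(v.tag) == "v"), None)
            yield row_num, col_num, value


def two_pass(xml_text):
    # Pass 1: only count rows and columns; no values are kept.
    rows = cols = 0
    for r, c, _ in walk_cells(xml_text):
        rows, cols = max(rows, r), max(cols, c)
    # Single allocation at the final size.
    data = [None] * (rows * cols)
    # Pass 2: full parse, writing each value straight into its final slot.
    for r, c, value in walk_cells(xml_text):
        data[(r - 1) * cols + (c - 1)] = value
    return data, rows, cols


print(two_pass(SHEET_XML))  # (['1', '2', None, '3', '4', '5'], 2, 3)
```

The result never depends on a `<dimension>` element, so a missing tag and a wrong one are handled the same way. One limitation of this sketch: a trailing row that has no cells is not counted.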

rockcavera, Aug 17 '23 16:08

Approach 2 could also solve #21, since the sheet1.xml in that file has no dimension tag.

rockcavera, Aug 17 '23 21:08