xlsx
The parser fails on this edge case: a sheet with namespaced tags, an incorrect dimension, and positions not declared in the row and c attributes.
I was trying to parse this sheet, but the package currently doesn't support this edge case: in the sheet1.xml file the tags are namespaced, and the dimension tag doesn't give the actual size of the sheet. To top it off, the row tags carry position attributes only on the first row, which is the header, and the same goes for the c tags. However, Excel parses and displays the file correctly, so I'm reporting it here.
I've already had a look at the package's code and found the bug fixed in #22. I also saw that it already has a namespace parser that can be reused. So I'm willing to submit a PR if you think it's worth it for the package to support this use case.
I didn't find anything specific about the missing attributes in the row and c tags or about the incorrect information in the dimension tag, but for the namespace I found this documentation.
Doing some research, it looks like the dimension tag is optional.
Image taken from "Part 1 “Fundamentals And Markup Language Reference”, 5th edition, December 2016"
Therefore, in my opinion, there are two ways to make the parser work without it.
- At each analyzed row tag, check if the number of analyzed rows is greater than SheetInfo.rows; if so, call setLen() to increase it by one row. Disadvantages of this approach: a) multiple allocations and reallocations; and b) if a row with more columns than SheetInfo.cols appears, all the alignment of the seq in Sheet.data is lost, which, to be avoided, requires making manual, adjusted copies of the work already done.
- Perform a superficial analysis of the sheet to count the number of rows and columns only, and then create Sheet.data with the final size. After that, do the complete analysis of the sheet with the data collection. Advantage: a single allocation for the seq. Disadvantage: it needs two analysis steps.
Approach 2 seems to be the best, in my opinion.
Approach 2 would also solve #21, since in that file sheet1.xml does not have the dimension tag.
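
To make approach 2 concrete, here is a minimal sketch of the two-pass idea. It is written in Python rather than Nim purely for illustration and is not the package's code: the sample XML, the function names (local_name, column_index, iter_cells, parse_sheet) and the flat row-major layout of the result are all assumptions. It ignores the dimension tag, matches tags by local name so the namespace prefix doesn't matter, and falls back to "previous position + 1" when a row or c tag has no r attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical sheet1.xml content: namespaced tags, a dimension that does not
# match the data, and position attributes (r) only on the first row.
SHEET_XML = """
<x:worksheet xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:dimension ref="A1"/>
  <x:sheetData>
    <x:row r="1"><x:c r="A1"><x:v>10</x:v></x:c><x:c r="B1"><x:v>20</x:v></x:c></x:row>
    <x:row><x:c><x:v>1</x:v></x:c><x:c><x:v>2</x:v></x:c><x:c><x:v>3</x:v></x:c></x:row>
  </x:sheetData>
</x:worksheet>
"""


def local_name(tag):
    """Strip the '{namespace}' part ElementTree puts in front of a tag name."""
    return tag.rsplit("}", 1)[-1]


def column_index(cell_ref):
    """Convert the letters of a reference such as 'B7' to a 0-based column."""
    index = 0
    for ch in cell_ref:
        if not ch.isalpha():
            break
        index = index * 26 + (ord(ch.upper()) - ord("A") + 1)
    return index - 1


def iter_cells(root):
    """Yield (row, col, text) with 0-based positions, inferring missing ones."""
    next_row = 0
    for row in root.iter():
        if local_name(row.tag) != "row":
            continue
        # No r attribute on the row: assume it follows the previous row.
        row_idx = int(row.get("r")) - 1 if row.get("r") else next_row
        next_col = 0
        for cell in row:
            if local_name(cell.tag) != "c":
                continue
            # No r attribute on the cell: assume it follows the previous cell.
            col_idx = column_index(cell.get("r")) if cell.get("r") else next_col
            value = next((v.text for v in cell if local_name(v.tag) == "v"), None)
            yield row_idx, col_idx, value
            next_col = col_idx + 1
        next_row = row_idx + 1


def parse_sheet(xml_text):
    root = ET.fromstring(xml_text)
    # Pass 1: only measure the sheet; the dimension tag is never consulted.
    n_rows = n_cols = 0
    for r, c, _ in iter_cells(root):
        n_rows = max(n_rows, r + 1)
        n_cols = max(n_cols, c + 1)
    # Pass 2: allocate once at the final size, then fill in the values.
    data = [None] * (n_rows * n_cols)
    for r, c, value in iter_cells(root):
        data[r * n_cols + c] = value
    return n_rows, n_cols, data


if __name__ == "__main__":
    print(parse_sheet(SHEET_XML))
```

In this sketch both passes walk a tree that is already in memory; with a streaming parser the same idea would mean reading sheet1.xml twice, which is the extra cost mentioned for approach 2.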