creek
Handle XML namespaces in worksheets
We've run into an issue parsing XLSX files whose nodes are namespaced (e.g. <x:row>).
~This PR addresses that issue by using the local_name method when looking for row, c, v and t nodes. The name method includes the namespace, e.g. x:row, but local_name will strip the namespace prefix, allowing the existing comparison logic to work.~
This PR addresses that issue by identifying the namespace prefix (if there is one) while SAX parsing the sheet, and then matching only nodes whose name includes that prefix (e.g. x:row rather than row).
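A minimal Ruby sketch of the prefix detection, assuming the prefix is read from the xmlns:* declarations on the worksheet's root element and that attributes arrive as [name, value] pairs (the shape Nokogiri's SAX start_element callback uses). The helper names spreadsheetml_prefix and qualified_name are hypothetical, not creek's actual code, and the SAX events are simulated with plain arrays so the example runs without Nokogiri.

```ruby
# SpreadsheetML main namespace URI from the Office Open XML spec.
SPREADSHEETML_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Hypothetical helper: find the prefix bound to the SpreadsheetML namespace in
# the root element's [name, value] attribute pairs. Returns nil when the
# namespace is the default (unprefixed) one or is not declared at all.
def spreadsheetml_prefix(root_attrs)
  root_attrs.each do |name, value|
    next unless value == SPREADSHEETML_NS
    return name.delete_prefix("xmlns:") if name.start_with?("xmlns:")
  end
  nil
end

# Hypothetical helper: the node name to compare against, e.g. "x:row" or "row".
def qualified_name(local, prefix)
  prefix ? "#{prefix}:#{local}" : local
end

# Simulated SAX start_element events: [element name, attribute pairs].
events = [
  ["x:worksheet", [["xmlns:x", SPREADSHEETML_NS]]],
  ["x:sheetData", []],
  ["x:row", [["r", "1"]]],
  ["x:c", [["r", "A1"]]],
  ["x:row", [["r", "2"]]]
]

# Detect the prefix on the root element, then match rows by qualified name.
prefix = spreadsheetml_prefix(events.first[1])
row_tag = qualified_name("row", prefix)
row_count = events.count { |name, _attrs| name == row_tag }
puts row_count # prints 2
```

For an unprefixed worksheet (xmlns="..." on the root), the helper returns nil and the comparison falls back to the plain row name, so files without a prefix are matched as before.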
Additionally, when the shared strings dictionary is built, this PR identifies the namespace prefix (if there is one) and includes it in the CSS query used to parse the dictionary. An alternative approach would be to call remove_namespaces! on the document, but that seems a bit heavy-handed.
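For the shared strings side, a similar sketch (again with hypothetical helper names) builds a namespaced CSS selector and the matching prefix-to-URI bindings. Nokogiri's css understands the prefix|name syntax and accepts a hash of namespace bindings as its last argument; the Nokogiri call appears only in a comment so the snippet runs on plain Ruby. The selector si is an assumption about what the dictionary query looks for.

```ruby
SPREADSHEETML_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Hypothetical helper: prefix every element name in a space-separated
# descendant selector, e.g. ("si t", "x") -> "x|si x|t". Leaves the selector
# unchanged when there is no prefix.
def namespaced_selector(selector, prefix)
  return selector unless prefix
  selector.split.map { |local| "#{prefix}|#{local}" }.join(" ")
end

# Hypothetical helper: bindings to pass alongside a prefixed selector,
# or nil when no prefix was found.
def namespace_bindings(prefix)
  prefix ? { prefix => SPREADSHEETML_NS } : nil
end

prefix = "x" # e.g. detected from the shared strings document's root element
selector = namespaced_selector("si", prefix)
bindings = namespace_bindings(prefix)
puts selector # prints x|si

# Rough Nokogiri usage (not executed here):
#   nodes = bindings ? doc.css(selector, bindings) : doc.css(selector)
```

Passing bindings only when a prefix exists leaves the unprefixed case on the same query path it used before.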
After thinking about it more, I decided that it makes more sense to use the approach taken for the shared strings dictionary when parsing the sheet's rows as well. Using local_name is akin to calling remove_namespaces!, which runs the risk of parsing nodes that we shouldn't (nodes named row, c, v or t but in a different namespace).
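To illustrate that risk, a small simulation compares namespace-blind matching on the local name with matching on the prefixed name. The Element struct and the urn:example:extension namespace are hypothetical stand-ins for parsed nodes, not part of creek or Nokogiri.

```ruby
SPREADSHEETML_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Hypothetical stand-in for a parsed node: prefix, local name and namespace URI.
Element = Struct.new(:prefix, :local_name, :namespace_uri) do
  # Qualified name as a name-style method would report it, e.g. "x:row".
  def name
    prefix ? "#{prefix}:#{local_name}" : local_name
  end
end

elements = [
  Element.new("x", "row", SPREADSHEETML_NS),
  Element.new("x", "c", SPREADSHEETML_NS),
  # A same-named node from an unrelated (made-up) namespace.
  Element.new("ext", "row", "urn:example:extension")
]

# Namespace-blind: comparable to matching on local_name.
by_local_name = elements.select { |e| e.local_name == "row" }

# Namespace-aware: match the name built from the SpreadsheetML prefix.
by_qualified_name = elements.select { |e| e.name == "x:row" }

puts by_local_name.size     # prints 2
puts by_qualified_name.size # prints 1
```

Comparing namespace_uri against SPREADSHEETML_NS would be stricter still, since it does not depend on which prefix a given file happens to use.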
Making the row parsing logic namespace aware seems like the better solution.