readxl
readxl copied to clipboard
Report on named ranges
Now that readxl can read from a range, it would be nice to extract the named ranges, for possible use in a subsequent call to read_excel(..., range = ...)
. There could be a metadata function, like excel_sheets()
, but excel_ranges()
.
Related to https://github.com/nacnudus/tidyxl/issues/17 and #79 (in which I say we won't do this, but maybe I was wrong).
cc @sz-cgt @nacnudus
May I suggest that excel_ranges()
return a tibble with name, scope(named ranges can be local or workbook-scoped) and reference(in $A$1 format). There are a few issues I noticed while digging around:
- names can be duplicated so long that the scope is different. Therefore it is important that we allow the user to make his mind which name he is referring to (one of local names or a global name)
- sheet references of
definedName
(localSheetId
attribute) are zero-based indexes of sheets as they appear in the workbook (ther:id
attribute of the sheet for.xlsx
), not as they were created (which is whatsheetId
attribute is for.xlsx
). - definedNames can refer to another defined name, therefore we may want to help the user by "resolving" such nested names to the ultimate reference (maybe). In these cases eventual local scoping of the name is explicitly visible in the sheet name before the defined name (e.g.
Sheet1!Sales
vs.Sales
). - As observed by @eibanez, definedNames can refer to non-contiguous range lists (separated by comma or semicolon, depending on locale). I think we should just return those lists unparsed, otherwise things get really messy really quick.
- All of the above is fairly transparent in
.xlsx
, but not nearly as obvious in.xls
. Again, @eibanez did an awesome job at parsing the.xls
here. Some of the code might actually be more suited upstream inlibxls
.
Is / was there an PR from @eibanez that happened before my time or that I failed to really register, when I was working on fixing other stuff? BTW thanks for all the work re: thinking through this issue.
I don't think I made an official PR. The changes are over 2 years old and, at the time, they weren't accepted in the upstream libxls
repository.
Unfortunately, I cannot work on further changes in the foreseeable future, but you are more than welcome to borrow the stuff from my branch. #79 has all the info.
Ah, ok now I see my own note there. So it looks like we have finally gotten to the "might revisit later" point 🙂.
@jennybc do you need someone to create a PR from the work @eibanez did? Or are you going to take it on?
I am on the fence. On one hand, it's possible a PR would be extremely helpful and I could merge when I revisit readxl this fall. It's also possible it would be easier for me to do once I upload all this into my head again, esp. if it integrates with other things I'm doing. I have a big internal integration of xls and xlsx coming and this would most naturally happen after that, because that will be a pretty major internal change. This named range work is much easier for xlsx than xls, but needs to happen for both.
As I need this feature, I have gone ahead and written an implementation for it myself (see open PR). Please consider my PR.