readxl icon indicating copy to clipboard operation
readxl copied to clipboard

Report on named ranges

Open jennybc opened this issue 7 years ago • 7 comments

Now that readxl can read from a range, it would be nice to extract the named ranges, for possible use in a subsequent call to read_excel(..., range = ...). There could be a metadata function, like excel_sheets(), but excel_ranges().

Related to https://github.com/nacnudus/tidyxl/issues/17 and #79 (in which I say we won't do this, but maybe I was wrong).

cc @sz-cgt @nacnudus

jennybc avatar Apr 04 '17 21:04 jennybc

May I suggest that excel_ranges() return a tibble with name, scope(named ranges can be local or workbook-scoped) and reference(in $A$1 format). There are a few issues I noticed while digging around:

  1. names can be duplicated so long that the scope is different. Therefore it is important that we allow the user to make his mind which name he is referring to (one of local names or a global name)
  2. sheet references of definedName (localSheetId attribute) are zero-based indexes of sheets as they appear in the workbook (the r:id attribute of the sheet for .xlsx), not as they were created (which is what sheetId attribute is for .xlsx).
  3. definedNames can refer to another defined name, therefore we may want to help the user by "resolving" such nested names to the ultimate reference (maybe). In these cases eventual local scoping of the name is explicitly visible in the sheet name before the defined name (e.g. Sheet1!Sales vs. Sales).
  4. As observed by @eibanez, definedNames can refer to non-contiguous range lists (separated by comma or semicolon, depending on locale). I think we should just return those lists unparsed, otherwise things get really messy really quick.
  5. All of the above is fairly transparent in .xlsx, but not nearly as obvious in .xls. Again, @eibanez did an awesome job at parsing the .xls here. Some of the code might actually be more suited upstream in libxls.

dmi3kno avatar Aug 18 '17 10:08 dmi3kno

Is / was there an PR from @eibanez that happened before my time or that I failed to really register, when I was working on fixing other stuff? BTW thanks for all the work re: thinking through this issue.

jennybc avatar Aug 18 '17 14:08 jennybc

I don't think I made an official PR. The changes are over 2 years old and, at the time, they weren't accepted in the upstream libxls repository.

Unfortunately, I cannot work on further changes in the foreseeable future, but you are more than welcome to borrow the stuff from my branch. #79 has all the info.

eibanez avatar Aug 24 '17 20:08 eibanez

Ah, ok now I see my own note there. So it looks like we have finally gotten to the "might revisit later" point 🙂.

jennybc avatar Aug 24 '17 20:08 jennybc

@jennybc do you need someone to create a PR from the work @eibanez did? Or are you going to take it on?

sz-cgt avatar Aug 26 '17 21:08 sz-cgt

I am on the fence. On one hand, it's possible a PR would be extremely helpful and I could merge when I revisit readxl this fall. It's also possible it would be easier for me to do once I upload all this into my head again, esp. if it integrates with other things I'm doing. I have a big internal integration of xls and xlsx coming and this would most naturally happen after that, because that will be a pretty major internal change. This named range work is much easier for xlsx than xls, but needs to happen for both.

jennybc avatar Aug 26 '17 21:08 jennybc

As I need this feature, I have gone ahead and written an implementation for it myself (see open PR). Please consider my PR.

alberthat avatar Feb 27 '24 13:02 alberthat