readxl icon indicating copy to clipboard operation
readxl copied to clipboard

Retain hyperlink URLs from text column

Open k5cents opened this issue 4 years ago • 2 comments

Is there a way readxl could retain the hyperlinks from the Excel XML data? Not sure how you'd present that in a data frame, but maybe it could be extracted with a different function?

image

library(readxl)
library(fs)
url <- "https://www.michigan.gov/documents/web_contract_12037_7.xls"
download.file(url, tmp <- file_temp(ext = path_ext(url)))
sht <- excel_sheets(tmp)[2]
dat <- read_excel(tmp, sht)
#> New names:
#> * Name -> Name...3
#> * Name -> Name...5
head(dat$`Links To Contracts`)
#> [1] "View Contract" "View Contract" "View Contract" "View Contract"
#> [5] "View Contract" "View Contract"

Created on 2020-06-18 by the reprex package (v0.3.0)

k5cents avatar Jun 18 '20 15:06 k5cents

I'm not thinking about readxl at the moment, but I can imagine what's going on here because there's a similar phenomenon in Google Sheets (https://github.com/tidyverse/googlesheets4/issues/172).

The long-term solution would be to create an official column type around hyperlink cells. And then, if readxl had a richer col spec interface, there is the additional possibility of expressing whether you want link text, URL, or both.

But I'm afraid in the meantime, your best bet is to use something like tidyxl (https://cran.r-project.org/web/packages/tidyxl/index.html) to get all the gory details of the cell, where I expect you can find the URLs.

jennybc avatar Jun 18 '20 18:06 jennybc

Very sorry but I never got around to hyperlinks in tidyxl. There's an old issue that was waiting for someone to need them. https://github.com/nacnudus/tidyxl/issues/42.

nacnudus avatar Jun 19 '20 11:06 nacnudus