readxl
readxl copied to clipboard
readxl fails with truncated dates
Related to #462, #716, #737.
In my data set the dates are sometimes truncated:
df <- data.frame(Sample = c("S1","S2","S3"), Date = c("2020-01-01","2023-05","2015"))
When I want to import this data frame into R using readxl
, I seem to have two options: either import the Date
column as text
, in this case "2020-01-01" will become 43831, or import it as date
, in this case "2023-05" will become NA and "2015" will become "1905-07-07" without any warnings.
One way to solve this is to import the column as text
and then use a custom function to convert back to the original, using a function like this (though it is strange that origin required for accurate conversion is not 1970-01-01):
recover_dates <- function(x) {
foo <- function(A) {
A_num <- suppressWarnings(as.numeric(A))
if (!is.na(A_num) && A_num > 10000) {
return(as.character(as.Date(A_num, origin = "1899-12-30")))
} else {
return(A)
}
}
unname(sapply(x, foo))
}
recover_dates(df$Date)
However, it would be even better if col_types = "text"
would just read in the Date column as is, without any conversions.
(You might wonder how I can use these truncated dates in any analysis: I usually simulate a valid date within the boundaries of the information I have, but for this, I need to read in the values as is.)
EDIT: if I manually export a sheet of an xlsx as csv I can then import it using read.csv()
and it works as intended.