RAthena
RAthena copied to clipboard
having datetime-strings in input parameters will produce errors
Issue Description
A query such as:
tbl(con, "table) %>%
filter(date_col == "2020-01-31 51:59:25")
will produce an error, as Athena does not understand a literal of the form date '2020-01-31 51:59:25', but R can cast it to a date even with 'as.Date(x, tryFormats = "%Y-%m-%d")' used for testing in sql_escape_string.AthenaConnection.
This can be fixed with;
# Athena specifc S3 method for converting date variables and iso formateed date strings to date literals
#' @rdname sql_translate_env
#' @export
sql_escape_string.AthenaConnection <- function(con, x) {
all_dates <- all(try(as.Date(x, tryFormats = "%Y-%m-%d"), silent=T) == x) & all(nchar(x) == 10)
if(all_dates & !is.na(all_dates)) {
paste0('DATE ', DBI::dbQuoteString(con, x))
} else {
DBI::dbQuoteString(con, x)
}
}
However, it wouldn't be a bad idea to have similar handling of date times and format those to, e.g., "datetime'2020-01-31 51:59:25'".
I will look into it, but it would be good to at least have the fix above included before pushing to cran.
Are you sure datetime'2020-01-31 51:59:25'
works from what I have read it needs to be in TIMESTAMP '2001-08-22 03:04:05.321'
data types
Ah, sorry, you are right: it should be 'timestamp' and not 'datetime'. The fractional seconds are optional, however, but can have at maximum three digits.
Timestamps of the format '2020-01-31T11:59:25Z' do not work, however.
This is true, however I think this bug fix should be sufficient enough for what it is try to achieve.
Yes!
If one wants similar handling of timestamps, there is an additional hurdle:
dbplyr formats timestamp variables in the above mentioned format (e.g., '2020-01-31T11:59:25Z', https://github.com/tidyverse/dbplyr/blob/38cdafd74303baaee02e545831d242de6c656f4c/R/escape.R#L76) so one would need to either be able to write an athena specific version of escape.POSIXt (that is, dbplyr should export the generic or a sql_escape_POSIXt etc) or alternatively one could just drop the T's and Z's and what not before passing the string on, but then one will surely get in trouble with timezones.
So the dates seem to be the low hanging fruit and timestamps spell trouble (as usual)...

PR #72 now includes this bug fix. Will leave issue open
FYI: the development version of dbplyr has now generics for sql_escape_date and sql_escape_datetime, which allow, for example, Athena-specific handling of date-variables as inputs. Also simple AthenaConnection-methods exist there now.
https://github.com/tidyverse/dbplyr/pull/391
If I am not missing something, this should not affect the functionality of RAthena of noctua. For the case of real date variables (as opposed to date strings such as "2020-01-01"), dbplyr shoud add the 'date' keyword out of the box and the "is this string actually a date" -deduction will not be needed nor done for those cases.
For the date strings, the implemented date test is still required. That is something Hadley didn't want to implement in dbplyr as the default behaviour. However, in my opinion, that feature is super useful.
Do you see some issues with these changes?
Ah perfect. it looks like you have done alot of work with dbplyr to make it simpler. If I have understood this correctly the sql_escape_string
can be removed or atleast only implemented for early versions of dbplyr? If this is the case I will wrap a if statement around if for the dbplyr version :)
@OssiLehtinen just had a deeper look at the new changes in dbplyr and it looks like no change is require for RAthena
or noctua
as you said earlier (sorry just waking up and that :D ). All I will do is align the syntax and use lower case instead of upper :D