RAthena icon indicating copy to clipboard operation
RAthena copied to clipboard

having datetime-strings in input parameters will produce errors

Open OssiLehtinen opened this issue 5 years ago • 8 comments

Issue Description

A query such as:

tbl(con, "table) %>% 
filter(date_col == "2020-01-31 51:59:25")

will produce an error, as Athena does not understand a literal of the form date '2020-01-31 51:59:25', but R can cast it to a date even with 'as.Date(x, tryFormats = "%Y-%m-%d")' used for testing in sql_escape_string.AthenaConnection.

This can be fixed with;

# Athena specifc S3 method for converting date variables and iso formateed date strings to date literals
#' @rdname sql_translate_env
#' @export
sql_escape_string.AthenaConnection <- function(con, x) {
  all_dates <- all(try(as.Date(x, tryFormats = "%Y-%m-%d"), silent=T) == x) & all(nchar(x) == 10)
  if(all_dates & !is.na(all_dates)) {
    paste0('DATE ', DBI::dbQuoteString(con, x))
  } else {
    DBI::dbQuoteString(con, x)
  }
}

However, it wouldn't be a bad idea to have similar handling of date times and format those to, e.g., "datetime'2020-01-31 51:59:25'".

I will look into it, but it would be good to at least have the fix above included before pushing to cran.

OssiLehtinen avatar Jan 31 '20 13:01 OssiLehtinen

Are you sure datetime'2020-01-31 51:59:25' works from what I have read it needs to be in TIMESTAMP '2001-08-22 03:04:05.321' data types

DyfanJones avatar Jan 31 '20 14:01 DyfanJones

Ah, sorry, you are right: it should be 'timestamp' and not 'datetime'. The fractional seconds are optional, however, but can have at maximum three digits.

Timestamps of the format '2020-01-31T11:59:25Z' do not work, however.

OssiLehtinen avatar Jan 31 '20 14:01 OssiLehtinen

This is true, however I think this bug fix should be sufficient enough for what it is try to achieve.

DyfanJones avatar Jan 31 '20 14:01 DyfanJones

Yes!

If one wants similar handling of timestamps, there is an additional hurdle:

dbplyr formats timestamp variables in the above mentioned format (e.g., '2020-01-31T11:59:25Z', https://github.com/tidyverse/dbplyr/blob/38cdafd74303baaee02e545831d242de6c656f4c/R/escape.R#L76) so one would need to either be able to write an athena specific version of escape.POSIXt (that is, dbplyr should export the generic or a sql_escape_POSIXt etc) or alternatively one could just drop the T's and Z's and what not before passing the string on, but then one will surely get in trouble with timezones.

So the dates seem to be the low hanging fruit and timestamps spell trouble (as usual)...

days-since-last-timezone-issue-1-days-since-last-timezone-66030538

OssiLehtinen avatar Jan 31 '20 14:01 OssiLehtinen

PR #72 now includes this bug fix. Will leave issue open

DyfanJones avatar Feb 03 '20 16:02 DyfanJones

FYI: the development version of dbplyr has now generics for sql_escape_date and sql_escape_datetime, which allow, for example, Athena-specific handling of date-variables as inputs. Also simple AthenaConnection-methods exist there now.

https://github.com/tidyverse/dbplyr/pull/391

If I am not missing something, this should not affect the functionality of RAthena of noctua. For the case of real date variables (as opposed to date strings such as "2020-01-01"), dbplyr shoud add the 'date' keyword out of the box and the "is this string actually a date" -deduction will not be needed nor done for those cases.

For the date strings, the implemented date test is still required. That is something Hadley didn't want to implement in dbplyr as the default behaviour. However, in my opinion, that feature is super useful.

Do you see some issues with these changes?

OssiLehtinen avatar Apr 21 '20 07:04 OssiLehtinen

Ah perfect. it looks like you have done alot of work with dbplyr to make it simpler. If I have understood this correctly the sql_escape_string can be removed or atleast only implemented for early versions of dbplyr? If this is the case I will wrap a if statement around if for the dbplyr version :)

DyfanJones avatar Apr 21 '20 07:04 DyfanJones

@OssiLehtinen just had a deeper look at the new changes in dbplyr and it looks like no change is require for RAthena or noctua as you said earlier (sorry just waking up and that :D ). All I will do is align the syntax and use lower case instead of upper :D

DyfanJones avatar Apr 21 '20 08:04 DyfanJones