tidyquery
tidyquery copied to clipboard
Work around problems involving typed NAs
In SQL, NULL
s are not typed. But in R, NA
s are typed. By default, NA
is logical, and there are other variants NA_real_
, NA_character_
, and NA_integer_
. This can cause problems when SQL queries contain CASE
expressions or calls to the coalesce()
function, because the dplyr functions that these are translated to (case_when()
and coalesce()
respectively) require that all the possible returned values inside these function calls have the same data type. queryparser has no way of knowing what data types these returned values will have at parse time, so it cannot translate to the appropriate NA
type, so it just uses logical NA
. So in tidyquery, if the other returned values are numeric, character, or integer, but one or more is NA
, then errors like this result:
#Error: must be a double vector, not a logical vector
Consider if there is any way to resolve this by examining the expressions in tidyquery and replacing logical NA
s with NA_real_
, NA_character_
, ore NA_integer_
depending on the types of the other values.
In the meantime, the workaround for users is to cast NULL
s to the expected data types in the THEN
and ELSE
values of CASE
expressions and in the arguments to coalesce()
in the SQL. For example:
CASE WHEN name = 'Bobby' THEN 'Bob' ELSE CAST(NULL AS STRING) END
coalesce(fname, lname, CAST(NULL AS STRING))
In newer versions of dplyr, case_when()
and coalesce()
allow the user to pass NA
in the arguments and automatically convert it to the correct type, making this issue moot.
For details see: https://twitter.com/dvaughan32/status/1542942862077317121 https://twitter.com/ianmcook/status/1543000043384340480
So there is no longer any need to try to work around this in tidyquery. However this bullet in the Current Limitations section of the queryparser readme should be updated:
- When
tidyverse
is set toTRUE
, SQL expressions that useCASE
orcoalesce()
withNULL
s in the arguments can return expressions that throw data type errors when evaluated. This is becauseNULL
translates toNA
, which is by default a logical constant (not a numeric, integer, or character constant). To work around this, castNULL
to the expected data type in the SQL expression.
queryparser readme updated in https://github.com/ianmcook/queryparser/commit/ab74cc99d9df657607d28cc31104cda0618e130e