tidyquery icon indicating copy to clipboard operation
tidyquery copied to clipboard

Work around problems involving typed NAs

Open ianmcook opened this issue 4 years ago • 1 comments

In SQL, NULLs are not typed. But in R, NAs are typed. By default, NA is logical, and there are other variants NA_real_, NA_character_, and NA_integer_. This can cause problems when SQL queries contain CASE expressions or calls to the coalesce() function, because the dplyr functions that these are translated to (case_when() and coalesce() respectively) require that all the possible returned values inside these function calls have the same data type. queryparser has no way of knowing what data types these returned values will have at parse time, so it cannot translate to the appropriate NA type, so it just uses logical NA. So in tidyquery, if the other returned values are numeric, character, or integer, but one or more is NA, then errors like this result:

#Error: must be a double vector, not a logical vector

Consider if there is any way to resolve this by examining the expressions in tidyquery and replacing logical NAs with NA_real_, NA_character_, ore NA_integer_ depending on the types of the other values.

In the meantime, the workaround for users is to cast NULLs to the expected data types in the THEN and ELSE values of CASE expressions and in the arguments to coalesce() in the SQL. For example:

CASE WHEN name = 'Bobby' THEN 'Bob' ELSE CAST(NULL AS STRING) END

coalesce(fname, lname, CAST(NULL AS STRING))

ianmcook avatar Sep 11 '19 01:09 ianmcook

In newer versions of dplyr, case_when() and coalesce() allow the user to pass NA in the arguments and automatically convert it to the correct type, making this issue moot.

For details see: https://twitter.com/dvaughan32/status/1542942862077317121 https://twitter.com/ianmcook/status/1543000043384340480

So there is no longer any need to try to work around this in tidyquery. However this bullet in the Current Limitations section of the queryparser readme should be updated:

  • When tidyverse is set to TRUE, SQL expressions that use CASE or coalesce() with NULLs in the arguments can return expressions that throw data type errors when evaluated. This is because NULL translates to NA, which is by default a logical constant (not a numeric, integer, or character constant). To work around this, cast NULL to the expected data type in the SQL expression.

ianmcook avatar Jul 02 '22 22:07 ianmcook

queryparser readme updated in https://github.com/ianmcook/queryparser/commit/ab74cc99d9df657607d28cc31104cda0618e130e

ianmcook avatar Jan 09 '23 21:01 ianmcook