RSQLite
RSQLite copied to clipboard
CAST(NULL AS type) does not cast missing data NA to the expected type
It seems impossible to cast NA when a new variable containing only NA is created within SQL. More precisely, CAST(expr AS type) in a SELECT query always returns logical NA when the expression contains only NA/NULL values.
As a result, it does not appear possible to prevent the NA type for a new variable to be logical if the subset of data involved in the computation has only missing data.
Example 1
library(RSQLite)
library(DBI)
con <- dbConnect(SQLite(), ":memory:")
# create a data frame with numeric and character NA
df <- data.frame(num=as.numeric(NA),text=as.character(NA))
# load in the data base
dbWriteTable(con, "test", value = df, overwrite = TRUE )
# NA type corresponds to column types when recovering
rs <- dbGetQuery(con, "SELECT * FROM test") )
str(rs) 'data.frame': 1 obs. of 2 variables: $ num : num NA $ text: chr NA
So far, it behaves as expected. See following comment
- The first non-NULL value decides the type of the column.
- If all values are NULL, the result is a vector of NAs of the declared data type for the column, or logical.
# compute a new variable from NA value
rs <- str( dbGetQuery(con, "SELECT num, num+1 AS new FROM test") )
str(rs) 'data.frame': 1 obs. of 2 variables: $ num : num NA $ new: logi NA
When creating a new variable, the data type of the new variable is logical, which might be ok in this case (but this makes less sense in the second example)
rs <- dbGetQuery(con, "SELECT CAST(num+1 AS INTEGER) AS new FROM test")
str(rs) 'data.frame': 1 obs. of 1 variable: $ new: logi NA
CAST(NULL as type) does not work in this context. The NA is still logical:
# Casting a new column of NA/NULL does not work
rs<- dbGetQuery(con, "SELECT CAST(NULL AS INTEGER) AS new_num, CAST(NULL AS TEXT) new_text FROM test")
str(rs) 'data.frame': 1 obs. of 2 variables: $ new_num : logi NA $ new_text: logi NA
Example 2
The above example is contrived but the issue of the return type for NA is particularly annoying when there is a NA is some records but not other:
df2 <- data.frame(id = c(1,2), num = c(1, NA), text = c("a", NA) )
dbWriteTable(con, "test2", value = df2, overwrite = TRUE )
# compute some extra variables
rs <- dbGetQuery(con, "SELECT *, num+1 AS new_num, text||'b' AS new_text FROM test2")
str(rs) 'data.frame': 2 obs. of 5 variables: $ id : num 1 2 $ num : num 1 NA $ text : chr "a" NA $ new_num : num 2 NA $ new_text: chr "ab" NA
Note that the extra variables have the expected types. However, the type is logical when the same operation is executed only on the second record:
# compute some extra variables when there missing data
rs <- dbGetQuery(con, "SELECT *, num+1 AS new_num, text||'b' AS new_text FROM test2 WHERE id=2")
str(rs) 'data.frame': 1 obs. of 5 variables: $ id : num 2 $ num : num NA $ text : chr NA $ new_num : logi NA $ new_text: logi NA
Note that this is different from R behaviour:
tmp <- df2[df2$id==2,]
tmp$new_num <- tmp$num+1
str(tmp) 'data.frame': 1 obs. of 4 variables: $ id : num 2 $ num : num NA $ text : chr NA $ new_num: num NA
I understand that there is a default toward logical when a new variable contains only NA but I think that it should be possible to obtain a column of the desired type even if it contains only missing data in one way or another.
The issue was closed by mistake.
Thanks. Simpler reprex:
library(RSQLite)
library(DBI)
con <- dbConnect(SQLite(), ":memory:")
df2 <- data.frame(id = c(1, 2), num = c(1, NA))
dbWriteTable(con, "test2", value = df2, overwrite = TRUE)
rs <- dbGetQuery(con, "SELECT *, num+1 AS new_num FROM test2 WHERE id=2")
str(rs)
#> 'data.frame': 1 obs. of 3 variables:
#> $ id : num 2
#> $ num : num NA
#> $ new_num: logi NA
Created on 2022-03-23 by the reprex package (v2.0.1)
Looks like the issue here is that sqlite doesn't advertise to us the type of the num + 1 expression. Data-wise all NULL values look identical (they come back as SQLITE_NULL). There seems little we can do here.
If you're free to choose the database engine, consider DuckDB as a type-stable replacement.