RSQLite
RSQLite copied to clipboard
CSV extension ignores schema's data type.
Thanks for adding the recent extensions that support CSV imports (eg, #407, #389).
I believe the data types passed to the schema argument aren't being respected. This is from the recent documentation (and I've added str(ds) to display the data types
library(DBI)
db <- dbConnect(RSQLite::SQLite())
RSQLite::initExtension(db, "csv")
# use the filename argument to mount CSV files from disk
sql <- paste0(
"CREATE VIRTUAL TABLE tbl USING ",
"csv(data='1,2', schema='CREATE TABLE x(a INT, b INT)')"
)
dbExecute(db, sql)
#> [1] 0
ds <- dbGetQuery(db, "SELECT * FROM tbl")
str(ds)
#> 'data.frame': 1 obs. of 2 variables:
#> $ a: chr "1"
#> $ b: chr "2"
Version:
RSQLite::rsqliteVersion()
#> header library
#> "3.38.0" "3.38.0"
Created on 2022-03-23 by the reprex package (v2.0.1)
Thanks. This appears to be a limitation of the csv extension:
https://github.com/r-dbi/RSQLite/blob/92330308af42feda5fc5a6a81f126def83b45efa/src/vendor/extensions/csv.c#L779
I suspect that type-converted output needs to call other functions: https://www.sqlite.org/c3ref/result_blob.html
I suspect that type-converted output needs to call other functions
I'm not understanding that content --are you saying something needs to be added to csv.c??
If there's not a reasonable approach within RSQLite, does it make sense to document that limitation and present a supplemental approach? Something like:
library(DBI)
db <- dbConnect(RSQLite::SQLite())
RSQLite::initExtension(db, "csv")
# use the filename argument to mount CSV files from disk
sql <- paste0(
"CREATE VIRTUAL TABLE tbl USING ",
"csv(data='1,b,3', schema='CREATE TABLE x(a TEXT, b TEXT, c TEXT)')"
)
dbExecute(db, sql)
#> [1] 0
ds <- dbGetQuery(db, "SELECT * FROM tbl")
str(ds)
#> 'data.frame': 1 obs. of 3 variables:
#> $ a: chr "1"
#> $ b: chr "b"
#> $ c: chr "3"
# The resulting data types are always characters,
# which essentially ignores the defined schema.
# One approach is to declare 'text' columns in the virtual table and use a
# function like `readr::type_convert()` to cast to the desired type.
# \dontrun{
col_types <-
readr::cols(
`a` = readr::col_integer(),
`b` = readr::col_character(),
`c` = readr::col_guess()
)
ds <- readr::type_convert(ds, col_types = col_types)
str(ds)
#> 'data.frame': 1 obs. of 3 variables:
#> $ a: int 1
#> $ b: chr "b"
#> $ c: num 3
# }
I don't think I can help much with the C interface, but please tell me if I can help with documentation or examples.
Created on 2022-03-25 by the reprex package (v2.0.1)
Thanks. Documenting the limitation will be helpful, until an update to csv.c becomes available.