RSQLite icon indicating copy to clipboard operation
RSQLite copied to clipboard

CSV extension ignores schema's data type.

Open wibeasley opened this issue 3 years ago • 3 comments

Thanks for adding the recent extensions that support CSV imports (eg, #407, #389).

I believe the data types passed to the schema argument aren't being respected. This is from the recent documentation (and I've added str(ds) to display the data types

library(DBI)
db <- dbConnect(RSQLite::SQLite())
RSQLite::initExtension(db, "csv")
# use the filename argument to mount CSV files from disk
sql <- paste0(
  "CREATE VIRTUAL TABLE tbl USING ",
  "csv(data='1,2', schema='CREATE TABLE x(a INT, b INT)')"
)
dbExecute(db, sql)
#> [1] 0
ds <- dbGetQuery(db, "SELECT * FROM tbl")


str(ds)
#> 'data.frame':    1 obs. of  2 variables:
#>  $ a: chr "1"
#>  $ b: chr "2"

Version:

RSQLite::rsqliteVersion()
#>   header  library 
#> "3.38.0" "3.38.0"

Created on 2022-03-23 by the reprex package (v2.0.1)

wibeasley avatar Mar 23 '22 23:03 wibeasley

Thanks. This appears to be a limitation of the csv extension:

https://github.com/r-dbi/RSQLite/blob/92330308af42feda5fc5a6a81f126def83b45efa/src/vendor/extensions/csv.c#L779

I suspect that type-converted output needs to call other functions: https://www.sqlite.org/c3ref/result_blob.html

krlmlr avatar Mar 25 '22 07:03 krlmlr

I suspect that type-converted output needs to call other functions

I'm not understanding that content --are you saying something needs to be added to csv.c??

If there's not a reasonable approach within RSQLite, does it make sense to document that limitation and present a supplemental approach? Something like:

library(DBI)
db <- dbConnect(RSQLite::SQLite())
RSQLite::initExtension(db, "csv")
# use the filename argument to mount CSV files from disk
sql <- paste0(
  "CREATE VIRTUAL TABLE tbl USING ",
  "csv(data='1,b,3', schema='CREATE TABLE x(a TEXT, b TEXT, c TEXT)')"
)
dbExecute(db, sql)
#> [1] 0

ds <- dbGetQuery(db, "SELECT * FROM tbl")
str(ds)
#> 'data.frame':    1 obs. of  3 variables:
#>  $ a: chr "1"
#>  $ b: chr "b"
#>  $ c: chr "3"

# The resulting data types are always characters,
#   which essentially ignores the defined schema.
#   One approach is to declare 'text' columns in the virtual table and use a
#   function like `readr::type_convert()` to cast to the desired type.
# \dontrun{
col_types <- 
  readr::cols(
    `a` = readr::col_integer(),
    `b` = readr::col_character(),
    `c` = readr::col_guess()
  )
ds <- readr::type_convert(ds, col_types = col_types)
str(ds)
#> 'data.frame':    1 obs. of  3 variables:
#>  $ a: int 1
#>  $ b: chr "b"
#>  $ c: num 3
# }

I don't think I can help much with the C interface, but please tell me if I can help with documentation or examples.

Created on 2022-03-25 by the reprex package (v2.0.1)

wibeasley avatar Mar 25 '22 15:03 wibeasley

Thanks. Documenting the limitation will be helpful, until an update to csv.c becomes available.

krlmlr avatar Mar 27 '22 09:03 krlmlr