odbc icon indicating copy to clipboard operation
odbc copied to clipboard

Integers not supported in Snowflake

Open patrickhowerter opened this issue 2 years ago • 2 comments

Issue Description and Expected Result

R assumes that all integers are SQL_DECIMAL(38,0). Can integers in Snowflake be translated as integers in R and Big Integers in Snowflake translated as int64 in R?

Database

Snowflake

Reproducible Example

require(odbc)
#> Loading required package: odbc
#> Warning: package 'odbc' was built under R version 3.6.3
  
connString <- "driver=SnowflakeDSIIDriver;database=;
server=;
uid=;pwd=;
schema=;role=\"\";
warehouse="
connection = dbConnect(odbc(), .connection_string = connString)  
name <- 'MDO_CUSTOM_DATA.TEMPDB.TESTINT'
splitName <- strsplit(name, ".", fixed=TRUE)[[1]]
id <- DBI::Id(database = splitName[[1]], schema = splitName[[2]], table = splitName[[3]])
value <- data.frame(intval = 1L, numericval = 1.0)
dbWriteTable(connection, name=id, value=value, temporary=TRUE, overwrite=TRUE)  
dbGetQuery(conn=connection, sprintf("show columns in table %s", name))

#>   table_name schema_name column_name
#> 1    TESTINT      TEMPDB      intval
#> 2    TESTINT      TEMPDB  numericval
#>                                                   data_type null? default
#> 1 {"type":"FIXED","precision":38,"scale":0,"nullable":true}  true        
#> 2                           {"type":"REAL","nullable":true}  true        
#>     kind expression comment   database_name autoincrement
#> 1 COLUMN                    MDO_CUSTOM_DATA              
#> 2 COLUMN                    MDO_CUSTOM_DATA

devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.2 (2019-12-12)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       America/Chicago             
#>  date     2021-10-05                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.1)
#>  backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.1)
#>  callr         3.7.0   2021-04-20 [1] CRAN (R 3.6.3)
#>  cli           3.0.1   2021-07-17 [1] CRAN (R 3.6.2)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.1)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.1)
#>  devtools      2.3.2   2020-09-18 [1] CRAN (R 3.6.3)
#>  digest        0.6.23  2019-11-23 [1] CRAN (R 3.6.2)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 3.6.3)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.1)
#>  fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.1)
#>  glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.1)
#>  highr         0.8     2019-03-20 [1] CRAN (R 3.6.1)
#>  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.2)
#>  knitr         1.27    2020-01-16 [1] CRAN (R 3.6.2)
#>  lifecycle     0.1.0   2019-08-01 [1] CRAN (R 3.6.2)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.1)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.1)
#>  pkgbuild      1.2.0   2020-12-15 [1] CRAN (R 3.6.3)
#>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 3.6.3)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 3.6.2)
#>  processx      3.5.2   2021-04-30 [1] CRAN (R 3.6.3)
#>  ps            1.5.0   2020-12-05 [1] CRAN (R 3.6.3)
#>  purrr         0.3.3   2019-10-18 [1] CRAN (R 3.6.2)
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.2)
#>  Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.2)
#>  remotes       2.2.0   2020-07-21 [1] CRAN (R 3.6.3)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 3.6.2)
#>  rlang         0.4.9   2020-11-26 [1] CRAN (R 3.6.3)
#>  rmarkdown     2.1     2020-01-20 [1] CRAN (R 3.6.2)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.1)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 3.6.3)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.1)
#>  stringi       1.4.4   2020-01-09 [1] CRAN (R 3.6.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.1)
#>  testthat      3.0.1   2020-12-17 [1] CRAN (R 3.6.3)
#>  usethis       2.0.0   2020-12-10 [1] CRAN (R 3.6.3)
#>  withr         2.3.0   2020-09-22 [1] CRAN (R 3.6.3)
#>  xfun          0.12    2020-01-13 [1] CRAN (R 3.6.2)
#>  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.6.2)
#> 
#> [1] C:/Users/patri/OneDrive/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.2/library

patrickhowerter avatar Oct 05 '21 14:10 patrickhowerter

Looks like no one is working on anything tagged as snowflake. I am happy to help here.

Is this just some type of data type mapping exercise between R and Snowflake? If that's the case, I can provide the data type mappings, just let me know where I should add the code.

patrickhowerter avatar Nov 17 '21 15:11 patrickhowerter

Okay, I tried to make a fix with no success. The issue is not in the dbWriteTable method. In Snowflake, Integers and BigInt are stored as DECIMAL(38,0) in the database. The type DECIMAL(38,0) is assumed an Integer as you can see from the example below. Is there a way to automatically convert DECIMAL(38,0) to Integer in the dbGetQuery method?

This is an example of Snowflake assuming that DECIMAL(38,0) is Integer type.

CREATE OR REPLACE TEMPORARY TABLE "PUBLIC"."TESTINT" (
   INTVAL DECIMAL(38,0),
   INTVAL2 INTEGER
);
INSERT INTO  "PUBLIC"."TESTINT" VALUES(1, 1);
select is_integer(INTVAL), is_integer(INTVAL2) FROM "PUBLIC"."TESTINT"

image

patrickhowerter avatar Nov 30 '21 16:11 patrickhowerter

I'm able to reproduce your results, @patrickhowerter!

This does indeed seem to be an artifact of how Snowflake stores data:

Type Notes
NUMBER Default precision and scale are (38,0).
DECIMAL, NUMERIC Synonymous with NUMBER.
INT, INTEGER, BIGINT, SMALLINT, TINYINT, BYTEINT Synonymous with NUMBER except precision and scale cannot be specified.

odbc is able to write such that Snowflake interprets integers properly, but that information is obscured on read:

library(DBI)
library(odbc)

con <- dbConnect(odbc(), dsn = "Snowflake", pwd = Sys.getenv("snowflakePass"))

con
#> <OdbcConnection> Simon@Snowflake
#>   Snowflake Version: 7.44.2

# make a test tbl and write it to snowflake:
tbl <- data.frame(integ = 1L, numer = 1, decim = 1.1)

vapply(tbl, class, character(1))
#>     integ     numer     decim 
#> "integer" "numeric" "numeric"

dbWriteTable(con, "tbl", tbl, overwrite = TRUE)

# see if writing obscured the information needed to discern... it did not:
dbGetQuery(con,
           'select is_integer("integ"), is_integer("numer"), is_integer("decim")
            from "tbl";')
#>   IS_INTEGER("INTEG") IS_INTEGER("NUMER") IS_INTEGER("DECIM")
#> 1                TRUE               FALSE               FALSE

# on read, though:
vapply(dbReadTable(con, "tbl"), class, character(1))
#>     integ     numer     decim 
#> "numeric" "numeric" "numeric"

Created on 2023-12-21 with reprex v2.0.2

odbc defines the dbGetQuery() method under the hood in dbReadTable.DBIConnection, so we could have some sort of:

setMethod("dbGetQuery", c("Snowflake", "character"),
  function(conn,
           statement,
           n = -1,
           params = NULL,
           immediate = is.null(params),
           ...) {
    res <- callNextMethod(conn,
                          statement,
                          n = n,
                          params = params,
                          immediate = immediate,
                          ...)

    # correct col types
    
    res
  }
)

But the logic to figure out which columns to find types for—and how to find those types—for a generic query seems brittle. Perhaps odbc could get in the business of defining dbReadTable() methods, though I'm not sure that's in scope for the package.

simonpcouch avatar Dec 21 '23 16:12 simonpcouch

I think this is probably out of scope for odbc since we're relying on the driver to give us correct types. It's probably worth double checking that https://github.com/r-dbi/odbc/blob/main/src/odbc_result.cpp#L600-L683 is doing the right thing with the column types that we're getting from the snowflake driver, but I suspect this is either a snowflake or driver issue and there's nothing we can do here.

hadley avatar Dec 21 '23 18:12 hadley

Heard. From my (limited) understanding of odbc_result.cpp, I think there's not much we can do here; will go ahead and close.

simonpcouch avatar Dec 22 '23 21:12 simonpcouch