
Excessive RAM consumption on Windows and Linux with dbWriteTable (std::bad_alloc on Windows)

Open aryoda opened this issue 3 years ago • 3 comments

Issue Description and Expected Result

On Windows with odbc version 1.3.2, DBI 1.1.1 and R 4.1.0 (and also with R 3.6.3), as well as on Ubuntu with R 3.6.3, I can observe a huge memory leak in the operating system's resource monitor.

Windows even throws "std::bad_alloc" in some cases with large data sizes.

I would have expected buffered data transfers between the ODBC client and the database ("server") without a noticeable increase in OS RAM usage.

Instead I observe that the bigger the data is, the more OS RAM is allocated.

Screenshot from 2021-11-28 16-05-02

Impact

This problem makes it impossible to insert large data.frames into a DB table at once, since the client runs out of memory (in my case e.g. with a 4 GB data.frame on a Windows machine with 16 GB RAM).

Possible workaround: I am currently investigating chunking the data manually into blocks of rows and doing separate inserts, but if the memory leak persists until the R session is restarted this will not work reliably.
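A minimal sketch of what I mean by manual chunking (the helper name and chunk size are made up for illustration; this is not a verified fix — if leaked memory is only released on session restart, chunking merely delays the problem):

```r
# Hypothetical sketch: insert a data.frame in blocks of rows.
# NOT a verified work-around for the leak described above.
write_table_chunked <- function(con, table_name, data, chunk_size = 100000L) {
  n <- nrow(data)
  for (start in seq(1L, n, by = chunk_size)) {
    end <- min(start + chunk_size - 1L, n)
    odbc::dbWriteTable(con, table_name,
                       data[start:end, , drop = FALSE],
                       append = TRUE)
    gc()  # try to return memory between chunks
  }
}
```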

Databases

  • Microsoft SQL Server 2016 on Windows 10 (64 bit) with 16 GB RAM (tried with two different ODBC drivers: "SQL Server Native Client 11.0" and "ODBC Driver 17 for SQL Server")
  • sqlite3 using a DB file on Ubuntu 20.04 (64 bit) with 16 GB RAM
  • possibly all other databases

Error message and stack trace

On Windows in case of out-of-memory:

Error in result_insert_dataframe(rs@ptr, values, batch_rows) :
  std::bad_alloc
9. stop(structure(list(message = "std::bad_alloc", call = result_insert_dataframe(rs@ptr, values, batch_rows), cppstack = NULL), class = c("std::bad_alloc", "C++Error", "error", "condition")))
8. result_insert_dataframe(rs@ptr, values, batch_rows)
7. tryCatchList(expr, classes, parentenv, handlers)
6. tryCatch(result_insert_dataframe(rs@ptr, values, batch_rows),
         finally = dbClearResult(rs))
5. ... <hidden for privacy reasons>

Reproducible Example

library(DBI)
library(odbc)  # Version 1.3.2 is buggy (starting with version 1.3.0). Version 1.2.3 is the last known good.

# devtools::install_github("r-dbi/odbc", ref = "v1.2.3")  # works without an error

# Uses sqlite3 as DB
#
# To install the required ODBC driver on Linux:
#   apt-get install libsqliteodbc
# To install the ODBC driver on Windows see:
#   http://www.ch-werner.de/sqliteodbc/

# 3 million rows with about 1500 bytes per row = about 4 GB (less in R [80 MB] due to string deduplication in the string pool)
# Note: Use 7 * 1E6 rows on Windows with 16 GB RAM to get std::bad_alloc
data <- data.frame(ID = seq_len(3 * 1E6), name = paste(rep(LETTERS, 60), collapse = ""))

# Note: "~" in odbc conn string "database" property does not work so we create an absolute path here
db_file_FQN <- file.path(path.expand("~"), "bad_alloc_test_sqlite3.db")
table_name  <- "test_table"

# Delete existing DB file for a clean (re)start
if (file.exists(db_file_FQN)) file.remove(db_file_FQN)

con <- dbConnect(odbc::odbc(), driver = "SQLite3", database = db_file_FQN)  # Note: On Windows use "SQLite3 ODBC Driver" as the driver name instead

gc()  # free mem before-hand

# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# Observe: This function call allocates a lot of RAM even though I would expect a fixed RAM buffer size for DB writes...
#          On Windows with 16 GB RAM you most probably get the error std::bad_alloc
#          (if not increase the number of data rows according to your RAM size)
#          The RAM is freed only if the R session is restarted!
odbc::dbWriteTable(con, table_name, data, append = TRUE)

gc() # free mem after query execution shows a lot of still allocated RAM on the machine but not within R

odbc::dbGetQuery(con, paste("select count(*) from", table_name))

odbc::dbDisconnect(con)

gc() # no effect on OS RAM consumption

# Restart the R session to observe that this releases the allocated RAM again from the OS.

Session Info

Ubuntu 20.04:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] odbc_1.3.2 DBI_1.1.0 

loaded via a namespace (and not attached):
 [1] bit_1.1-15.2      compiler_3.6.3    hms_0.5.3         tools_3.6.3       Rcpp_1.0.5        bit64_0.9-7       vctrs_0.3.6      
 [8] blob_1.2.1        pkgconfig_2.0.3   rlang_0.4.10.9000

Windows 10:

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] odbc_1.3.2 DBI_1.1.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.7         lifecycle_1.0.0    backports_1.2.1    rlang_0.4.11       blob_1.2.1
 [7] checkmate_2.0.0    vctrs_0.3.8        ellipsis_0.3.2     tools_4.1.0        bit64_4.0.5
[13] bit_4.0.4          hms_1.1.0          compiler_4.1.0     pkgconfig_2.0.3

Last known good

Last known working version was odbc 1.2.3, version 1.3.0 introduced the problem for the first time. Using DBI 1.1.0 vs. 1.1.1 did not solve the problem.

See this flat OS memory history using the same example code from above with odbc 1.2.3 on Ubuntu 20.04:

Screenshot from 2021-11-28 16-54-27

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] odbc_1.2.3 DBI_1.1.0 

loaded via a namespace (and not attached):
 [1] bit_1.1-15.2      compiler_3.6.3    hms_0.5.3         tools_3.6.3       Rcpp_1.0.5        bit64_0.9-7       vctrs_0.3.6      
 [8] blob_1.2.1        pkgconfig_2.0.3   rlang_0.4.10.9000

Possibly related issues

#464 - current assessment: not related!

Possible next steps

  • Perform valgrind memory debugging (requires recompilation of odbc with -g and -O0 or -O1 option), eg.:

    R --debugger=valgrind --debugger-args=--leak-check=yes -f bug.R

    See: https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Using-valgrind

aryoda avatar Nov 28 '21 16:11 aryoda

I have performed a first valgrind check on Ubuntu (using much less data [1E6 rows only] to speed up the valgrind processing, so don't be surprised that the memory leaks are smaller here):

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

See the results in the attached file:

R -d "valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --track-origins=yes -s" --vanilla -f bug_odbc_bad_alloc.R > R_output.txt 2> valgrind.txt

valgrind_odbc_1_3_2.txt

Edit: I have also added the valgrind output when using odbc 1.2.3 so a diff may help here:

valgrind_odbc_1_2_3.txt

aryoda avatar Nov 28 '21 17:11 aryoda

Hi - just looking at the NEWS file: one of the things that changed going from 1.2.3 -> 1.3 specific to insertion was the default value for the batch_rows argument. Can you check whether, with 1.3.x and batch_rows equal to 1024 (and preferably R 3.6.x), you see similar behavior?

FWIW, I think there is a small leak that is specific to SQL Server (and then specific to using Microsoft's driver rather than FreeTDS); I believe it would manifest itself only during SELECTs (haven't had time to backport the patch from nanodbc). Quickly scanning your valgrind output, I don't think it's related to the issue you are describing.

detule avatar Nov 28 '21 18:11 detule

one of the things that changed going from 1.2.3 -> 1.3 specific to insertion was the default value for the batch_rows argument. Can you see if, with 1.3.x, and batch_rows equal to 1024 (and preferably R 3.6.x) you see similar behavior?

@detule Strike, excellent assessment :-)

NEWS for 1.3.0 says:

dbWriteTable() and dbBind() now default to a batch_rows of NA, which sets the batch size to be the length of the input. This avoids problems with drivers that don't support batch sizes larger than the input size. To restore the behavior prior to this release pass batch_rows = 1024 or set options(odbc.batch_rows = 1024) (#391).

If I add

options(odbc.batch_rows = 1024)

after the library statements in my example code above, dbWriteTable works without flooding the RAM (RAM consumption is flat again):

Screenshot from 2021-11-29 00-14-08

I have also tested it with options(odbc.batch_rows = 2048) successfully, so the change documented in the NEWS file must be the reason.

I have tested this on Ubuntu 20.04 with R 3.6.3 and sqlite3 only (give me two more days to check this work-around on my company's server infrastructure with SQL Server 2016 too)...
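For completeness: according to the NEWS entry quoted above, the batch size can also be passed per call via the batch_rows argument of dbWriteTable() instead of setting the global option (a sketch of the alternative, using the connection and data from my example above):

```r
# Equivalent per-call work-around: pass batch_rows explicitly
# instead of setting options(odbc.batch_rows = 1024) globally.
odbc::dbWriteTable(con, table_name, data, append = TRUE, batch_rows = 1024)
```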

Proposal

  1. The recommended work-around for now is to set the odbc.batch_rows option back to 1024 (or any other suitable value) in the client code (not in odbc!) via:

    options(odbc.batch_rows = 1024)

  2. I propose to reset the default value from NA back to 1024 and find a better fix for #391 instead of "just" maximizing the batch row size, since a) the memory impact is severe for larger data.frames, b) using a buffer of the same size as the data by default contradicts the idea of batched (buffered) writes, and c) changing the default value is a (semantic) API-breaking change with impact on existing client code.

Regarding the possibly related issue #464

Quickly scanning your valgrind output, I don't think it's related to the issue you are describing.

I agree with your assessment, since setting the batch size to 1024 fixes my issue, while #464 occurs during SQL SELECTs, which are not affected by this option.

Even though the error message is similar (std::bad_alloc) the reason may be different.

IMHO this still needs to be verified in detail when handling #464.

aryoda avatar Nov 28 '21 23:11 aryoda