duckdb-r
duckdb-r copied to clipboard
add `temporary` parameter to `duckdb_read_csv()`
It is currently not possible to write a temporary table using duckdb_read_csv()
directly.
To fix this, can we expose the temporary
parameter in duckdb_read_csv()
:
https://github.com/duckdb/duckdb-r/blob/b13d4050e3096c38a699719a2b5e89fc65106765/R/csv.R#L32-L33
and pass it to the call to dbWriteTable()
?
https://github.com/duckdb/duckdb-r/blob/b13d4050e3096c38a699719a2b5e89fc65106765/R/csv.R#L70
For tables that fit in memory, we can use:
mytable <- read.csv2("mytable.csv")
dbWriteTable(con, "mytable", mytable, temporary = TRUE)
But we are stuck for tables that don't fit in memory.
Thanks.
- The
dbWriteTable()
you mentioned really should be adbCreateTable()
, this had confused me a bit - Can you confirm that running
dbExecute("COPY ... FROM ...")
does what you expect it to for temporary tables? What happens with name clashes (same table exists as permanent and as temporary table)? - For now, you could work around:
- Import the CSV data into a temporary database that you connect to your main database with
ATTACH
- Use something like
CREATE TEMPORARY TABLE ... AS read_csv_auto('...')
(not tested)
- Import the CSV data into a temporary database that you connect to your main database with
Prepare files and db for testing
library(DBI)
write.csv(warpbreaks, "warpbreaks.csv", row.names = FALSE)
write.csv(cars, "cars.csv", row.names = FALSE)
con <- dbConnect(duckdb::duckdb(), dbdir = "test.duckdb")
duckdb::duckdb_read_csv(con, "warpbreaks1", "warpbreaks.csv")
dbListTables(con)
# [1] "warpbreaks1"
dbDisconnect(con, shutdown = TRUE)
- Can you confirm that running
dbExecute("COPY ... FROM ...")
does what you expect it to for temporary tables? (...)
Yes, it does:
con <- dbConnect(duckdb::duckdb(), dbdir = "test.duckdb", read_only = TRUE)
dbListTables(con)
# [1] "warpbreaks1"
dbExecute(con, "CREATE TEMPORARY TABLE warpbreaks2 (breaks NUMERIC, wool VARCHAR, tension VARCHAR);")
# [1] 0
dbExecute(con, "COPY warpbreaks2 FROM 'warpbreaks.csv' (HEADER);")
# [1] 54
dbListTables(con)
# [1] "warpbreaks1" "warpbreaks2"
dbDisconnect(con, shutdown = TRUE)
# check that the temporary table is gone
con <- dbConnect(duckdb::duckdb(), dbdir = "test.duckdb")
dbListTables(con)
# [1] "warpbreaks1"
dbDisconnect(con, shutdown = TRUE)
- (...) What happens with name clashes (same table exists as permanent and as temporary table)?
Not sure what is expected here. Here is a test:
con <- dbConnect(duckdb::duckdb(), dbdir = "test.duckdb", read_only = TRUE)
dbListTables(con)
# [1] "warpbreaks1"
dbExecute(con, "CREATE TEMPORARY TABLE warpbreaks1 (speed NUMERIC, dist NUMERIC);")
# [1] 0
dbExecute(con, "COPY warpbreaks1 FROM 'cars.csv' (HEADER);")
# [1] 50
dbListTables(con)
# [1] "warpbreaks1" "warpbreaks1"
dbReadTable(con, "warpbreaks1") |> head()
# speed dist
# 1 4 2
# 2 4 10
# 3 7 4
# 4 7 22
# 5 8 16
# 6 9 10
dbDisconnect(con, shutdown = TRUE)
For now, you could work around:
- Import the CSV data into a temporary database that you connect to your main database with
ATTACH
- Use something like
CREATE TEMPORARY TABLE ... AS read_csv_auto('...')
(not tested)
It works, thanks! (documentation here)
con <- dbConnect(duckdb::duckdb(), dbdir = "test.duckdb", read_only = TRUE)
dbListTables(con)
# [1] "warpbreaks1"
dbExecute(con, "CREATE TEMPORARY TABLE warpbreaks3 AS FROM 'warpbreaks.csv';")
# [1] 54
dbListTables(con)
# [1] "warpbreaks1" "warpbreaks3"
dbDisconnect(con, shutdown = TRUE)
Temporary tables take precedence indeed if they exist:
library(duckdb)
#> Loading required package: DBI
con <- dbConnect(duckdb())
writeLines("a\n1", "x.csv")
dbExecute(con, "CREATE TEMPORARY TABLE x (a INTEGER)")
#> [1] 0
dbExecute(con, "CREATE TABLE x (y VARCHAR)")
#> [1] 0
dbExecute(con, "COPY x FROM 'x.csv' (HEADER)")
#> [1] 1
dbReadTable(con, "x")
#> a
#> 1 1
dbRemoveTable(con, "x", temporary = TRUE)
dbReadTable(con, "x")
#> [1] y
#> <0 rows> (or 0-length row.names)
Created on 2024-08-17 with reprex v2.1.0
We want to do the following here:
- In
duckdb_read_csv()
, changedbWriteTable()
todbCreateTable()
- Add an ellipsis after
files
to make sure arguments are named - Add a
temporary = FALSE
argument and modify the SQL as needed
My preferred workflow is to use dbplyr with tbl_function()
, but I'm open to reviewing a PR.