RPostgres
RPostgres copied to clipboard
dbQuoteIdentifier: allow columns
Closes #263.
@dpprdan: Checks are failing now, could you please take a look?
👍 I'll look into it over the holidays.
The checks fail here
With the current CRAN release and an empty database, this looks like this:
library(RPostgres)
con <- postgresDefault()
objects <- dbListObjects(con)
(sql <- lapply(objects$table, dbQuoteIdentifier, conn = con))
#> [[1]]
#> <SQL> "information_schema".
#>
#> [[2]]
#> <SQL> "pg_catalog".
(unquoted <- vapply(sql, dbUnquoteIdentifier, conn = con, list(1)))
#> [[1]]
#> <Id> schema = information_schema
#>
#> [[2]]
#> <Id> schema = pg_catalog
testthat::expect_equal(unquoted, unclass(objects$table))
Note however the period (“.”) behind the quoted schemata.
sql[[1]]
#> <SQL> "information_schema".
This isn’t a valid schema identifier in Postgres.
(qid <- dbQuoteIdentifier(con, Id(schema = "myschema")))
#> <SQL> "myschema".
create_schema_sql <- paste0("CREATE SCHEMA IF NOT EXISTS ", qid)
dbExecute(con, create_schema_sql)
#> Error: Failed to prepare query: ERROR: syntax error at or near "."
#> LINE 1: CREATE SCHEMA IF NOT EXISTS "myschema".
#> ^
So it seems to me that the test is broken. I don’t think that it can be fixed for schemata either, because the quote-unquote roundtrip cannot be made to work for just schemata. The heuristics employed in dbUnquoteIdentifier()
to identify the database objects from a SQL()
string can only work for one object class and we chose that to be tables.
The test can probably be fixed for tables if we write a table to the db and then test whether the quote-unquote roundtrip works for that (i.e. only for the objects[!objects$is_prefix]
subset). But maybe this test exists already elsewhere?
dbDisconnect(con)
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.1.2 (2021-11-01)
#> os Windows 10 x64 (build 19043)
#> system x86_64, mingw32
#> ui RTerm
#> language en
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2021-12-28
#> pandoc 2.16.2 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#> package * version date (UTC) lib source
#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.1.2)
#> bit 4.0.4 2020-08-04 [1] CRAN (R 4.1.0)
#> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.1.0)
#> blob 1.2.2 2021-07-23 [1] CRAN (R 4.1.0)
#> cli 3.1.0 2021-10-27 [1] CRAN (R 4.1.1)
#> crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.1)
#> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.1.2)
#> desc 1.4.0 2021-09-28 [1] CRAN (R 4.1.1)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
#> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.2)
#> generics 0.1.1 2021-10-25 [1] CRAN (R 4.1.1)
#> glue 1.6.0 2021-12-17 [1] CRAN (R 4.1.2)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
#> hms 1.1.1 2021-09-26 [1] CRAN (R 4.1.1)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.1)
#> knitr 1.37 2021-12-16 [1] CRAN (R 4.1.2)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.1)
#> lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.1.1)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
#> pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
#> pkgload 1.2.4 2021-11-30 [1] CRAN (R 4.1.2)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
#> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.1.0)
#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.0)
#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.0)
#> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.1)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.1)
#> Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.0)
#> rlang 0.4.12 2021-10-18 [1] CRAN (R 4.1.1)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.1)
#> RPostgres * 1.4.3 2021-12-20 [1] CRAN (R 4.1.2)
#> rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
#> styler 1.6.2 2021-09-23 [1] CRAN (R 4.1.1)
#> testthat 3.1.1 2021-12-03 [1] CRAN (R 4.1.2)
#> tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.2)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
#> withr 2.4.3 2021-11-30 [1] CRAN (R 4.1.2)
#> xfun 0.29 2021-12-14 [1] CRAN (R 4.1.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
#>
#> [1] C:/Users/Daniel.AK-HAMBURG/Documents/R/win-library/4.1
#> [2] C:/Program Files/R/R-4.1.2/library
#>
#> ------------------------------------------------------------------------------
Thanks. I need to take a look at the workflows.
Can you please bring in the latest changes from main to fix the workflows?
The test can probably be fixed for tables if we write a table to the db and then test whether the quote-unquote roundtrip works for that (i.e. only for the
objects[!objects$is_prefix]
subset). But maybe this test exists already elsewhere?
😬 Ha, I guess I should've just read the rest of the file. 🤦🏻♂️
BTW From reading the code, I'd say that RMariaDB has the same problem(s) (both the "cannot quote columns" and the "terminal dot" issue).
Current status: trying to get RMariaDB::mariadbDefault()
to run with a MariaDB Docker container on Windows.
Sure enough I found a case where we need to unquote a schema only, namely in dbListObjects()
with a prefix
.
So this PR unfortunately breaks dbListObjects()
with a SQL()
-type prefix
at the moment.
library(RPostgres)
con <- postgresDefault()
schema_qid <- dbQuoteIdentifier(con, Id(schema = 'Robert'))
The following doesn’t work with v1.4.3, due to the terminal dot
dbExecute(con, paste0('CREATE SCHEMA IF NOT EXISTS ', schema_qid))
#> [1] 0
We need to add a table for dbQuoteIdentifier()
to find the schema, see #388
table_qid <- dbQuoteIdentifier(con, Id(schema = 'Robert', table = "mtcars"))
dbWriteTable(con, table_qid, mtcars)
The following works in v1.4.3 (but relies on the terminal dot to unquote the prefix
as schema, not table)
dbListObjects(con, schema_qid)
#> Error in mapply(FUN = f, ..., SIMPLIFY = FALSE): zero-length inputs cannot be mixed with those of non-zero length
It works with an Id()
object (which just passes through dbUnquoteIdentifier()
).
dbListObjects(con, Id(schema = 'Robert'))
#> table is_prefix
#> 1 <Id> schema = Robert, table = mtcars FALSE
# clean-up
dbExecute(con, paste0('DROP TABLE ', table_qid))
#> [1] 0
dbExecute(con, paste0('DROP SCHEMA "Robert"'))
#> [1] 0
dbDisconnect(con)
I don’t know how to fix this ATM.
Thanks. It looks to me like dbListObjects()
should always be called with an Id()
, never with a SQL()
. The behavior is misspecified. Should we adapt the list_objects_features
test?
It looks to me like
dbListObjects()
should always be called with anId()
, never with aSQL()
.
That would be an easy fix, indeed, and way cleaner than all the hacks I came up with until now. I hope we're not missing something.
Can anything else than a schema be passed to prefix
?
https://github.com/r-dbi/RPostgres/blob/5eced0cce8e232222e1f6274fade002e939a174d/R/dbListObjects_PqConnection_ANY.R#L22-L26
Because this looks like one could also specify (schema-qualified) table(s) ~~(and then only the schema is used), but I don't understand the use case that prompted this design.~~ Update: I misread the code. This is to only select schemas, not (possibly schema-qualified) tables. So I guess the answer is that schemata and tables can be passed to prefix, but only schemas are used.
It also looks like it is supposed to be vectorized (v*apply()
), but DBI::dbQuoteIdentifier()
isn't AFAIU (x must be SQL or Id
), so dbListObjects()
isn't, at least not with a prefix
.
It looks like
dbQuoteIdentifier()
with incompleteId()
(e.g., only "catalog" and "schema") used to leave a trailing dot in the identifier string, but no longer does.
It still does in RPostgres:
https://github.com/r-dbi/RPostgres/blob/fa3466fafa777a4b6da990595d396a056ba6c45d/R/dbQuoteIdentifier_PqConnection_Id.R#L12
And in RMariaDB.
It's not present in DBI (and never was IIUC, at least not since https://github.com/r-dbi/DBI/commit/6c111b7f49632b03eff5f934ec9691f57f5f9702)
library(DBI)
dbQuoteIdentifier(ANSI(), Id(schema = "myschema"))
#> <SQL> "myschema"
This PR includes a potential fix for RPostgres
https://github.com/r-dbi/RPostgres/pull/372/files#diff-b87396211065ce77f8649cc9f6daadb09f90ee9f7d3cfe4cd4b8dfdc377f616fR8-R9
It is all a bit confusing because the "(can't) quote columns" and "trailing dot" issues and PRs are conflated here (as well as over at RMariaDB). Maybe it's best to separate those from each other?
As the tests don't fail, I assume this isn't tested anywhere 😱 .
My guess would be that the implicit assumption has always been that IDs contain/refer to a table name.
We could go for a breaking change
Not sure I follow. Breaking change with regard to the trailing dot? AFAICT we're not breaking something that used to work before in RPostgres or RMariaDB. So this would merely be a bugfix?
but a DBItest specification and test would be great.
I'll have a look.
In any case, adding support for columns as quoted identifiers is not a breaking change, but rather in accordance with the DBItest spec 👀
Quoted identifiers can be used as table and column names in SQL queries (source)
but a DBItest specification and test would be great.
There aren't any quote-identifier specs and tests at the moment and I am having a hard time coming up with meaningful ones for this problem.
- We don't want to do (
SQL
) string comparisons, because quoting is implementation-specific and therefore belong in the particular backend packages. - We cannot use the quote-unquote Roundtrip for
Id()
s that do not reference tables. - The trailing dot is a particular bug, but not spec-relevant per se. Even though I am not aware of a DB that uses something other than the dot as a object name separator (cf. MariaDB, Postgres), but in principle it could be implementation-specific, I guess.
So I'd say that the tests belong in the backend packages rather than in DBItest. Am I missing something?
The MariaDB docs mention many more identifiers (and a similar list applies to Postgres as well - MariaDB is missing schema
for example):
Databases, tables, indexes, columns, aliases, views, stored routines, triggers, events, variables, partitions, tablespaces, savepoints, labels, users, roles, are collectively known as identifiers
Shouldn't we support/allow those as well?
Can you please merge the latest changes here? Looks like I'm not permitted to do this in this PR.
Current Aviator status
Aviator will automatically update this comment as the status of the PR changes. Comment
/aviator refresh
to force Aviator to re-examine your PR (or learn about other/aviator
commands).
This PR was merged manually (without Aviator). Merging manually can negatively impact the performance of the queue. Consider using Aviator next time.
Fixes #453 too. You may want to adjust the title and description, @krlmlr (since you're the PR owner).
Thanks. This now has been implemented independently, but the tests are good.
This pull request failed to merge: PR cannot be automatically rebased, please rebase manually to continue. Once the issues are resolved, remove the blocked
label and re-queue the pull request. Note that the pull request will be automatically re-queued if it has the mergequeue
label.
Additional debug info: Failed to rebase this PR onto the latest changes from the base branch. You will probably need to rebase this PR manually and resolve conflicts).