dbQuoteIdentifier() does not allow columns in Id() object.
dbQuoteIdentifier() currently does not allow for columns when quoting an Id() object.
library(DBI)
con <- dbConnect(RPostgres::Postgres(), dbname = "postgres")
id <- DBI::Id(table = "table_name", column = "column_name")
dbQuoteIdentifier(conn = con, x = id)
#> Error in dbQuoteIdentifier(conn = con, x = id): all(names(x@name) %in% c("catalog", "schema", "table")) is not TRUE
I think it should, and the glue_sql() docs seem to agree (the last example contains an implizit dbQuoteIdentifier() if I am not mistaken).
cols <- list(
DBI::Id(table = "table_name", column = "col1"),
DBI::Id(table = "table_name", column = "col2")
)
glue::glue_sql("SELECT {`cols`*}", .con = con)
#> Error in FUN(X[[i]], ...): all(names(x@name) %in% c("catalog", "schema", "table")) is not TRUE
dbDisconnect(con)
Would it suffice to add "column" here:
https://github.com/r-dbi/RPostgres/blob/12c3d3dc67f93cbafca24b5496d2fc1a9cf0f562/R/quote.R#L58
and change
https://github.com/r-dbi/RPostgres/blob/12c3d3dc67f93cbafca24b5496d2fc1a9cf0f562/R/quote.R#L61-L71
to
return(SQL(paste0(dbQuoteIdentifier(conn, x@name), collapse = ".")))
(from https://github.com/r-dbi/DBI/blob/6531ab38fc0d21745b1f40fdd836ba82f926c287/R/quote.R#L112)
Or am I missing something?
Session info
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language en
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2020-10-06
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> backports 1.1.10 2020-09-15 [1] CRAN (R 4.0.2)
#> bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.2)
#> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.2)
#> blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.2)
#> callr 3.4.4 2020-09-07 [1] CRAN (R 4.0.2)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
#> DBI * 1.1.0 2019-12-15 [1] CRAN (R 4.0.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
#> devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.2)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.2)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
#> hms 0.5.3 2020-01-08 [1] CRAN (R 4.0.2)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
#> knitr 1.30 2020-09-22 [1] CRAN (R 4.0.2)
#> magrittr 1.5.0.9000 2020-09-30 [1] Github (tidyverse/magrittr@0221e18)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
#> processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.2)
#> ps 1.3.4 2020-08-11 [1] CRAN (R 4.0.2)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.2)
#> Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.2)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
#> rmarkdown 2.4 2020-09-30 [1] CRAN (R 4.0.2)
#> RPostgres 1.2.1.9000 2020-09-28 [1] local
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.2)
#> usethis 1.6.3 2020-09-17 [1] CRAN (R 4.0.2)
#> vctrs 0.3.4 2020-08-29 [1] CRAN (R 4.0.2)
#> withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.2)
#> xfun 0.18 2020-09-29 [1] CRAN (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] C:/Users/daniel/Documents/.R/win-library
#> [2] C:/Program Files/R/R-4.0.2/library
I missed dbUnquoteIdentifier() at least
library(DBI)
con <- dbConnect(RPostgres::Postgres(), dbname = "postgres")
id <- DBI::Id(table = "table_name", column = "column_name")
qid <- dbQuoteIdentifier(conn = con, x = id)
dbUnquoteIdentifier(conn = con, x = qid)
#> [[1]]
#> <Id> schema = table_name, table = column_name
dbDisconnect(con)
Is there any SQL dialect where another symbol than the dot is used as the delimiter between table (possibly fully qualified) and column name?
Not sure whether this question was directed at me, but FWIW, I am not aware of any, but I am not an expert.
How is this relevant for RPostgres? (Because here you'd only have to worry about the PostgreSQL dialect (plus derivatives like Redshift).)
The suggested changes in the first post are now here: https://github.com/r-dbi/RPostgres/compare/master...dpprdan:fix/col_in_dbQuoteIdentifier (slightly updated for correct element order).
There is one problem here: what should dbUnquoteIdentifier(con, "foo.bar") return? If we allow quoting columns, dbUnquoteIdentifier() is no longer an inverse to dbQuoteIdentifier() . Would that ever be a problem, or is it sufficient that dbQuoteIdentifier() is an inverse to dbUnquoteIdentifier() but not the other way round?
Maybe we really need dbQuoteColumn() and dbUnquoteColumn() which would assume that the last identifier is a column and pass the rest to db*Identifier() ? A default implementation could live in DBI (that motivated my other question if the column separator is always the dot across backends).
I no longer really like the Id() interface. Let's keep its scope to table identifiers.
I no longer really like the
Id()interface.
What's the alternative? Is there another way to make the glue_sql() example in the first post above work?
No good alternative indeed.
A better version of dbUnquoteIdentifier() would need an extra argument that identifies the object kind (schema, table, column) to which the object belongs. If we open dbQuoteIdentifier(), the roundtrip Id -> quote -> unquote is guaranteed only for table objects -- so be it.
I created a PR from your branch. Would you like to add a test?
If we open dbQuoteIdentifier(), the roundtrip Id -> quote -> unquote is guaranteed only for table objects – so be it.
Agree. I’d say that this is already the case, see below, and the fact that the roundtrip isn’t guaranteed for other objects doesn’t seem to have much practical relevance anyway apparently - judging from the lack of relevant bug reports at least.
Details
I’ve found some notes I started a while ago to understand what’s going on and I thought I’d leave them here to document where roundtrips (don’t) work currently.
The tl;dr of it is that some existing functionality relies on dbQuoteIdentifier()/dbUnquoteIdentifier() roundtrips, but doesn’t work in some situations already, either because it is a corner-case or because of some other functionality we rely on (e.g. to prevent double quoting).
Should
dbQuoteIdentifier()/dbUnquoteIdentifier()roundtrips be possible?
The docs say that they are possible, so they have to, see ?DBI::dbUnquoteIdentifier:
Call this method to convert a SQL object created by dbQuoteIdentifier() back to a list of Id objects.
However, the roundtrip is currently broken in other DBI(-compliant packages) for IDs that don’t have the table in the last slot.
library(DBI)
id_col <- Id(catalog = "mycatalog", schema = "myschema")
(id_col_a <- dbQuoteIdentifier(ANSI(), id_col))
#> <SQL> "mycatalog"."myschema"
dbUnquoteIdentifier(ANSI(), id_col_a)
#> [[1]]
#> <Id> schema = mycatalog, table = myschema
{RSQLite} behaves the same since it doesn’t have it’s own dbQuoteIdentifier() method for Id()s. This doesn’t have any practical relevance for RSQLite, since SQLite doesn’t support catalogs and schemas AFAIK.
The next line is an example from DBI::Id() and it cannot be unquoted, so no roundtrip here either ATM.
id4 <- Id(cluster = "mycluster", catalog = "mycatalog", schema = "myschema", table = "mytable")
(id4_a <- dbQuoteIdentifier(ANSI(), id4))
#> <SQL> "mycluster"."mycatalog"."myschema"."mytable"
dbUnquoteIdentifier(ANSI(), id4_a)
#> Error: Can't unquote "mycluster"."mycatalog"."myschema"."mytable"
The roundtrip also doesn’t work when just quoting a schema.
dbUnquoteIdentifier(ANSI(), dbQuoteIdentifier(ANSI(), Id(schema = "myschema")))[[1]]
#> <Id> table = myschema
This (formally) works with RPostgres
pg_con <- RPostgres::postgresDefault()
dbUnquoteIdentifier(pg_con, dbQuoteIdentifier(pg_con, Id(schema = "myschema")))[[1]]
#> <Id> schema = myschema
However, it only works because dbQuoteIdentifier() returns the quoted schema ID with a period (“.”) at the end, which isn’t a valid Postgres schema identifier. So this seems more like bug, not a feature.
(qid <- dbQuoteIdentifier(pg_con, Id(schema = "myschema")))
#> <SQL> "myschema".
create_schema_sql <- paste0("CREATE SCHEMA IF NOT EXISTS ", qid)
dbExecute(pg_con, create_schema_sql)
#> Error: Failed to prepare query: ERROR: syntax error at or near "."
#> LINE 1: CREATE SCHEMA IF NOT EXISTS "myschema".
#> ^
What should
dbUnquoteIdentifier(con, "foo.bar")return?
Apart from the fact that this would have to be dbUnquoteIdentifier(con, SQL("foo.bar")), because
an error is raised if plain character vectors are passed as the x argument. (From
?DBI::dbUnquoteIdentifier)
dbUnquoteIdentifier(ANSI(), "foo.bar")
#> Error: x must be SQL or Id
the docs are again quite clear:
unquoting expressions of the form SQL(“schema.table”) and then quoting gives the same result as quoting the identifier constructed by
Id(schema = "schema", table = "table"). (Also from?DBI::dbUnquoteIdentifier)
identical(
dbQuoteIdentifier(ANSI(), dbUnquoteIdentifier(ANSI(), SQL("foo.bar"))[[1]]),
dbQuoteIdentifier(ANSI(), Id(schema = "foo", table = "bar"))
)
#> [1] TRUE
The implicit assumption being that foo.bar is supposed to mean schema = "foo", table = "bar" and not table = "foo.bar"
If we allow quoting columns,
dbUnquoteIdentifier()is no longer an inverse todbQuoteIdentifier(). Would that ever be a problem, or is it sufficient thatdbQuoteIdentifier()is an inverse todbUnquoteIdentifier()but not the other way round?
Quoting than unquoting a tablename is used currently to get a table id from a string, SQL() or Id() object (e.g. in dbListFields), so that needs to work.
Not sure whether there is a use-case for unquoting and then quoting again. Be that as it may, it would still be possible to unquote-quote something like SQL("foo.bar") if we allowed columns. We just don’t know whether bar is a schema, table, column or other database object and just have to assume - like we already do currently - that it is a table object.
Finally note that for SQL("myschema.mytable") a quote-unquote-quote roundtrip is not identical to just quoting it.
dbQuoteIdentifier(ANSI(), SQL("myschema.mytable"))
#> <SQL> myschema.mytable
dbQuoteIdentifier(ANSI(), dbUnquoteIdentifier(ANSI(), dbQuoteIdentifier(ANSI(), SQL("myschema.mytable")))[[1]])
#> <SQL> "myschema"."mytable"
This is to prohibit double quoting a SQL string, of course, so again the lack of an exact roundtrip seems to be necessary.
dbDisconnect(pg_con)
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.1.2 (2021-11-01)
#> os Windows 10 x64 (build 19043)
#> system x86_64, mingw32
#> ui RTerm
#> language en
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2021-12-28
#> pandoc 2.16.2 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#> package * version date (UTC) lib source
#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.1.2)
#> bit 4.0.4 2020-08-04 [1] CRAN (R 4.1.0)
#> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.1.0)
#> blob 1.2.2 2021-07-23 [1] CRAN (R 4.1.0)
#> cli 3.1.0 2021-10-27 [1] CRAN (R 4.1.1)
#> crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.1)
#> DBI * 1.1.2 2021-12-20 [1] CRAN (R 4.1.2)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
#> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.2)
#> generics 0.1.1 2021-10-25 [1] CRAN (R 4.1.1)
#> glue 1.6.0 2021-12-17 [1] CRAN (R 4.1.2)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
#> hms 1.1.1 2021-09-26 [1] CRAN (R 4.1.1)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.1)
#> knitr 1.37 2021-12-16 [1] CRAN (R 4.1.2)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.1)
#> lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.1.1)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
#> pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
#> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.1.0)
#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.0)
#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.0)
#> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.1)
#> Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.0)
#> rlang 0.4.12 2021-10-18 [1] CRAN (R 4.1.1)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.1)
#> RPostgres 1.4.3 2021-12-20 [1] CRAN (R 4.1.2)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
#> styler 1.6.2 2021-09-23 [1] CRAN (R 4.1.1)
#> tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.2)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
#> withr 2.4.3 2021-11-30 [1] CRAN (R 4.1.2)
#> xfun 0.29 2021-12-14 [1] CRAN (R 4.1.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
#>
#> [1] C:/Users/Daniel.AK-HAMBURG/Documents/R/win-library/4.1
#> [2] C:/Program Files/R/R-4.1.2/library
#>
#> ------------------------------------------------------------------------------
Thanks for the detailed write-up.
The behavior with the terminal dot in the identifier name looks like a fluke to me too.
We should remember to document that dbUnquoteIdentifier() is only meant for tables, not for objects of other kinds.