dbListTables: list Materialized Views as well
dbListTables() now uses the pg_class/pg_namespace tables instead of INFORMATION_SCHEMA.tables to retrieve the names of remote tables and (materialized) views.
In particular dbListTables() will now list:
- r = ordinary table
- v = view
- m = materialized view
- f = foreign table
- p = partitioned table
It does not list the partitions of a partitioned table.
There are no dbListTables() tests, correct?
I'm getting legitimate Travis errors: https://travis-ci.com/github/r-dbi/RPostgres/jobs/391997882#L3128
âšī¸ I can reproduce on a Postgres 9.6. Apparently, relispartition (as well as p = partitioned table) was added in v10 (pg_class docs v10 vs v9.6).
How do you want me to handle this?
- Drop
relispartitioni.e. list partitions as well. - Make the query dependent on the postgres version, i.e. drop relispartition only for Postgres < v10.
Relatedly: Do you already have a policy regarding the use of information schema vs. system catalogs?
The information schema consists of a set of views that contain information about the objects defined in the current database. The information schema is defined in the SQL standard and can therefore be expected to be portable and remain stable â unlike the system catalogs, which are specific to PostgreSQL and are modeled after implementation concerns. The information schema views do not, however, contain information about PostgreSQL-specific features; to inquire about those you need to query the system catalogs or other PostgreSQL-specific views. https://www.postgresql.org/docs/current/information-schema.html (emphasis mine).
Since mviews are PostgreSQL-specific features we need to query the system catalogs. These have the drawback of being less stable, which we see here to some degree.
Therefore, alternative no.
- Do not support PostgreSQL-specific features and therefore mviews in
dbListTables()(anddbListFields()anddbExistsTable()).
BTW Which Postgres versions does RPostgres support?
A robust solution would be better. We require client libraries >= 9.6, the server version is not specified.
Are materialized views contained in INFORMATION_SCHEMA.views ?
Are materialized views contained in INFORMATION_SCHEMA.views ?
No, they are not since they are a PostgreSQL-specific feature. For more info, see this thread on the psql-hackers list for a discussion on why they are not in the INFORMATION_SCHEMA.tables (which includes Views). Tl;dr: "They are not defined by the SQL standard."
My 2ct: Since RPostgres is a PostgreSQL-specific package, it makes sense to support PostgreSQL-specific features.
OTOH: Since it is a DBI package, maybe it might also make sense to only support SQL-standard features (like information schema)?
In this particular case I don't expect major changes in the future, but you never know.
We require client libraries >= 9.6, the server version is not specified.
Apparently my mental model is off. What do you mean by client libraries? I thought it goes like this: RPostgres - libpq - PostgreSQL server.
We require libpq >= 9.6 .
I'm also worried about Redshift (#215). Can we implement a solution that works on all platforms?
I am not familiar with Redshift, at all, and don't know how to test ATM.
DESCRIPTION says libpq >= 9.0, BTW.
I found this:
The standard PostgreSQL catalog tables are accessible to Amazon Redshift users. For more information about PostgreSQL system catalogs, see PostgreSQL system tables.
This links to the Postgres v8.0 docs, however. Like I said, I am not familiar with Redshift, so I don't know whether this is an oversight or not.
I think we have a decision though: In order to support system catalogs, which are less stable than information schema, we would have to test against all currently supported server versions. If you want to support Redshift as well, we would need to test against all current Redshift versions as well. This is probably not as easy as spinning up a few docker containers (though I don't know for sure). This sounds very much like "let's not support queries against system catalogs in RPostgres" to me.
What do you think?
Right, 9.0 is stated in DESCRIPTION and in ./configure .
I don't mind implementing Postgres-specific code now that we have Redshift() as a driver for which we can special-case (#258). I could set up a Redshift test database to check this, also on CI.
We could find a middle ground by implementing Postgres-specific functions to gain some experience: Leave db*Table*() unchanged and implement Pq-specific functions with a distinct prefix.
Sounds good, I think! Just to make sure, you mean:
| INFORMATION_SCHEMA | systems catalog |
|---|---|
dbListTable() |
pqListTables() |
dbListFields() |
pqListFields() |
dbExistsTable() |
pqExistsTable() |
dbListObjects() |
pqListObjects() |
This involves a bit more than just changing a SQL query, though. E.g. I will have to read up on S4. Also we may have to think about how we setup internal functions like find_table(). Or rather I will have to figure out how this is designed elsewhere in the package, at least.
What I am saying is: I won't be able to do this immediately and cannot commit to a time frame at this moment. Also, I may require a bit of support from you. Therefore it's also fine with me, if you want to do this yourself. đ
Probably obvious, but we may need Redshift-specific functions as well, when a query relies on a Postgres function that Redshift does not support, e.g. https://github.com/r-dbi/RPostgres/issues/211. (Plus unsupported PG data types and features).
~~Not the best example, though, because this particular case seems to have been fixed in the meantime, since dbListTables() does not use the generate_subscripts function anymore.~~
https://github.com/r-dbi/RPostgres/blob/12c3d3dc67f93cbafca24b5496d2fc1a9cf0f562/R/tables.R#L237-L244
Edit: Not sure what I meant at the time, but looks like I was confused, because #211 is about using generate_subscripts (which Redshift does not support) in dbExitsTable(), while I was refering to dbListTables() here - which indeed doesn't use generate_subscripts, but probably never did. đ¤ˇđģââī¸
I added new pq* versions of dbListTables(), dbExistsTable(), dbListObjects() and dbListFields(). They all query the system catalog (pg_class in particular) and are based on the same underlying code. Remember that we have to use the system catalog instead of information schema because materialized views are not included in information schema (because they are not in the SQL spec).
Is this, what you had in mind, @krlmlr?
At the minimum this needs some more tests to make sure that it does what we are trying to do, i.e. list materialized views as well.
We could find a middle ground by implementing Postgres-specific functions to gain some experience: Leave dbTable() unchanged and implement Pq-specific functions with a distinct prefix.
In the end I think it would be better to use this new code for the db* functions instead of having these "special" functions. One reason is ease of use, another is downstream functions (e.g. in dplyr), which build on the db* functions. We could still use information schema based code for connections that do not support (this) pg_class code, i.e. for Redshift connections.
Slightly off-topic: Do you have plans to test non-current PostgreSQL versions with R CMD check on GHA? I am asking because it was set up on Travis and caught an error I made.
Thanks a lot. I'm not sure we need the new pq*() functions to be S4 methods, also this package uses the postgres prefix in similar situations.
Other than that, I think we can expose this via the db*() interface if the user sets a flag in dbConnect() -- similarly to the check_interrupts flag. Tests would be great too.
Added checks for Postgres >= 10 to the build matrix.
Closing this in favour of #414