RPostgres icon indicating copy to clipboard operation
RPostgres copied to clipboard

dbListTables: list Materialized Views as well

Open dpprdan opened this issue 5 years ago â€ĸ 14 comments

dbListTables() now uses the pg_class/pg_namespace tables instead of INFORMATION_SCHEMA.tables to retrieve the names of remote tables and (materialized) views.

In particular dbListTables() will now list:

  • r = ordinary table
  • v = view
  • m = materialized view
  • f = foreign table
  • p = partitioned table

It does not list the partitions of a partitioned table.

dpprdan avatar Sep 28 '20 12:09 dpprdan

There are no dbListTables() tests, correct?

dpprdan avatar Sep 28 '20 12:09 dpprdan

I'm getting legitimate Travis errors: https://travis-ci.com/github/r-dbi/RPostgres/jobs/391997882#L3128

krlmlr avatar Sep 28 '20 12:09 krlmlr

â˜šī¸ I can reproduce on a Postgres 9.6. Apparently, relispartition (as well as p = partitioned table) was added in v10 (pg_class docs v10 vs v9.6).

How do you want me to handle this?

  1. Drop relispartition i.e. list partitions as well.
  2. Make the query dependent on the postgres version, i.e. drop relispartition only for Postgres < v10.

Relatedly: Do you already have a policy regarding the use of information schema vs. system catalogs?

The information schema consists of a set of views that contain information about the objects defined in the current database. The information schema is defined in the SQL standard and can therefore be expected to be portable and remain stable — unlike the system catalogs, which are specific to PostgreSQL and are modeled after implementation concerns. The information schema views do not, however, contain information about PostgreSQL-specific features; to inquire about those you need to query the system catalogs or other PostgreSQL-specific views. https://www.postgresql.org/docs/current/information-schema.html (emphasis mine).

Since mviews are PostgreSQL-specific features we need to query the system catalogs. These have the drawback of being less stable, which we see here to some degree.

Therefore, alternative no.

  1. Do not support PostgreSQL-specific features and therefore mviews in dbListTables() (and dbListFields() and dbExistsTable()).

BTW Which Postgres versions does RPostgres support?

dpprdan avatar Sep 28 '20 15:09 dpprdan

A robust solution would be better. We require client libraries >= 9.6, the server version is not specified.

Are materialized views contained in INFORMATION_SCHEMA.views ?

krlmlr avatar Sep 28 '20 15:09 krlmlr

Are materialized views contained in INFORMATION_SCHEMA.views ?

No, they are not since they are a PostgreSQL-specific feature. For more info, see this thread on the psql-hackers list for a discussion on why they are not in the INFORMATION_SCHEMA.tables (which includes Views). Tl;dr: "They are not defined by the SQL standard."

My 2ct: Since RPostgres is a PostgreSQL-specific package, it makes sense to support PostgreSQL-specific features.

OTOH: Since it is a DBI package, maybe it might also make sense to only support SQL-standard features (like information schema)?

In this particular case I don't expect major changes in the future, but you never know.

We require client libraries >= 9.6, the server version is not specified.

Apparently my mental model is off. What do you mean by client libraries? I thought it goes like this: RPostgres - libpq - PostgreSQL server.

dpprdan avatar Sep 29 '20 07:09 dpprdan

We require libpq >= 9.6 .

I'm also worried about Redshift (#215). Can we implement a solution that works on all platforms?

krlmlr avatar Sep 29 '20 07:09 krlmlr

I am not familiar with Redshift, at all, and don't know how to test ATM.

DESCRIPTION says libpq >= 9.0, BTW.

dpprdan avatar Sep 29 '20 07:09 dpprdan

I found this:

The standard PostgreSQL catalog tables are accessible to Amazon Redshift users. For more information about PostgreSQL system catalogs, see PostgreSQL system tables.

This links to the Postgres v8.0 docs, however. Like I said, I am not familiar with Redshift, so I don't know whether this is an oversight or not.

I think we have a decision though: In order to support system catalogs, which are less stable than information schema, we would have to test against all currently supported server versions. If you want to support Redshift as well, we would need to test against all current Redshift versions as well. This is probably not as easy as spinning up a few docker containers (though I don't know for sure). This sounds very much like "let's not support queries against system catalogs in RPostgres" to me.

What do you think?

dpprdan avatar Sep 29 '20 08:09 dpprdan

Right, 9.0 is stated in DESCRIPTION and in ./configure .

I don't mind implementing Postgres-specific code now that we have Redshift() as a driver for which we can special-case (#258). I could set up a Redshift test database to check this, also on CI.

krlmlr avatar Sep 29 '20 08:09 krlmlr

We could find a middle ground by implementing Postgres-specific functions to gain some experience: Leave db*Table*() unchanged and implement Pq-specific functions with a distinct prefix.

krlmlr avatar Sep 29 '20 08:09 krlmlr

Sounds good, I think! Just to make sure, you mean:

INFORMATION_SCHEMA systems catalog
dbListTable() pqListTables()
dbListFields() pqListFields()
dbExistsTable() pqExistsTable()
dbListObjects() pqListObjects()

This involves a bit more than just changing a SQL query, though. E.g. I will have to read up on S4. Also we may have to think about how we setup internal functions like find_table(). Or rather I will have to figure out how this is designed elsewhere in the package, at least.

What I am saying is: I won't be able to do this immediately and cannot commit to a time frame at this moment. Also, I may require a bit of support from you. Therefore it's also fine with me, if you want to do this yourself. 😃

dpprdan avatar Oct 01 '20 09:10 dpprdan

Probably obvious, but we may need Redshift-specific functions as well, when a query relies on a Postgres function that Redshift does not support, e.g. https://github.com/r-dbi/RPostgres/issues/211. (Plus unsupported PG data types and features).

~~Not the best example, though, because this particular case seems to have been fixed in the meantime, since dbListTables() does not use the generate_subscripts function anymore.~~

https://github.com/r-dbi/RPostgres/blob/12c3d3dc67f93cbafca24b5496d2fc1a9cf0f562/R/tables.R#L237-L244

Edit: Not sure what I meant at the time, but looks like I was confused, because #211 is about using generate_subscripts (which Redshift does not support) in dbExitsTable(), while I was refering to dbListTables() here - which indeed doesn't use generate_subscripts, but probably never did. 🤷đŸģâ€â™‚ī¸

dpprdan avatar Oct 01 '20 10:10 dpprdan

I added new pq* versions of dbListTables(), dbExistsTable(), dbListObjects() and dbListFields(). They all query the system catalog (pg_class in particular) and are based on the same underlying code. Remember that we have to use the system catalog instead of information schema because materialized views are not included in information schema (because they are not in the SQL spec).

Is this, what you had in mind, @krlmlr?

At the minimum this needs some more tests to make sure that it does what we are trying to do, i.e. list materialized views as well.

We could find a middle ground by implementing Postgres-specific functions to gain some experience: Leave dbTable() unchanged and implement Pq-specific functions with a distinct prefix.

In the end I think it would be better to use this new code for the db* functions instead of having these "special" functions. One reason is ease of use, another is downstream functions (e.g. in dplyr), which build on the db* functions. We could still use information schema based code for connections that do not support (this) pg_class code, i.e. for Redshift connections.

Slightly off-topic: Do you have plans to test non-current PostgreSQL versions with R CMD check on GHA? I am asking because it was set up on Travis and caught an error I made.

dpprdan avatar Jul 17 '21 11:07 dpprdan

Thanks a lot. I'm not sure we need the new pq*() functions to be S4 methods, also this package uses the postgres prefix in similar situations.

Other than that, I think we can expose this via the db*() interface if the user sets a flag in dbConnect() -- similarly to the check_interrupts flag. Tests would be great too.

Added checks for Postgres >= 10 to the build matrix.

krlmlr avatar Dec 05 '21 07:12 krlmlr

Closing this in favour of #414

dpprdan avatar Dec 15 '22 21:12 dpprdan