postgres_scanner icon indicating copy to clipboard operation
postgres_scanner copied to clipboard

Please add support for AWS Redshift (based on PostgreSQL 8.0.2)

Open CedricYauLBD opened this issue 1 year ago • 14 comments

What happens?

When I try to use the postgres extension with AWS Redshift I get an error message:

Error: Invalid Error: Failed to execute query "SELECT CURRENT_SETTING('server_version'), (SELECT COUNT(*) FROM pg_settings WHERE name LIKE 'rds%')": ERROR:  must be superuser to examine "server_version"

Based on https://stackoverflow.com/questions/51422236/what-role-permission-needed-for-the-user-to-get-the-server-version-in-amazon-red

I can use SELECT version(); on Redshift.

cyau@redshift:dev> SELECT version();
+---------------------------------------------------------------------------------------------------------------------------+
| version                                                                                                                   |
|---------------------------------------------------------------------------------------------------------------------------|
| PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.62878 |
+---------------------------------------------------------------------------------------------------------------------------+
SELECT 1
Time: 0.043s

This appears to be caused by the query here: https://github.com/duckdb/postgres_scanner/blob/883a8f1a8a487264855a5166f7df1f46ad386434/src/postgres_connection.cpp#L122

To Reproduce

v0.10.0 20b1486d11
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D attach 'dbname=mydb user=username host=redshift.server.address password=password port=5439' as db (TYPE postgres);
Error: Invalid Error: Failed to execute query "SELECT CURRENT_SETTING('server_version'), (SELECT COUNT(*) FROM pg_settings WHERE name LIKE 'rds%')": ERROR:  must be superuser to examine "server_version"

OS:

Ubuntu 22.04 LTS

PostgreSQL Version:

Redshift (based on PostgreSQL 8.0.2)

DuckDB Version:

v0.10.0 20b1486d11

DuckDB Client:

CLI

Full Name:

Cedric Yau

Affiliation:

LinebackerData

Have you tried this on the latest main branch?

  • [X] I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • [X] I agree

CedricYauLBD avatar Feb 14 '24 20:02 CedricYauLBD

Would LOVE to see this happen (especially if the version check is the only real blocker)

minaguib avatar Mar 01 '24 22:03 minaguib

I've modified the version check to use version() instead in https://github.com/duckdb/postgres_scanner/pull/194. I've tried to get a redshift server up and running myself but didn't manage in a reasonable time frame. If you could give that PR a shot and see if it fixes the issue that would be great.

Mytherin avatar Mar 19 '24 12:03 Mytherin

@Mytherin I downloaded and built from source, but am running into an issue when invoking ATTACH:

connection to server at "ZZZZ, port 5439 failed: FATAL:  no PostgreSQL user name specified in startup packet

This happens both in main and redshift branches, so I'm 100% sure it's before the actual version check occurs and might be something related to my local env (google checks hints at some openssl macos VS homebrew issues).

I'll figure it out then confirm the logical fix.

minaguib avatar Mar 19 '24 14:03 minaguib

I've merged it into main - you should be able to get a nightly build for v0.10.1 after CI completes (in ~1 hour maybe?) with the following command:

force install postgres_scanner from 'http://nightly-extensions.duckdb.org';

Mytherin avatar Mar 19 '24 16:03 Mytherin

@Mytherin Success! ATTACH succeeds, no errors :)

Unfortunately I can't 1. introspect the schema (via .schema) or 2. run a simple query (which internally wants to introspect schema)

": ERROR:  relation "pg_enum" does not exist

If this is something you'd like to pursue supporting, LMK if you'd like additional details here or in a new ticket.

minaguib avatar Mar 19 '24 21:03 minaguib

I've pushed a fix for that here - https://github.com/duckdb/postgres_scanner/pull/197 - that should disable pg_enum for old postgres versions (before 8.3 when support for pg_enum was added). If you could try it out that would be great.

Mytherin avatar Mar 19 '24 22:03 Mytherin

@Mytherin Close! But no cigar.

  • ATTACH works!
  • .schema introspection works!
  • Simple query fails
D select count(*) from redshift.foo.tab1;

Invalid Error: Failed to prepare COPY "
	COPY (SELECT NULL FROM "foo"."tab1" ) TO STDOUT (FORMAT binary);
	": ERROR:  syntax error at or near "("
LINE 2:  COPY (SELECT NULL FROM "foo"."tab1" ) TO ...
              ^

I feel this'll be a bit of a yak-shaving exercise. I don't think RedShift supports PG's COPY ... TO STDOUT

minaguib avatar Mar 20 '24 01:03 minaguib

Unfortunately this is harder to solve - all of the current result conversion code is written for the binary copy. In order to solve this we would need to make an alternative result conversion function that uses the (regular) text protocol and add support for that for all data types. I will leave the issue open for now as I don't have time to pick this up at this point.

Mytherin avatar Mar 20 '24 18:03 Mytherin

@Mytherin Very appreciated. Thank you. Was hoping for quick wins, but I completely understand where it's at now.

Also (thinking out loud): RedShift is often used as a medium-big analytical DB. Without reasonable predicate-pushdown or the ability to hand-craft a query to run verbatim in destination, the naive "select *" for each query/subquery will also have limitations on practical usefulness.

minaguib avatar Mar 20 '24 18:03 minaguib

That makes sense - although data transfer can always be useful even if the performance is not fantastic. We do also support the postgres_query method.

Mytherin avatar Mar 20 '24 18:03 Mytherin

@Mytherin so is it possible to use the PostgreSQL Extension and copy command with redshift? https://duckdb.org/docs/extensions/postgres#running-sql-queries-in-postgres-with-postgres_query

misteliy avatar Jun 11 '24 13:06 misteliy

@misteliy You can test it, but my hunch based on the above is that it's not yet possible to transfer real data from redshift over the wire to the duckdb instance.

minaguib avatar Jun 11 '24 13:06 minaguib