postgres_scanner
postgres_scanner copied to clipboard
Please add support for AWS Redshift (based on PostgreSQL 8.0.2)
What happens?
When I try to use the postgres extension with AWS Redshift I get an error message:
Error: Invalid Error: Failed to execute query "SELECT CURRENT_SETTING('server_version'), (SELECT COUNT(*) FROM pg_settings WHERE name LIKE 'rds%')": ERROR: must be superuser to examine "server_version"
Based on https://stackoverflow.com/questions/51422236/what-role-permission-needed-for-the-user-to-get-the-server-version-in-amazon-red
I can use SELECT version(); on Redshift.
cyau@redshift:dev> SELECT version();
+---------------------------------------------------------------------------------------------------------------------------+
| version |
|---------------------------------------------------------------------------------------------------------------------------|
| PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.62878 |
+---------------------------------------------------------------------------------------------------------------------------+
SELECT 1
Time: 0.043s
This appears to be caused by the query here: https://github.com/duckdb/postgres_scanner/blob/883a8f1a8a487264855a5166f7df1f46ad386434/src/postgres_connection.cpp#L122
To Reproduce
v0.10.0 20b1486d11
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D attach 'dbname=mydb user=username host=redshift.server.address password=password port=5439' as db (TYPE postgres);
Error: Invalid Error: Failed to execute query "SELECT CURRENT_SETTING('server_version'), (SELECT COUNT(*) FROM pg_settings WHERE name LIKE 'rds%')": ERROR: must be superuser to examine "server_version"
OS:
Ubuntu 22.04 LTS
PostgreSQL Version:
Redshift (based on PostgreSQL 8.0.2)
DuckDB Version:
v0.10.0 20b1486d11
DuckDB Client:
CLI
Full Name:
Cedric Yau
Affiliation:
LinebackerData
Have you tried this on the latest main branch?
- [X] I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- [X] I agree
Would LOVE to see this happen (especially if the version check is the only real blocker)
I've modified the version check to use version() instead in https://github.com/duckdb/postgres_scanner/pull/194. I've tried to get a redshift server up and running myself but didn't manage in a reasonable time frame. If you could give that PR a shot and see if it fixes the issue that would be great.
@Mytherin I downloaded and built from source, but am running into an issue when invoking ATTACH:
connection to server at "ZZZZ, port 5439 failed: FATAL: no PostgreSQL user name specified in startup packet
This happens both in main and redshift branches, so I'm 100% sure it's before the actual version check occurs and might be something related to my local env (google checks hints at some openssl macos VS homebrew issues).
I'll figure it out then confirm the logical fix.
I've merged it into main - you should be able to get a nightly build for v0.10.1 after CI completes (in ~1 hour maybe?) with the following command:
force install postgres_scanner from 'http://nightly-extensions.duckdb.org';
@Mytherin Success! ATTACH succeeds, no errors :)
Unfortunately I can't 1. introspect the schema (via .schema) or 2. run a simple query (which internally wants to introspect schema)
": ERROR: relation "pg_enum" does not exist
If this is something you'd like to pursue supporting, LMK if you'd like additional details here or in a new ticket.
I've pushed a fix for that here - https://github.com/duckdb/postgres_scanner/pull/197 - that should disable pg_enum for old postgres versions (before 8.3 when support for pg_enum was added). If you could try it out that would be great.
@Mytherin Close! But no cigar.
- ATTACH works!
.schemaintrospection works!- Simple query fails
D select count(*) from redshift.foo.tab1;
Invalid Error: Failed to prepare COPY "
COPY (SELECT NULL FROM "foo"."tab1" ) TO STDOUT (FORMAT binary);
": ERROR: syntax error at or near "("
LINE 2: COPY (SELECT NULL FROM "foo"."tab1" ) TO ...
^
I feel this'll be a bit of a yak-shaving exercise. I don't think RedShift supports PG's COPY ... TO STDOUT
Unfortunately this is harder to solve - all of the current result conversion code is written for the binary copy. In order to solve this we would need to make an alternative result conversion function that uses the (regular) text protocol and add support for that for all data types. I will leave the issue open for now as I don't have time to pick this up at this point.
@Mytherin Very appreciated. Thank you. Was hoping for quick wins, but I completely understand where it's at now.
Also (thinking out loud): RedShift is often used as a medium-big analytical DB. Without reasonable predicate-pushdown or the ability to hand-craft a query to run verbatim in destination, the naive "select *" for each query/subquery will also have limitations on practical usefulness.
That makes sense - although data transfer can always be useful even if the performance is not fantastic. We do also support the postgres_query method.
@Mytherin so is it possible to use the PostgreSQL Extension and copy command with redshift? https://duckdb.org/docs/extensions/postgres#running-sql-queries-in-postgres-with-postgres_query
@misteliy You can test it, but my hunch based on the above is that it's not yet possible to transfer real data from redshift over the wire to the duckdb instance.