data-diff icon indicating copy to clipboard operation
data-diff copied to clipboard

Can't specify schema on tables for redshift

Open pedrosmv opened this issue 1 year ago • 3 comments

Hello, I'm trying to use the following config file to run the diff between two tables on different schemas on our Redshift cluster but I can't find the proper way to do this. I tried the table with the schema and specifying the schema on the database definition:

This returns psycopg2.ProgrammingError: invalid dsn: invalid connection option "schema"

[database.test_redshift_source]
driver = "redshift"
user = "XXXX"
password = "XXXX"
database = "XXXXX"
host = "XXXXX"
port = 5439
schema="schema_one"

[database.test_redshift_target]
driver = "redshift"
user = "XXXX"
password = "XXXX"
database = "XXXXX"
host = "XXXXX"
port = 5439
schema="schema_two"


[run.default]
update_column = "updated_at"
verbose = true
max_age = "3days"
min_age = "3days"

[run.test_diff]
1.database = "test_redshift_source"
1.table = "table"
2.database = "test_redshift_target"
2.table = "table"

This returns Table 'schema_two.table' does not exist, or has no columns

[database.test_redshift_source]
driver = "redshift"
user = "XXXX"
password = "XXXX"
database = "XXXXX"
host = "XXXXX"
port = 5439

[database.test_redshift_target]
driver = "redshift"
user = "XXXX"
password = "XXXX"
database = "XXXXX"
host = "XXXXX"
port = 5439


[run.default]
update_column = "updated_at"
verbose = true
max_age = "3days"
min_age = "3days"

[run.test_diff]
1.database = "test_redshift_source"
1.table = "schema_one.table"
2.database = "test_redshift_target"
2.table = "schema_two.table"

I tried looking at the docs, the code and the Python API reference but I couldn't figure out how to make it work :(

pedrosmv avatar Aug 02 '22 22:08 pedrosmv

Can you please try again, but instead of

[database.test_redshift_source]
schema="schema_one"

Write

[database.test_redshift_source]
database="schema_one"
...

[database.test_redshift_target]
database="schema_one"
...

I know it's counter-intuitive, but I think that's the keyword that psycopg2 expects.

erezsh avatar Aug 03 '22 07:08 erezsh

@erezsh I tried but I got this:

  File "/Users/pedrovidotti/.pyenv/versions/3.7.5/lib/python3.7/concurrent/futures/thread.py", line 69, in _worker
    initializer(*initargs)
  File "/Users/pedrovidotti/.pyenv/versions/3.7.5/envs/data-diff/lib/python3.7/site-packages/data_diff/databases/base.py", line 274, in set_conn
    self.thread_local.conn = self.create_connection()
  File "/Users/pedrovidotti/.pyenv/versions/3.7.5/envs/data-diff/lib/python3.7/site-packages/data_diff/databases/postgresql.py", line 62, in create_connection
    raise ConnectError(*e.args) from e
data_diff.databases.base.ConnectError: FATAL:  database "schema_one" does not exist

The file is more or less like this:

[database.test_redshift_source]
driver = "redshift"
user = "XXXX"
password = "XXXX"
database = "schema_one"
host = "XXXXX"
port = 5439

[database.test_redshift_target]
driver = "redshift"
user = "XXXX"
password = "XXXX"
database = "schema_two"
host = "XXXXX"
port = 5439



[run.default]
update_column = "updated_at"
verbose = true
max_age = "3days"
min_age = "3days"

[run.test_diff]
1.database = "test_redshift_source"
1.table = "table"
2.database = "test_redshift_target"
2.table = "table"

pedrosmv avatar Aug 03 '22 12:08 pedrosmv

@pedrosmv Sorry for the delay.

I tested this exact file with the latest data-diff version, and it connects to the database successfully. (says Table 'table' does not exist).

Are you sure schema_one exists in your database instance?

If so, it would really help if you could try to connect to it directly using the psycopg2 package. If that works, we'll know the problem is in data-diff.

erezsh avatar Aug 09 '22 15:08 erezsh

Closed due to inactivity.

erezsh avatar Sep 20 '22 11:09 erezsh