data-diff
data-diff copied to clipboard
Can't specify schema on tables for redshift
Hello, I'm trying to use the following config file to run the diff between two tables on different schemas on our Redshift cluster but I can't find the proper way to do this. I tried the table with the schema and specifying the schema on the database definition:
This returns psycopg2.ProgrammingError: invalid dsn: invalid connection option "schema"
[database.test_redshift_source]
driver = "redshift"
user = "XXXX"
password = "XXXX"
database = "XXXXX"
host = "XXXXX"
port = 5439
schema="schema_one"
[database.test_redshift_target]
driver = "redshift"
user = "XXXX"
password = "XXXX"
database = "XXXXX"
host = "XXXXX"
port = 5439
schema="schema_two"
[run.default]
update_column = "updated_at"
verbose = true
max_age = "3days"
min_age = "3days"
[run.test_diff]
1.database = "test_redshift_source"
1.table = "table"
2.database = "test_redshift_target"
2.table = "table"
This returns Table 'schema_two.table' does not exist, or has no columns
[database.test_redshift_source]
driver = "redshift"
user = "XXXX"
password = "XXXX"
database = "XXXXX"
host = "XXXXX"
port = 5439
[database.test_redshift_target]
driver = "redshift"
user = "XXXX"
password = "XXXX"
database = "XXXXX"
host = "XXXXX"
port = 5439
[run.default]
update_column = "updated_at"
verbose = true
max_age = "3days"
min_age = "3days"
[run.test_diff]
1.database = "test_redshift_source"
1.table = "schema_one.table"
2.database = "test_redshift_target"
2.table = "schema_two.table"
I tried looking at the docs, the code and the Python API reference but I couldn't figure out how to make it work :(
Can you please try again, but instead of
[database.test_redshift_source]
schema="schema_one"
Write
[database.test_redshift_source]
database="schema_one"
...
[database.test_redshift_target]
database="schema_one"
...
I know it's counter-intuitive, but I think that's the keyword that psycopg2
expects.
@erezsh I tried but I got this:
File "/Users/pedrovidotti/.pyenv/versions/3.7.5/lib/python3.7/concurrent/futures/thread.py", line 69, in _worker
initializer(*initargs)
File "/Users/pedrovidotti/.pyenv/versions/3.7.5/envs/data-diff/lib/python3.7/site-packages/data_diff/databases/base.py", line 274, in set_conn
self.thread_local.conn = self.create_connection()
File "/Users/pedrovidotti/.pyenv/versions/3.7.5/envs/data-diff/lib/python3.7/site-packages/data_diff/databases/postgresql.py", line 62, in create_connection
raise ConnectError(*e.args) from e
data_diff.databases.base.ConnectError: FATAL: database "schema_one" does not exist
The file is more or less like this:
[database.test_redshift_source]
driver = "redshift"
user = "XXXX"
password = "XXXX"
database = "schema_one"
host = "XXXXX"
port = 5439
[database.test_redshift_target]
driver = "redshift"
user = "XXXX"
password = "XXXX"
database = "schema_two"
host = "XXXXX"
port = 5439
[run.default]
update_column = "updated_at"
verbose = true
max_age = "3days"
min_age = "3days"
[run.test_diff]
1.database = "test_redshift_source"
1.table = "table"
2.database = "test_redshift_target"
2.table = "table"
@pedrosmv Sorry for the delay.
I tested this exact file with the latest data-diff version, and it connects to the database successfully. (says Table 'table' does not exist
).
Are you sure schema_one
exists in your database instance?
If so, it would really help if you could try to connect to it directly using the psycopg2
package. If that works, we'll know the problem is in data-diff.
Closed due to inactivity.