data-diff
data-diff copied to clipboard
[SQL Server] data diff fails with case sensitive schemas and objects
Describe the bug when running data diff comparing sql server to redshift, I'm getting the error:
data_diff.databases.base.QueryError: ('42S02', "[42S02] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name '<database>.information_schema.columns'. (208) (SQLExecDirectW)")
The actual object name is <database>.INFORMATION_SCHEMA.COLUMNS
updating this line to uppercase solves the problem.
info_schema_path = ["INFORMATION_SCHEMA", "COLUMNS"]
Make sure to include the following (minus sensitive information): command:
data-diff -d <redshift_uri> <schema>.<table> <mssql_uri> dbo.<TABLE> -k <KEY_COLUMN>
and
data-diff -d <redshift_uri> <schema>.<table> <mssql_uri> dbo.<TABLE> -k <KEY_COLUMN> --case-sensitive
output:
16:58:27 INFO [Redshift] Starting a threadpool, size=1. base.py:1237
DEBUG Running SQL (Redshift): base.py:982
SET TIME ZONE 'UTC'
INFO [MsSQL] Starting a threadpool, size=1. base.py:1237
DEBUG Database 'MsSQL(default_schema='dbo', _interactive=False, is_closed=False, _dialect=Dialect(_prevent_overflow_when_concat=False), _connect.py:300
thread_count=1, _init_error=None, _queue=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7fd5d78c0fd0>,
thread_local=<_thread._local object at 0x7fd5d7558a90>, default_database='database', _args={<credentials>, 'driver': '{ODBC Driver 18 for SQL
Server}', 'TrustServerCertificate': 'yes'}, _mssql=None)' does not allow setting timezone. We recommend making sure it's set to 'UTC'.
INFO Using algorithm 'hashdiff'. __main__.py:394
DEBUG Running SQL (Redshift): ('schema', 'table') base.py:980
SELECT column_name, data_type, datetime_precision, numeric_precision, numeric_scale FROM information_schema.columns WHERE table_name =
'table' AND table_schema = 'schema'
DEBUG Running SQL (MsSQL): ('dbo', 'TABLE') base.py:980
SELECT column_name, data_type, datetime_precision, numeric_precision, numeric_scale, collation_name FROM
[database].information_schema.columns WHERE table_name = 'TABLE' AND table_schema = 'dbo'
ERROR ('42S02', "[42S02] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name 'database.information_schema.columns'. __main__.py:344
(208) (SQLExecDirectW)")
Traceback (most recent call last):
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/mssql.py", line 227, in _query_cursor
return super()._query_cursor(c, sql_code)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1173, in _query_cursor
c.execute(sql_code)
pyodbc.ProgrammingError: ('42S02', "[42S02] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name 'database.information_schema.columns'. (208) (SQLExecDirectW)")
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/ingestion/venv/bin/data-diff", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/__main__.py", line 342, in main
_data_diff(dbt_project_dir=project_dir_override, dbt_profiles_dir=profiles_dir_override, state=state, **kw)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/__main__.py", line 571, in _data_diff
schemas = list(differ._thread_map(_get_schema, safezip(dbs, table_paths)))
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
yield _result_or_cancel(fs.pop())
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
return fut.result(timeout)
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/__main__.py", line 75, in _get_schema
return db.query_table_schema(table_path)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1048, in query_table_schema
rows = self.query(self.select_table_schema(path), list, log_message=path)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 996, in query
res = self._query(sql_code)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1248, in _query
return r.result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1254, in _query_in_worker
return self._query_conn(self.thread_local.conn, sql_code)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1188, in _query_conn
return apply_query(callback, sql_code)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 211, in apply_query
return callback(sql_code)
File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/mssql.py", line 229, in _query_cursor
raise QueryError(e)
data_diff.databases.base.QueryError: ('42S02', "[42S02] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name 'database.information_schema.columns'. (208) (SQLExecDirectW)")
If possible, please paste these as text, and not a screenshot.
Describe the environment
Ubuntu@data_diff v0.11.1