data-diff icon indicating copy to clipboard operation
data-diff copied to clipboard

[SQL Server] data diff fails with case sensitive schemas and objects

Open MiConnell opened this issue 1 year ago • 0 comments

Describe the bug when running data diff comparing sql server to redshift, I'm getting the error:

data_diff.databases.base.QueryError: ('42S02', "[42S02] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name '<database>.information_schema.columns'. (208) (SQLExecDirectW)")

The actual object name is <database>.INFORMATION_SCHEMA.COLUMNS

updating this line to uppercase solves the problem.

info_schema_path = ["INFORMATION_SCHEMA", "COLUMNS"]

Make sure to include the following (minus sensitive information): command:

data-diff -d  <redshift_uri>   <schema>.<table> <mssql_uri>   dbo.<TABLE>   -k <KEY_COLUMN> 

and

data-diff -d  <redshift_uri>   <schema>.<table> <mssql_uri>   dbo.<TABLE>   -k <KEY_COLUMN> --case-sensitive

output:

16:58:27 INFO     [Redshift] Starting a threadpool, size=1.                                                                                                       base.py:1237
         DEBUG    Running SQL (Redshift):                                                                                                                          base.py:982
                  SET TIME ZONE 'UTC'                                                                                                                                         
         INFO     [MsSQL] Starting a threadpool, size=1.                                                                                                          base.py:1237
         DEBUG    Database 'MsSQL(default_schema='dbo', _interactive=False, is_closed=False, _dialect=Dialect(_prevent_overflow_when_concat=False),            _connect.py:300
                  thread_count=1, _init_error=None, _queue=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7fd5d78c0fd0>,                                           
                  thread_local=<_thread._local object at 0x7fd5d7558a90>, default_database='database', _args={<credentials>, 'driver': '{ODBC Driver 18 for SQL                  
                  Server}', 'TrustServerCertificate': 'yes'}, _mssql=None)' does not allow setting timezone. We recommend making sure it's set to 'UTC'.                      
         INFO     Using algorithm 'hashdiff'.                                                                                                                  __main__.py:394
         DEBUG    Running SQL (Redshift): ('schema', 'table')                                                                                       base.py:980
                  SELECT column_name, data_type, datetime_precision, numeric_precision, numeric_scale FROM information_schema.columns WHERE table_name =                      
                  'table' AND table_schema = 'schema'                                                                                                          
         DEBUG    Running SQL (MsSQL): ('dbo', 'TABLE')                                                                                                base.py:980
                  SELECT column_name, data_type, datetime_precision, numeric_precision, numeric_scale, collation_name FROM                                                    
                  [database].information_schema.columns WHERE table_name = 'TABLE' AND table_schema = 'dbo'                                                
         ERROR    ('42S02', "[42S02] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name 'database.information_schema.columns'.   __main__.py:344
                  (208) (SQLExecDirectW)")                                                                                                                                    
Traceback (most recent call last):
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/mssql.py", line 227, in _query_cursor
    return super()._query_cursor(c, sql_code)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1173, in _query_cursor
    c.execute(sql_code)
pyodbc.ProgrammingError: ('42S02', "[42S02] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name 'database.information_schema.columns'. (208) (SQLExecDirectW)")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/ingestion/venv/bin/data-diff", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/__main__.py", line 342, in main
    _data_diff(dbt_project_dir=project_dir_override, dbt_profiles_dir=profiles_dir_override, state=state, **kw)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/__main__.py", line 571, in _data_diff
    schemas = list(differ._thread_map(_get_schema, safezip(dbs, table_paths)))
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/__main__.py", line 75, in _get_schema
    return db.query_table_schema(table_path)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1048, in query_table_schema
    rows = self.query(self.select_table_schema(path), list, log_message=path)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 996, in query
    res = self._query(sql_code)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1248, in _query
    return r.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1254, in _query_in_worker
    return self._query_conn(self.thread_local.conn, sql_code)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 1188, in _query_conn
    return apply_query(callback, sql_code)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/base.py", line 211, in apply_query
    return callback(sql_code)
  File "/home/ubuntu/ingestion/venv/lib/python3.10/site-packages/data_diff/databases/mssql.py", line 229, in _query_cursor
    raise QueryError(e)
data_diff.databases.base.QueryError: ('42S02', "[42S02] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name 'database.information_schema.columns'. (208) (SQLExecDirectW)")

If possible, please paste these as text, and not a screenshot.

Describe the environment

Ubuntu@data_diff v0.11.1

MiConnell avatar Apr 16 '24 17:04 MiConnell