great_expectations icon indicating copy to clipboard operation
great_expectations copied to clipboard

expect_column_values_to_be_unique not respecting create_temp_table equals false

Open kantegajorgen opened this issue 1 year ago • 1 comments

Describe the bug When using expect_column_values_to_be_unique, the create_temp_table=false is not respected.

To Reproduce Steps to reproduce the behavior:

  1. Set up datasource with Synapse linked to CosmosDB
  2. Create view in Synapse querying data in CosmosDB Analytics Store
  3. Set create_temp_table=false in GE
  4. Create expectation with expect_column_values_to_be_unique
  5. Create and run checkpoint using this expectation.
  6. Build Data Docs
  7. In Data Docs, locate SQL-error in Validation Result for this expectation:
  • (pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]The query references an object that is not supported in distributed processing mode. (15816) (SQLExecDirectW)') [SQL: INSERT INTO [#ge_temp_96c2c045] (condition) SELECT CASE WHEN ([licenseId] IS NOT NULL AND [licenseId] IN (SELECT [licenseId] FROM (SELECT * FROM bronze.licenses WHERE 1 = 1) AS great_expectations_sub_selection GROUP BY [licenseId] HAVING count([licenseId]) > ?)) THEN CAST(? AS NUMERIC) ELSE CAST(? AS NUMERIC) END AS condition FROM (SELECT * FROM bronze.licenses WHERE 1 = 1) AS great_expectations_sub_selection] [parameters: (1, 1, 0)] (Background on this error at: https://sqlalche.me/e/14/f405)

Expected behavior No SQL-error. Expected expectation not to use temp table.

Environment (please complete the following information):

  • Operating System: MacOS
  • Great Expectations Version: 0.15.23

Additional context Using eg. expect_column_values_to_be_in_set works fine with create_temp_table=false. With default create_temp_table=true, expect_column_values_to_be_in_set also fails because it uses temp table.

kantegajorgen avatar Sep 22 '22 07:09 kantegajorgen

Hi @kantegajorgen - thanks for raising this. I think the issue here is that we don't offer full support for Synapse or CosmosDB - just whatever is immediately accessible through a SqlAlchemy connection string. When that is the case, temp table functionality (or the lack thereof) is one of the first errors that users tend to hit. If you are interested in making a contribution here, please let us know!

talagluck avatar Sep 22 '22 18:09 talagluck