great_expectations
Great Expectations not working as expected with SQLAlchemy 2.0 against MS SQL database
Describe the bug
- Unable to run expectations against a locally running SQL Server in Docker using SQLAlchemy 2.0.7 and great-expectations 0.17.12.
- Downgrading to SQLAlchemy 1.4 works, but this is not a satisfactory result because, as far as I can tell, SQLAlchemy 2.0 is now supported by GX. Here is an old issue related to this, but at the time the workaround was to downgrade since SQLAlchemy 2.0 wasn't officially supported. As of 0.17.0, SQLAlchemy 2.0 is officially supported. Here is another related issue.
Attempting to run checks on local SQL Server results in:
Traceback (most recent call last):
  File "<env>\lib\site-packages\sqlalchemy\engine\base.py", line 1410, in execute
    meth = statement._execute_on_connection
AttributeError: 'str' object has no attribute '_execute_on_connection'
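For context, this error is the generic SQLAlchemy 2.0 behavior when a plain SQL string reaches `Connection.execute()`. A minimal sketch of that behavior change, using an in-memory SQLite database as a stand-in for MSSQL (my own illustration, not GX code):

```python
# Minimal illustration (not GX code) of the SQLAlchemy 2.0 change behind this
# error: Connection.execute() no longer accepts plain SQL strings, so code
# paths that pass raw strings (here via pandas.read_sql_query) fail with
# ObjectNotExecutableError, whose cause is the AttributeError above.
import sqlalchemy
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # in-memory SQLite stands in for MSSQL

with engine.connect() as conn:
    conn.execute(text("CREATE TABLE t (x INTEGER)"))
    conn.execute(text("INSERT INTO t VALUES (1)"))

    # On SQLAlchemy 2.x a raw string is rejected; on 1.4 it still works,
    # which matches the "downgrading fixes it" observation above.
    try:
        conn.execute("SELECT * FROM t")
        raw_string_accepted = True
    except sqlalchemy.exc.ObjectNotExecutableError:
        raw_string_accepted = False

    # Wrapping the statement in text() is accepted on both 1.4 and 2.0.
    rows = conn.execute(text("SELECT * FROM t")).fetchall()

print(raw_string_accepted, rows)
```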
To Reproduce
- Install sqlalchemy==2.0.7
- Install great_expectations==0.17.12
- Install great_expectations[mssql] and great_expectations[sqlalchemy]
- Set up a local Docker instance of MS SQL with a user called 'sa' and a table called 'my_table'
- Run the code below; it raises the exception below
- Downgrade to the latest 1.4 version of SQLAlchemy and the same code works as expected
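Since the repro hinges on exact package versions, a small helper to confirm what is actually installed before running the steps above (the helper name is my own, not part of GX):

```python
# Hedged helper (my own sketch, not part of great_expectations) to verify
# which package versions are installed before running the repro steps above.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# The repro expects sqlalchemy==2.0.7 and great-expectations==0.17.12.
for pkg in ("sqlalchemy", "great-expectations"):
    print(pkg, installed_version(pkg) or "not installed")
```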
Please include your great_expectations.yml config, the code you're executing that causes the issue, and the full stack trace of any error(s).
great_expectations.yml
# Welcome to Great Expectations! Always know what to expect from your data.
#
# Here you can define datasources, batch kwargs generators, integrations and
# more. This file is intended to be committed to your repo. For help with
# configuration please:
# - Read our docs: https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/connect_to_data_overview/#2-configure-your-datasource
# - Join our slack channel: http://greatexpectations.io/slack
# config_version refers to the syntactic version of this config file, and is used in maintaining backwards compatibility
# It is auto-generated and usually does not need to be changed.
config_version: 3.0
# Datasources tell Great Expectations where your data lives and how to get it.
# You can use the CLI command `great_expectations datasource new` to help you
# add a new datasource. Read more at https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/connect_to_data_overview
datasources: {}
# This config file supports variable substitution which enables: 1) keeping
# secrets out of source control & 2) environment-based configuration changes
# such as staging vs prod.
#
# When GX encounters substitution syntax (like `my_key: ${my_value}` or
# `my_key: $my_value`) in the great_expectations.yml file, it will attempt
# to replace the value of `my_key` with the value from an environment
# variable `my_value` or a corresponding key read from this config file,
# which is defined through the `config_variables_file_path`.
# Environment variables take precedence over variables defined here.
#
# Substitution values defined here can be a simple (non-nested) value,
# nested value such as a dictionary, or an environment variable (i.e. ${ENV_VAR})
#
#
# https://docs.greatexpectations.io/docs/guides/setup/configuring_data_contexts/how_to_configure_credentials
config_variables_file_path: uncommitted/config_variables.yml
# The plugins_directory will be added to your python path for custom modules
# used to override and extend Great Expectations.
plugins_directory: plugins/
stores:
# Stores are configurable places to store things like Expectations, Validations
# Data Docs, and more. These are for advanced users only - most users can simply
# leave this section alone.
#
# Three stores are required: expectations, validations, and
# evaluation_parameters, and must exist with a valid store entry. Additional
# stores can be configured for uses such as data_docs, etc.
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
  evaluation_parameter_store:
    class_name: EvaluationParameterStore
  checkpoint_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: checkpoints/
  profiler_store:
    class_name: ProfilerStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: profilers/
expectations_store_name: expectations_store
validations_store_name: validations_store
evaluation_parameter_store_name: evaluation_parameter_store
checkpoint_store_name: checkpoint_store
data_docs_sites:
# Data Docs make it simple to visualize data quality in your project. These
# include Expectations, Validations & Profiles. They are built for all
# Datasources from JSON artifacts in the local repo including validations &
# profiles from the uncommitted directory. Read more at https://docs.greatexpectations.io/docs/terms/data_docs
  local_site:
    class_name: SiteBuilder
    show_how_to_buttons: true
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
anonymous_usage_statistics:
  data_context_id: 4696bd72-6135-4c14-ac22-cbeb6ef6d916
  enabled: true
fluent_datasources:
  mds_local_docker_db:
    type: sql
    assets:
      my_table_asset:
        type: table
        order_by: []
        batch_metadata: {}
        table_name: pmds_metrics
        schema_name:
    connection_string: mssql+pyodbc://sa:<password>@localhost:1433/mds?driver=ODBC Driver 17 for SQL Server&charset=utf&autocommit=true
notebooks:
include_rendered_content:
  globally: false
  expectation_suite: false
  expectation_validation_result: false
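One side note on the connection string above, unrelated to the bug itself: spaces and special characters in the `driver` query parameter are usually URL-encoded. A sketch of building the same URL with `urllib.parse.quote_plus` (standard SQLAlchemy/pyodbc practice; the password stays as the placeholder used above):

```python
# Sketch of building the same mssql+pyodbc URL with the driver name
# URL-encoded via quote_plus (standard practice; not a fix for this bug).
from urllib.parse import quote_plus

password = "<password>"  # placeholder, as in the config above
driver = quote_plus("ODBC Driver 17 for SQL Server")
connection_string = (
    f"mssql+pyodbc://sa:{password}@localhost:1433/mds"
    f"?driver={driver}&charset=utf&autocommit=true"
)
print(connection_string)
```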
Full Stacktrace:
Exceptions
{('table.head', 'batch_id=mds_local_docker_db-my_table_asset', '04166707abe073177c1dd922d3584468'): {'metric_configuration': {
    "metric_name": "table.head",
    "metric_domain_kwargs": {
        "batch_id": "mds_local_docker_db-my_table_asset"
    },
    "metric_domain_kwargs_id": "batch_id=mds_local_docker_db-my_table_asset",
    "metric_value_kwargs": {
        "n_rows": 5,
        "fetch_all": false
    },
    "metric_value_kwargs_id": "04166707abe073177c1dd922d3584468",
    "id": [
        "table.head",
        "batch_id=mds_local_docker_db-my_table_asset",
        "04166707abe073177c1dd922d3584468"
    ]
}, 'num_failures': 3, 'exception_info': {'exception_message': "Not an executable object: 'SELECT * FROM #gx_temp_427a0638'", 'raised_exception': True, 'exception_traceback': <traceback below>}}}
occurred while resolving metrics.

exception_traceback:
Traceback (most recent call last):
  File "<env>\lib\site-packages\sqlalchemy\engine\base.py", line 1410, in execute
    meth = statement._execute_on_connection
AttributeError: 'str' object has no attribute '_execute_on_connection'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<env>\lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 548, in _process_direct_and_bundled_metric_computation_configurations
    ] = metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable
  File "<env>\lib\site-packages\great_expectations\expectations\metrics\metric_provider.py", line 50, in inner_func
    return metric_fn(*args, **kwargs)
  File "<env>\lib\site-packages\great_expectations\expectations\metrics\table_metrics\table_head.py", line 109, in _sqlalchemy
    df_chunk_iterator = read_sql_table_as_df(
  File "<env>\lib\site-packages\great_expectations\compatibility\sqlalchemy_compatibility_wrappers.py", line 55, in read_sql_table_as_df
    return _read_sql_table_as_df(
  File "<env>\lib\site-packages\great_expectations\compatibility\sqlalchemy_compatibility_wrappers.py", line 133, in _read_sql_table_as_df
    return pd.read_sql_query(
  File "<env>\lib\site-packages\pandas\io\sql.py", line 397, in read_sql_query
    return pandas_sql.read_query(
  File "<env>\lib\site-packages\pandas\io\sql.py", line 1560, in read_query
    result = self.execute(*args)
  File "<env>\lib\site-packages\pandas\io\sql.py", line 1405, in execute
    return self.connectable.execution_options().execute(*args, **kwargs)
  File "<env>\lib\site-packages\sqlalchemy\engine\base.py", line 1412, in execute
    raise exc.ObjectNotExecutableError(statement) from err
sqlalchemy.exc.ObjectNotExecutableError: Not an executable object: 'SELECT * FROM #gx_temp_427a0638'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<env>\lib\site-packages\great_expectations\validator\validation_graph.py", line 285, in _resolve
    self._execution_engine.resolve_metrics(
  File "<env>\lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 283, in resolve_metrics
    return self._process_direct_and_bundled_metric_computation_configurations(
  File "<env>\lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 552, in _process_direct_and_bundled_metric_computation_configurations
    raise gx_exceptions.MetricResolutionError(
great_expectations.exceptions.exceptions.MetricResolutionError: Not an executable object: 'SELECT * FROM #gx_temp_427a0638'
Traceback (most recent call last):
  File "ge_test.py", line 90, in <module>
    main(password)
  File "ge_test.py", line 36, in main
    print(validator.head())
  File "<env>\lib\site-packages\great_expectations\validator\validator.py", line 411, in head
    return self._metrics_calculator.head(
  File "<env>\lib\site-packages\great_expectations\validator\metrics_calculator.py", line 104, in head
    df: pd.DataFrame = self.get_metric(
  File "<env>\lib\site-packages\great_expectations\validator\metrics_calculator.py", line 122, in get_metric
    return self.get_metrics(
  File "<env>\lib\site-packages\great_expectations\validator\metrics_calculator.py", line 142, in get_metrics
    return {
  File "<env>\lib\site-packages\great_expectations\validator\metrics_calculator.py", line 143, in <dictcomp>
    metric_configuration.metric_name: resolved_metrics[metric_configuration.id]
KeyError: ('table.head', 'batch_id=mds_local_docker_db-my_table_asset', '04166707abe073177c1dd922d3584468')
Code

import great_expectations as gx
from great_expectations.checkpoint import Checkpoint


def main(password: str):
    context = gx.get_context()
    print(f"Data sources: {context.list_datasources()}")
    mssql_datasource = context.sources.add_or_update_sql(
        name="mds_local_docker_db",
        connection_string=f"mssql+pyodbc://sa:{password}@localhost:1433/mds?driver=ODBC Driver 17 for SQL Server&charset=utf&autocommit=true",
    )
    print("DataSource created")
    mssql_datasource.add_table_asset(name="my_table_asset", table_name="my_table")
    print("TableAsset created")
    batch_request = mssql_datasource.get_asset("my_table_asset").build_batch_request()
    print("BatchRequest created")
    expectation_suite_name = "MDS_MY_TABLE_EXPECTATIONS"
    context.add_or_update_expectation_suite(
        expectation_suite_name=expectation_suite_name
    )
    assert "MDS_MY_TABLE_EXPECTATIONS" in context.list_expectation_suite_names()
    print("ExpectationSuite created")
    validator = context.get_validator(
        batch_request=batch_request,
        expectation_suite_name=expectation_suite_name,
    )
    print("Validator created")
    print(validator.head())
    # validator.expect_column_values_to_match_strftime_format('date', '%Y-%m-%d %H:%M:%S.')
    validator.expect_column_values_to_not_be_null(column="date")
    validator.expect_column_distinct_values_to_be_in_set(
        "business_line",
        {
            "BMC",
            "CB",
            "CDE",
            "NVT",
            "PBBE",
            "PBDE",
            "PBFR",
            "PBNL",
            "RT",
        },
    )
    validator.save_expectation_suite(discard_failed_expectations=False)
    my_checkpoint_name = "my_sql_checkpoint"
    checkpoint = Checkpoint(
        name=my_checkpoint_name,
        run_name_template="%Y%m%d-%H%M%S-my-run-name-template",
        data_context=context,
        batch_request=batch_request,
        expectation_suite_name=expectation_suite_name,
        action_list=[
            {
                "name": "store_validation_result",
                "action": {"class_name": "StoreValidationResultAction"},
            },
            {
                "name": "update_data_docs",
                "action": {"class_name": "UpdateDataDocsAction"},
            },
        ],
    )
    context.add_or_update_checkpoint(checkpoint=checkpoint)
    checkpoint_result = checkpoint.run()
    context.open_data_docs()
Expected behavior
I expect Great Expectations to successfully connect to my locally running SQL Server Docker instance using the installed SQLAlchemy 2.0.7 and run the quality checks.
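In the meantime, the only workaround I have is pinning the dependency back, as described above (treat the exact pin syntax as a sketch for a requirements.txt):

```text
# requirements.txt workaround sketch: pin SQLAlchemy back to 1.4.x,
# which the repro above confirms works with this code
great-expectations==0.17.12
SQLAlchemy<2.0
```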
Environment (please complete the following information):
- Windows
- Great Expectations Version: 0.17.12
- Data Source: Dockerized SQL Server instance running locally
- Cloud environment: None
Hey @robinstauntoncollins, thanks for raising this issue. We've captured this for review.
We are using GX with AWS Athena and are facing this issue too.
I'm facing the same issue using GX version 0.18.13 with AWS Athena and SQLAlchemy 2.0.30