Azure Synapse issue with column level expectations
Hi,
I am trying to use Great Expectations on an Azure Synapse Workspace with an Azure Synapse dedicated SQL pool as the data source. I am running into issues when trying to run an expectation suite. The error is as follows:
ge_exceptions.MetricResolutionError(great_expectations.exceptions.exceptions.MetricResolutionError: 'NoneType' object is not iterable)
Note: I am getting this error only for column-level expectations. Table-level expectations are working fine.
Expected behavior: Ideally, both table-level and column-level expectations should run without errors.
Environment:
- Azure Synapse Analytics
- Great Expectations Version: 0.15.25
Additional context: I tried running the same thing with Azure SQL as the data source, and it worked fine. The issue occurs only with the dedicated Synapse SQL pool.
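To make the distinction concrete, here is a minimal sketch of the two kinds of calls, using the validator set up as shown later in this thread (the column name my_column is a hypothetical placeholder, not one of my real columns):

# Table-level expectation: runs fine against the dedicated SQL pool.
validator.expect_table_row_count_to_be_between(min_value=1)

# Column-level expectation: fails with
# MetricResolutionError: 'NoneType' object is not iterable.
validator.expect_column_values_to_not_be_null(column="my_column")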
Howdy @dollyBa :wave: thanks for raising this with us and being a part of this lovely community :bow:
Do you happen to have a full stack trace, and can you provide the workflow/configuration so we can dig in further? :microscope:
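If it helps to surface the traceback: expectation methods accept a catch_exceptions flag, and running the failing expectation with it disabled should raise the underlying error rather than folding it into the validation result. A sketch (the column name is illustrative):

# With catch_exceptions=False, GE re-raises the underlying exception
# instead of recording it in the result, exposing the full stack trace.
# "my_column" stands in for one of your real columns.
validator.expect_column_values_to_not_be_null(
    column="my_column",
    catch_exceptions=False,
)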
Hey,
Please find below the workflow/configuration for further reference.
This is the configuration used to set up the BaseDataContext:
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import (
    DataContextConfig,
    DatasourceConfig,
)

data_context_config = DataContextConfig(
    config_version=2,
    plugins_directory=None,
    config_variables_file_path=None,
    datasources={
        "my_spark_datasource_config": DatasourceConfig(
            class_name="Datasource",
            execution_engine={
                "class_name": "SqlAlchemyExecutionEngine",
                "module_name": "great_expectations.execution_engine",
                "connection_string": "mssql+pyodbc://<user_name>:<password>@<server_name>/<database_name>?driver=ODBC Driver 17 for SQL Server&charset=utf&autocommit=true",
            },
            data_connectors={
                "default_runtime_data_connector_name": {
                    "class_name": "RuntimeDataConnector",
                    "module_name": "great_expectations.datasource.data_connector",
                    # batch_identifiers must be a list of identifier names.
                    "batch_identifiers": ["default_identifier_name"],
                },
                "default_inferred_data_connector_name": {
                    "class_name": "InferredAssetSqlDataConnector",
                    "module_name": "great_expectations.datasource.data_connector",
                    "introspection_directives": {"schema_name": "<schema_name>"},
                    "include_schema_name": True,
                },
                "default_configured_data_connector_name": {
                    "class_name": "ConfiguredAssetSqlDataConnector",
                    "module_name": "great_expectations.datasource.data_connector",
                    "assets": {
                        "<table_name>": {
                            "class_name": "Asset",
                            "module_name": "great_expectations.datasource.data_connector.asset",
                            "schema_name": "<schema_name>",
                        }
                    },
                    # force_reuse_spark_context is a Spark-only setting and has
                    # been dropped here; it does not apply to a SQL connector.
                },
            },
        )
    },
    stores={
        "expectations_SQL_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "DatabaseStoreBackend",
                "connection_string": "mssql+pyodbc://<db_user>:<db_pass>@<db_servername>/<db_name>?driver=ODBC Driver 17 for SQL Server",
            },
        },
        "validations_SQL_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "DatabaseStoreBackend",
                "connection_string": "mssql+pyodbc://<db_user>:<db_pass>@<db_servername>/<db_name>?driver=ODBC Driver 17 for SQL Server",
            },
        },
        "evaluation_parameter_store": {"class_name": "EvaluationParameterStore"},
    },
    expectations_store_name="expectations_SQL_store",
    validations_store_name="validations_SQL_store",
    evaluation_parameter_store_name="evaluation_parameter_store",
    validation_operators={
        "action_list_operator": {
            "class_name": "ActionListValidationOperator",
            "action_list": [
                {
                    "name": "store_validation_result",
                    "action": {"class_name": "StoreValidationResultAction"},
                },
                {
                    "name": "store_evaluation_params",
                    "action": {"class_name": "StoreEvaluationParametersAction"},
                },
                {
                    "name": "update_data_docs",
                    "action": {"class_name": "UpdateDataDocsAction"},
                },
            ],
        }
    },
    anonymous_usage_statistics={"enabled": True},
)
context = BaseDataContext(project_config=data_context_config)
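As a quick sanity check (not part of the failure itself), once the context is built you can confirm the datasource registered correctly before requesting a batch; to my knowledge both calls below are still available on the context in 0.15.x:

# Confirm the datasource and data connectors registered as configured.
print(context.list_datasources())

# List the data assets each connector can see; output depends on the
# actual Synapse schema behind the placeholders above.
print(
    context.get_available_data_asset_names(
        datasource_names=["my_spark_datasource_config"]
    )
)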
The following is a skeleton of how I have created the validator object:
suite = context.create_expectation_suite(
    expectation_suite_name=expectation_suite_name
)

# Set the data source: point the batch request at the runtime data connector.
batch_request = RuntimeBatchRequest(
    datasource_name="my_spark_datasource_config",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="my_data_asset_name",
    runtime_parameters={"query": "<query>"},
    # The key must match the identifier name declared on the runtime data
    # connector above ("default_identifier_name").
    batch_identifiers={"default_identifier_name": "batch_run_id"},
    # Disable temp-table creation for the Synapse dedicated SQL pool.
    batch_spec_passthrough={"create_temp_table": False},
)

validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=expectation_suite_name,
)
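From there the suite is exercised in the usual way. A minimal sketch of that last step (the expectation shown is illustrative, and <column_name> is a placeholder; the actual suite contents are not reproduced here):

# Add an expectation to the in-memory suite (illustrative example only).
validator.expect_column_values_to_not_be_null(column="<column_name>")

# Persist the suite to the configured expectations store and run it.
validator.save_expectation_suite(discard_failed_expectations=False)
results = validator.validate()
print(results.success)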
Hey @AFineDayFor, do let me know if you need any further details from my end.