great_expectations
return_unexpected_index_query returns a broken query (double quotes missing from the result) for SparkDFExecutionEngine
Describe the bug
We set result_format to "COMPLETE" and return_unexpected_index_query to true, intending to use the returned query to retrieve the failing records from the DataFrame. The returned query is broken: the double quotes around the filter expression are missing.
Example:
Unexpected index query returned by GX: df.filter(F.expr(NOT(city IS NOT NULL)))
Working query: df.filter(F.expr("NOT(city IS NOT NULL)"))
To Reproduce

import great_expectations as ge
from great_expectations.core import ExpectationSuite
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import FilesystemStoreBackendDefaults, DataContextConfig, DatasourceConfig

expectation_suite_config = {
    "expectation_suite_name": "my_expectation_suite",
    "expectations": [  # List of expectations
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {
                "column": "my_column",
                "result_format": {"result_format": "COMPLETE"}
            }
        }
    ]
}
my_expectation_suite = ExpectationSuite(**expectation_suite_config)

# Define DataContext configuration
data_context_config = DataContextConfig(
    plugins_directory=None,
    config_variables_file_path=None,
    datasources={
        "my_spark_datasource": DatasourceConfig(
            class_name="Datasource",
            execution_engine={
                "class_name": "SparkDFExecutionEngine",
                "force_reuse_spark_context": True,
            },
            data_connectors={
                "spark_runtime_dataconnector": {
                    "class_name": "RuntimeDataConnector",
                    "module_name": "great_expectations.datasource.data_connector",
                    "batch_identifiers": ["batch_name"]
                },
            },
        )
    },
    store_backend_defaults=FilesystemStoreBackendDefaults(root_directory="/"),
)

batch_request = RuntimeBatchRequest(
    datasource_name="my_spark_datasource",
    data_connector_name="spark_runtime_dataconnector",
    data_asset_name="my_asset",
    runtime_parameters={"batch_data": df},
    batch_identifiers={"batch_name": "batch_run"},
)

context = ge.get_context(project_config=data_context_config)
batch_validator = context.get_validator(batch_request=batch_request, expectation_suite=my_expectation_suite)
validation_result = batch_validator.validate()
print(validation_result)
validation_result contains the unexpected_index_query value "df.filter(F.expr(NOT(city IS NOT NULL)))".
When I execute this query, it fails with a syntax error ("Invalid syntax").
Expected behavior
Executing the returned query should return the failing records from the DataFrame.
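Until the returned string is fixed, one possible workaround is to pull the condition out of the query text and pass it to F.expr as a proper string. The sketch below is only an illustration: it assumes the returned value always has the form df.filter(F.expr(<condition>)) seen in the example above, and extract_filter_condition is a hypothetical helper name, not part of Great Expectations.

```python
import re

# Assumption: the query string always looks like "df.filter(F.expr(<condition>))",
# as in the example from this report. Other shapes are rejected explicitly.
_QUERY_PATTERN = re.compile(
    r"^df\.filter\(F\.expr\((?P<condition>.*)\)\)$", re.DOTALL
)


def extract_filter_condition(unexpected_index_query: str) -> str:
    """Return the bare SQL condition embedded in the query string (hypothetical helper)."""
    match = _QUERY_PATTERN.match(unexpected_index_query.strip())
    if match is None:
        raise ValueError(
            f"Unrecognized unexpected_index_query format: {unexpected_index_query!r}"
        )
    condition = match.group("condition").strip()
    # If the condition is already wrapped in matching quotes, strip them,
    # so the helper also accepts a correctly quoted query.
    if len(condition) >= 2 and condition[0] == condition[-1] and condition[0] in "\"'":
        condition = condition[1:-1]
    return condition


condition = extract_filter_condition("df.filter(F.expr(NOT(city IS NOT NULL)))")
print(condition)
```

With pyspark.sql.functions imported as F, the failing rows could then be fetched with df.filter(F.expr(condition)) instead of evaluating the returned string directly.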
Environment:
- Operating System: Windows
- Great Expectations Version: 0.18.13
- Data Source: Spark DataFrame
- Cloud environment: Databricks