great_expectations
UserConfigurableProfiler fails when RuntimeBatchRequest uses query and "create_temp_table": False
Describe the bug
When using the UserConfigurableProfiler with a RuntimeBatchRequest whose "runtime_parameters" contain a query, such as {'schema': 'sample_data', 'query': 'select * from sample_data.orders'}, and whose "batch_spec_passthrough" is {"create_temp_table": False}, the following exception is thrown:
File "/Users/myuser/code/ge/venv/lib/python3.9/site-packages/great_expectations/expectations/core/expect_column_values_to_be_in_type_list.py", line 549, in <listcomp>
type_dict["type"]
KeyError: 'type'
It looks like the column types cannot be retrieved here when neither a physical table nor a temp table is provided.
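The failure mode can be illustrated without Great Expectations. The sketch below is not the library's code. It assumes, based only on the traceback, that column metadata arrives as a list of dicts and that the "type" key is missing when no table backs the batch. The names extract_types and extract_types_checked are hypothetical.

```python
from typing import Any, Dict, List


def extract_types(columns: List[Dict[str, Any]]) -> List[Any]:
    # Same shape as the failing list comprehension in the traceback:
    # a missing "type" key surfaces as a bare KeyError.
    return [type_dict["type"] for type_dict in columns]


def extract_types_checked(columns: List[Dict[str, Any]]) -> List[Any]:
    # Hypothetical variant: name the columns that lack type metadata
    # instead of letting the KeyError escape.
    missing = [c.get("name", "<unnamed>") for c in columns if "type" not in c]
    if missing:
        raise ValueError(f"No type metadata for columns: {missing}")
    return [c["type"] for c in columns]
```

With metadata such as [{"name": "id"}], the first function raises KeyError: 'type', matching the report, while the second raises a ValueError that names the affected column.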
To Reproduce
from great_expectations.core import ExpectationSuite
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import DataContextConfig
from great_expectations.data_context.types.base import InMemoryStoreBackendDefaults
from great_expectations.profile.user_configurable_profiler import UserConfigurableProfiler


def profile():
    datasource_name = "something"
    connection_string = "postgresql+psycopg2://postgres:postgres@localhost:5432/postgres"
    dataset_name = "something"

    data_context_config = get_data_context_config(datasource_name, connection_string, dataset_name)
    context = BaseDataContext(project_config=data_context_config)
    suite: ExpectationSuite = context.create_expectation_suite("suite", overwrite_existing=True)

    batch_request = RuntimeBatchRequest(
        datasource_name=datasource_name,
        data_connector_name="default_runtime_data_connector",
        data_asset_name=dataset_name,
        runtime_parameters={'schema': 'sample_data', 'query': 'select * from sample_data.orders'},
        batch_identifiers={dataset_name: dataset_name},
        batch_spec_passthrough={"create_temp_table": False},  # Changing this to {"create_temp_table": True} works.
    )

    validator = context.get_validator(
        batch_request=batch_request, expectation_suite=suite,
    )

    profiler = UserConfigurableProfiler(
        validator,
    )

    expectations = profiler.build_suite().to_json_dict()['expectations']
    return expectations


def get_data_context_config(datasource_name, connection_string, dataset_name):
    context = DataContextConfig(
        datasources={
            datasource_name: {
                "execution_engine": {
                    "class_name": "SqlAlchemyExecutionEngine",
                    "connection_string": connection_string,
                },
                "class_name": "Datasource",
                "module_name": "great_expectations.datasource",
                "data_connectors": {
                    "default_runtime_data_connector": {
                        "class_name": "RuntimeDataConnector",
                        "batch_identifiers": [
                            dataset_name
                        ],
                    },
                    "default_inferred_data_connector_name": {
                        "class_name": "InferredAssetSqlDataConnector",
                        "include_schema_name": True,
                    },
                }
            }
        },
        store_backend_defaults=InMemoryStoreBackendDefaults(),
        anonymous_usage_statistics={
            "enabled": False,
        }
    )
    return context


profile()
Expected behavior
Ideally I'd like the UserConfigurableProfiler to work without a temp table. If this is not possible, a validation error should be thrown when "create_temp_table" is False.
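Until the profiler handles this case, a caller-side check along the following lines could produce the early validation error described above. This is only a sketch: check_profiler_batch_request is a hypothetical helper, not part of Great Expectations, and it assumes that an unset "create_temp_table" behaves as True.

```python
from typing import Any, Mapping, Optional


def check_profiler_batch_request(
    runtime_parameters: Optional[Mapping[str, Any]],
    batch_spec_passthrough: Optional[Mapping[str, Any]],
) -> None:
    """Raise early if a query batch is requested without a temp table."""
    uses_query = bool(runtime_parameters) and "query" in runtime_parameters
    # Assumption: a missing "create_temp_table" key is treated as True.
    create_temp_table = (batch_spec_passthrough or {}).get("create_temp_table", True)
    if uses_query and create_temp_table is False:
        raise ValueError(
            "UserConfigurableProfiler cannot read column types for a query "
            'batch when "create_temp_table" is False.'
        )
```

Calling it with the same runtime_parameters and batch_spec_passthrough values that are passed to RuntimeBatchRequest, before building the validator, would turn the KeyError into an explicit message.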
Environment (please complete the following information):
- Operating System: macOS
- Great Expectations Version: 0.15.0
Thanks for raising this, @KentonParton! We will review and be in touch.
Hi @KentonParton - thank you for your patience. I wanted to let you know that we'll be figuring out prioritization for this over the next few days. I also wanted to let you know that we are in the process of doing some great work on our Rule Based Profilers and Data Assistants, and so you may find some success using the Onboarding Assistant as an alternative to the UserConfigurableProfiler. You can read about the Onboarding Assistant here.
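For readers who want to try that alternative, a minimal sketch follows. It assumes a Great Expectations release that exposes context.assistants, which may be newer than the 0.15.0 reported above, and it has not been verified against the query and "create_temp_table": False combination from this issue. The wrapper name build_suite_with_onboarding_assistant is hypothetical.

```python
def build_suite_with_onboarding_assistant(context, batch_request, suite_name):
    # Assumption: the context exposes the Onboarding Data Assistant as
    # context.assistants.onboarding (later 0.15.x releases).
    result = context.assistants.onboarding.run(batch_request=batch_request)
    # Turn the assistant's result into an ExpectationSuite with the given name.
    return result.get_expectation_suite(expectation_suite_name=suite_name)
```

It would be called with the same context and batch_request objects built in the reproduction script, in place of the UserConfigurableProfiler step.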
Sorry for not responding to this @talagluck. I will give the OnboardingDataAssistant a try and report back.
Hey @KentonParton, did the Onboarding Assistant address the issue?