UserConfigurableProfiler fails when RuntimeBatchRequest uses query and "create_temp_table": False

Open KentonParton opened this issue 2 years ago • 3 comments

Describe the bug

When using the UserConfigurableProfiler with a RuntimeBatchRequest whose "runtime_parameters" contain a query, such as {'schema': 'sample_data', 'query': 'select * from sample_data.orders'}, and whose "batch_spec_passthrough" is {"create_temp_table": False}, the following exception is thrown:

  File "/Users/myuser/code/ge/venv/lib/python3.9/site-packages/great_expectations/expectations/core/expect_column_values_to_be_in_type_list.py", line 549, in <listcomp>
    type_dict["type"]
KeyError: 'type'

It looks like the column types cannot be retrieved here when neither a physical table nor a temp table is provided.
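As a rough illustration of the failure mode suggested by the traceback: the list comprehension indexes each column-metadata dict with "type", so any entry that only carries a name raises KeyError. The sketch below is not Great Expectations code. The helper `column_type` and the sample metadata lists are hypothetical, and the assumption that query-only batches yield name-only entries is a guess based on the traceback.

```python
def column_type(column_metadata, column_name):
    """Return the type recorded for column_name, or None if it is unavailable.

    column_metadata is assumed to be a list of dicts such as
    {"name": ..., "type": ...}; entries may lack the "type" key.
    """
    for entry in column_metadata:
        if entry.get("name") == column_name:
            # .get avoids the KeyError that entry["type"] would raise
            return entry.get("type")
    return None


# Hypothetical metadata: with and without reflected types.
with_types = [{"name": "order_id", "type": "INTEGER"}]
without_types = [{"name": "order_id"}]

print(column_type(with_types, "order_id"))
print(column_type(without_types, "order_id"))
```

A lookup of this shape would let the caller skip or report the type expectation instead of crashing when no type information is present.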

To Reproduce

from great_expectations.core import ExpectationSuite
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import DataContextConfig
from great_expectations.data_context.types.base import InMemoryStoreBackendDefaults
from great_expectations.profile.user_configurable_profiler import UserConfigurableProfiler


def profile():
    datasource_name = "something"
    connection_string = "postgresql+psycopg2://postgres:postgres@localhost:5432/postgres"
    dataset_name = "something"

    data_context_config = get_data_context_config(datasource_name, connection_string, dataset_name)
    context = BaseDataContext(project_config=data_context_config)
    suite: ExpectationSuite = context.create_expectation_suite("suite", overwrite_existing=True)

    batch_request = RuntimeBatchRequest(
        datasource_name=datasource_name,
        data_connector_name="default_runtime_data_connector",
        data_asset_name=dataset_name,
        runtime_parameters={'schema': 'sample_data', 'query': 'select * from sample_data.orders'},
        batch_identifiers={dataset_name: dataset_name},
        batch_spec_passthrough={"create_temp_table": False},  # Changing this to {"create_temp_table": True} works.
    )

    validator = context.get_validator(
        batch_request=batch_request, expectation_suite=suite,
    )

    profiler = UserConfigurableProfiler(
        validator,
    )
    expectations = profiler.build_suite().to_json_dict()['expectations']

    return expectations


def get_data_context_config(datasource_name, connection_string, dataset_name):
    context = DataContextConfig(
        datasources={
            datasource_name: {
                "execution_engine": {
                    "class_name": "SqlAlchemyExecutionEngine",
                    "connection_string": connection_string,
                },
                "class_name": "Datasource",
                "module_name": "great_expectations.datasource",
                "data_connectors": {
                    "default_runtime_data_connector": {
                        "class_name": "RuntimeDataConnector",
                        "batch_identifiers": [
                            dataset_name
                        ],
                    },
                    "default_inferred_data_connector_name": {
                        "class_name": "InferredAssetSqlDataConnector",
                        "include_schema_name": True,
                    },
                }
            }
        },
        store_backend_defaults=InMemoryStoreBackendDefaults(),
        anonymous_usage_statistics={
            "enabled": False,
        }
    )
    return context


profile()

Expected behavior

Ideally, I'd like the UserConfigurableProfiler to work without a temp table. If this is not possible, a validation error should be raised when "create_temp_table" is False.
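Until either behavior exists, a caller-side guard along these lines could surface the problem early. This is only a sketch: `check_profiler_batch_options` is a hypothetical helper, not part of Great Expectations, and it assumes that "create_temp_table" defaults to True when it is omitted.

```python
def check_profiler_batch_options(runtime_parameters, batch_spec_passthrough):
    """Raise ValueError for the query + create_temp_table=False combination.

    Hypothetical helper; call it before building the RuntimeBatchRequest
    that will be handed to the UserConfigurableProfiler.
    """
    uses_query = "query" in (runtime_parameters or {})
    # Assumption: a missing key means a temp table will be created.
    create_temp_table = (batch_spec_passthrough or {}).get("create_temp_table", True)
    if uses_query and not create_temp_table:
        raise ValueError(
            "UserConfigurableProfiler cannot resolve column types for a query "
            "batch when create_temp_table is False."
        )


# Passes silently: a temp table will be created for the query.
check_profiler_batch_options(
    {"query": "select * from sample_data.orders"},
    {"create_temp_table": True},
)
```

Calling the guard with the parameters from the reproduction script (query plus {"create_temp_table": False}) raises ValueError before any profiling starts.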

Environment (please complete the following information):

  • Operating System: MacOS
  • Great Expectations Version: 0.15.0

KentonParton avatar Apr 09 '22 19:04 KentonParton

Thanks for raising this, @KentonParton! We will review and be in touch.

talagluck avatar Apr 11 '22 16:04 talagluck

Hi @KentonParton - thank you for your patience. I wanted to let you know that we'll be figuring out prioritization for this over the next few days. I also wanted to let you know that we are in the process of doing some great work on our Rule Based Profilers and Data Assistants, and so you may find some success using the Onboarding Assistant as an alternative to the UserConfigurableProfiler. You can read about the Onboarding Assistant here.

talagluck avatar Aug 10 '22 08:08 talagluck

Sorry for not responding to this @talagluck. I will give the OnboardingDataAssistant a try and report back.

KentonParton avatar Sep 05 '22 21:09 KentonParton

Hey @KentonParton did the onboarding assistants address the issue?

rdodev avatar Mar 08 '23 15:03 rdodev