airflow-provider-great-expectations

Build data_context object in `__init__()` and not in `execute` method

Open Salias opened this issue 2 months ago • 0 comments

Right now, the `self.data_context` object is initialized within the `execute()` method of the operator, which subclasses Airflow's `BaseOperator`.

This is done in: https://github.com/astronomer/airflow-provider-great-expectations/blob/0863df8edc0d4fbafc8614d28af3a1317ba255c7/great_expectations_provider/operators/great_expectations.py#L586

However, this makes it impossible to interact with the data context before execution, since the object does not exist until `execute()` runs.

If `self.data_context` were initialized in the `__init__()` method instead, the user could interact with this object in the `pre_execute()` or `post_execute()` methods of Airflow's `BaseOperator`.
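To illustrate the proposal, here is a minimal sketch that does not depend on Airflow or Great Expectations. `StubDataContext`, `StubOperator`, `add_suite` and `run` are hypothetical stand-ins, not real APIs; only the hook names (`pre_execute`, `execute`, `post_execute`) and their call order mirror `BaseOperator`.

```python
from typing import Any, Dict, List


class StubDataContext:
    """Hypothetical stand-in for a Great Expectations data context."""

    def __init__(self) -> None:
        self.suites: Dict[str, List[str]] = {}

    def add_suite(self, name: str) -> None:
        # Hypothetical helper, not a Great Expectations method.
        self.suites[name] = []


class StubOperator:
    """Hypothetical stand-in that mimics the BaseOperator hook order."""

    def pre_execute(self, context: Any) -> None:
        pass

    def execute(self, context: Any) -> Any:
        raise NotImplementedError

    def post_execute(self, context: Any, result: Any = None) -> None:
        pass

    def run(self, context: Any = None) -> Any:
        # Hooks are called in this order around execute().
        self.pre_execute(context)
        result = self.execute(context)
        self.post_execute(context, result)
        return result


class ContextInInitOperator(StubOperator):
    """Sketch of the proposed change: the context is built in __init__()."""

    def __init__(self) -> None:
        self.data_context = StubDataContext()

    def execute(self, context: Any) -> List[str]:
        # The context already holds whatever pre_execute() added.
        return sorted(self.data_context.suites)


class UserOperator(ContextInInitOperator):
    """A user subclass that touches the context before execution."""

    def pre_execute(self, context: Any) -> None:
        self.data_context.add_suite("runtime_suite")


op = UserOperator()
result = op.run()
```

With the current behaviour, the `pre_execute()` override in `UserOperator` would fail because `self.data_context` would not exist yet. Airflow instantiates operators each time it parses the DAG file, so if building the real context turns out to be costly, exposing it through a lazily evaluated property might give the same hook access without that cost.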

One possible use case is adding expectation suites at runtime when the expectations store uses an `InMemoryStoreBackend`:

    from typing import Any

    from great_expectations.core import ExpectationConfiguration

    def pre_execute(self, context: Any):
        """
        Create and add an expectation suite to the in-memory DataContext.
        """
        # Hypothetical suite name, for illustration
        suite_name = "runtime_suite"
        suite = self.data_context.create_expectation_suite(
            suite_name, overwrite_existing=True
        )

        # Add expectations
        # Here we'll add a simple expectation as an example
        suite.add_expectation(
            ExpectationConfiguration(
                expectation_type="expect_table_row_count_to_be_between",
                kwargs={"min_value": 1, "max_value": 1000000},
            )
        )

        # Save the suite to the DataContext's in-memory expectations store
        self.data_context.save_expectation_suite(suite)

Salias, May 02 '24 15:05