airflow-provider-great-expectations
Build data_context object in `__init__()` and not in `execute` method
Right now, the `self.data_context` object is initialized within the `execute` method of the Airflow `BaseOperator`.
This is done in: https://github.com/astronomer/airflow-provider-great-expectations/blob/0863df8edc0d4fbafc8614d28af3a1317ba255c7/great_expectations_provider/operators/great_expectations.py#L586
However, this makes it impossible to interact with the data context before or after execution.
If `self.data_context` were initialized in the `__init__()` method, the user could interact with this object in the `pre_execute()` or `post_execute()` methods of the Airflow `BaseOperator`.
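To illustrate the idea, here is a minimal sketch using stand-in classes. `FakeDataContext`, `SketchOperator`, and `run` are hypothetical names made up for this example, not the provider's or Airflow's real code. The only thing taken from Airflow is the hook order (`pre_execute`, then `execute`, then `post_execute`).

```python
from typing import Any, Dict, List


class FakeDataContext:
    """Hypothetical stand-in for a Great Expectations data context."""

    def __init__(self) -> None:
        # Maps a suite name to the expectation types registered for it.
        self.suites: Dict[str, List[str]] = {}


class SketchOperator:
    """Hypothetical stand-in for an operator; not the provider's real class."""

    def __init__(self) -> None:
        # Built at construction time, so every hook below can reach it.
        self.data_context = FakeDataContext()

    def pre_execute(self, context: Any) -> None:
        # Register a suite before validation runs.
        self.data_context.suites["runtime_suite"] = [
            "expect_table_row_count_to_be_between"
        ]

    def execute(self, context: Any) -> List[str]:
        # Uses whatever pre_execute() prepared.
        return self.data_context.suites["runtime_suite"]

    def post_execute(self, context: Any, result: Any = None) -> None:
        # The data context is still reachable after execution.
        self.data_context.suites.clear()


def run(operator: SketchOperator) -> List[str]:
    """Call the hooks in the same order Airflow does for a task."""
    context: Dict[str, Any] = {}
    operator.pre_execute(context)
    result = operator.execute(context)
    operator.post_execute(context, result)
    return result
```

If the data context were only created inside `execute`, the `pre_execute` hook in this sketch would fail with an `AttributeError`. One caveat: Airflow instantiates operators every time the DAG file is parsed, so a lazily built context (for example via `functools.cached_property`) might be a lighter alternative to constructing it directly in `__init__()`.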
A possible use case, for example, is adding expectation suites at runtime using an `InMemoryStoreBackend` expectation store:
from typing import Any

from great_expectations.core.expectation_configuration import ExpectationConfiguration


def pre_execute(self, context: Any):
    """
    Create and add an expectation suite to the in-memory DataContext.
    """
    # Assumption: the suite name is available as self.expectation_suite_name
    suite = self.data_context.create_expectation_suite(
        expectation_suite_name=self.expectation_suite_name,
        overwrite_existing=True,
    )
    # Add expectations
    # Here we'll add a simple expectation as an example
    suite.add_expectation(
        ExpectationConfiguration(
            expectation_type="expect_table_row_count_to_be_between",
            kwargs={
                "min_value": 1,
                "max_value": 1000000,
            },
        )
    )
    # Save the suite to the DataContext's in-memory expectations store
    self.data_context.save_expectation_suite(suite)