[docs] - Include version matching requirement for Great Expectations in docs
Summary
The Great Expectations guide/docs should mention that the version of GE used to create the expectation suite must match the version installed in the Dagster environment. See the Slack conversation below for additional context.
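A minimal sketch of the kind of check the docs could suggest. It assumes that a saved expectation suite JSON records the GE version that wrote it under `meta.great_expectations_version`, and that Python 3.8+ is available for `importlib.metadata`. All function names are hypothetical helpers, not part of dagster-ge. Requiring an exact version match is the strictest reading of the requirement above.

```python
import json
from importlib.metadata import PackageNotFoundError, version
from typing import Any, Dict, Optional


def recorded_ge_version(suite: Dict[str, Any]) -> Optional[str]:
    """Return the GE version stored in a saved expectation suite, if present.

    Assumption: the suite JSON records it under meta.great_expectations_version.
    """
    meta = suite.get("meta") or {}
    return meta.get("great_expectations_version")


def installed_ge_version() -> Optional[str]:
    """Return the installed great_expectations distribution version, or None."""
    try:
        return version("great_expectations")
    except PackageNotFoundError:
        return None


def versions_match(created_with: Optional[str], installed: Optional[str]) -> bool:
    """Treat the versions as compatible only if both are known and identical."""
    return created_with is not None and created_with == installed


def check_suite_file(path: str) -> None:
    """Raise early if the suite at `path` was written by a different GE version."""
    with open(path, encoding="utf-8") as f:
        suite = json.load(f)
    created_with = recorded_ge_version(suite)
    installed = installed_ge_version()
    if not versions_match(created_with, installed):
        raise RuntimeError(
            f"Expectation suite {path} was created with GE {created_with}, "
            f"but GE {installed} is installed in this environment"
        )
```

For example, calling `check_suite_file("great_expectations/expectations/test_suite.json")` (a hypothetical path) when the pipeline module is loaded would surface a mismatch before any validation solid runs, rather than as a confusing config error mid-run.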
Conversation excerpt
This issue was generated from the slack conversation at: https://dagster.slack.com/archives/CCCR6P2UR/p1616621279239700?thread_ts=1616621279.239700&cid=CCCR6P2UR
Conversation thread
U01SQELJNU9: Hey! I'm trying to integrate great expectations but the example is not really clear to me. Are there any other resources where I can check examples?
UULA0R2LV: hi, have you checked out the example docs:
```python
from dagster import Field, ModeDefinition, check, composite_solid, file_relative_path, lambda_solid
from dagster_ge.factory import ge_data_context, ge_validation_solid_factory
from pandas import DataFrame

# load_config, download_table_solid and load_df_to_db_solid are defined elsewhere (not shown)
GE_PROJECT_DIR = file_relative_path(__file__, "../great_expectations")
GE_DATASOURCE_NAME = "pandasdata"

ge_data_context_conf = ge_data_context.configured({"ge_root_dir": GE_PROJECT_DIR})

local_mode = ModeDefinition(
    name="local",
    resource_defs={
        "ge_data_context": ge_data_context_conf,
    },
)


@lambda_solid
def continue_if_validated(df: DataFrame, expectation) -> DataFrame:
    if expectation["success"]:
        return df
    else:
        raise ValueError("GE Validation for dataset failed, see previous step")


def load_table_factory(table_name):
    check.str_param(table_name, "table_name")

    @composite_solid(
        name=f"load_table_{table_name}",
        config_schema={
            "table_name": Field(
                str,
                is_required=False,
                default_value=table_name,
            ),
        },
        config_fn=load_config,
    )
    def load_table_solid():
        """Download table, validate if great expectation suite exists, load it to database"""
        ge_suite_name = f"{table_name}.fail"
        validate_solid = ge_validation_solid_factory(
            name=f"validate_{table_name}",
            datasource_name=GE_DATASOURCE_NAME,
            suite_name=ge_suite_name,
        )
        continue_load_solid = continue_if_validated.alias(
            f"continue_if_validated_{table_name}"
        )
        df = download_table_solid()
        return load_df_to_db_solid(continue_load_solid(df, validate_solid(df)))

    return load_table_solid
```
U01Q2QJHJFL: grabbed pieces from a few different pieces of code - but this should give you an idea
U01Q2QJHJFL: that factory makes a composite solid that downloads the data, validates it with GE, then, if it's valid, loads it
U01Q2QJHJFL: cut out some extraneous stuff so not sure if this snippet 100% works
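The control flow that factory wires up (download, validate, pass the data on only if validation succeeded, then load) can be sketched without Dagster. This is a framework-free illustration, not the dagster-ge API: `load_if_valid` and its three callables are hypothetical stand-ins for `download_table_solid`, the GE validation solid, and `load_df_to_db_solid`. The validator is assumed to return a dict with a boolean `"success"` key, which is what `continue_if_validated` reads.

```python
from typing import Any, Callable, Dict


def load_if_valid(
    download: Callable[[], Any],
    validate: Callable[[Any], Dict[str, Any]],
    load: Callable[[Any], Any],
) -> Any:
    """Download data, validate it, and load it only if validation succeeded.

    Hypothetical stand-in for the download -> validate ->
    continue_if_validated -> load chain built by load_table_factory.
    """
    data = download()
    result = validate(data)
    # Same gate as continue_if_validated: stop before loading on failure
    if not result["success"]:
        raise ValueError("GE Validation for dataset failed, see previous step")
    return load(data)


if __name__ == "__main__":
    rows = [{"id": 1}, {"id": 2}]
    loaded = load_if_valid(
        download=lambda: rows,
        validate=lambda data: {"success": all("id" in row for row in data)},
        load=len,  # pretend "loading" just reports how many rows were written
    )
    print(loaded)  # prints 2
```

Raising inside the gate, rather than returning an empty result, mirrors the snippet's choice: a failed validation stops the run visibly instead of silently loading nothing.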
U01SQELJNU9: Thanks a lot! I will try it out
U01SQELJNU9: <@UDJ0NL1LY> I was about to open a new issue, but after trying the code <@U01Q2QJHJFL> provided, I still got the same error. Just to be sure, I downgraded my dagster-ge package even further to the latest 0.10 version. Still, the same error about checkpoint_store_error. Apparently there is something wrong with my code and I'm not sure what it is
U01SQELJNU9:
```python
from dagster import ModeDefinition, file_relative_path, pipeline
from dagster_ge.factory import ge_data_context, ge_validation_solid_factory

# Path to the great expectations folder in the local directory
ge_project_dir = file_relative_path(__file__, "./great_expectations")
# Data source name in the expectation suite
ge_datasource_name = "test_datasource"
# Data context configuration of the root directory
ge_data_context_conf = ge_data_context.configured({"ge_root_dir": ge_project_dir})
# Basic mode definition for the great expectations data context configuration
basic_mode = ModeDefinition(
    name="basic",
    resource_defs={"ge_data_context": ge_data_context_conf},
)
# Definition of the great expectations validation solid
ge_validate_CAMPR3 = ge_validation_solid_factory(
    name="ge_validation_solid",
    datasource_name=ge_datasource_name,
    suite_name="test_suite",
)


# Pipeline definition (retrieve_CAMP_ids, scrape_CAMPs and validate_and_save are defined elsewhere)
@pipeline(
    # The following lines are needed for the great expectations integration
    mode_defs=[basic_mode],
)
def CAMPR3_pipeline():
    ids = retrieve_CAMP_ids()
    df = scrape_CAMPs(ids)
    validate_and_save(df, ge_validate_CAMPR3(df))
```
This is all the code related to the great expectations integration, but I can't seem to find what is causing the `checkpoint_store_error`.
UDJ0NL1LY: does the string `checkpoint_store_name` appear in your validation suite?
UDJ0NL1LY: my hunch is this is a version mismatch between the version of GE that created the validation suite and the version that's installed in your dagster environment
U01SQELJNU9: Yep, this string appears in my great_expectations.yml file: "checkpoint_store_name: checkpoint_store"
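To check a whole project for the key UDJ0NL1LY asked about, a small scan like the one below can help. It is a sketch, not part of Dagster or GE: `lines_containing` and `scan_ge_project` are hypothetical helpers, and it assumes the relevant config lives in `.yml`, `.yaml` or `.json` files under the project's `great_expectations` folder (such as `great_expectations.yml`, where the key turned up here).

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def lines_containing(text: str, needle: str) -> List[int]:
    """Return the 1-based line numbers of `text` that contain `needle`."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if needle in line]


def scan_ge_project(
    root: str, needle: str = "checkpoint_store_name"
) -> Iterator[Tuple[Path, int]]:
    """Yield (file, line number) for each YAML/JSON file under `root` mentioning `needle`."""
    root_path = Path(root)
    if not root_path.is_dir():
        return  # nothing to scan
    for path in sorted(root_path.rglob("*")):
        if path.is_file() and path.suffix in {".yml", ".yaml", ".json"}:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno in lines_containing(text, needle):
                yield path, lineno


if __name__ == "__main__":
    # Hypothetical project location; adjust to your own great_expectations folder
    for path, lineno in scan_ge_project("./great_expectations"):
        print(f"{path}:{lineno}")
```

A hit only shows that the project config uses the key; per the thread, the next step is comparing the GE version that generated that config with the one installed alongside dagster-ge.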
U01SQELJNU9: I will check if that's the case. brb
U01SQELJNU9: That was exactly the error! Thanks a lot <@UDJ0NL1LY>. The 0.13.4 version seems to be broken, but 0.12.10 works.
U01SQELJNU9: Thanks a lot!
UDJ0NL1LY: happy to help!
UDJ0NL1LY: <@U018K0G2Y85> docs mention that the version of GE used to create the expectations suite must match the version installed in the dagster environment
---
#### Message from the maintainers:
Are you looking for the same documentation content? Give it a :thumbsup:. We factor engagement into prioritization.