
[docs] - Include version matching requirement for Great Expectations in docs

Open dagsterbot[bot] opened this issue 3 years ago • 0 comments

Summary

The Great Expectations guide/docs should mention that the version of GE used to create the expectations suite must match the version installed in the Dagster environment. See the Slack convo below for additional context.
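
As a rough illustration of the requirement above, the following sketch compares the installed Great Expectations version with the version that created the suite. The helper names (`major_minor`, `versions_compatible`, `check_installed_ge`) are hypothetical and are not part of Dagster or Great Expectations. Treating versions as matching when their major and minor components agree is also an assumption; the issue only says the versions must match.

```python
from importlib import metadata


def major_minor(version: str) -> tuple:
    """Return the (major, minor) components of a version string like '0.12.10'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def versions_compatible(installed: str, suite_created_with: str) -> bool:
    """Assumption: two versions 'match' when major and minor components agree."""
    return major_minor(installed) == major_minor(suite_created_with)


def check_installed_ge(suite_created_with: str, package: str = "great_expectations") -> bool:
    """Compare the installed package version with the version that built the suite.

    Returns False when the package is not installed at all.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return versions_compatible(installed, suite_created_with)
```

A check like this could run at pipeline start-up, with the suite's version recorded by hand next to the Great Expectations project directory. A stricter variant would require an exact version match.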


Conversation excerpt

This issue was generated from the slack conversation at: https://dagster.slack.com/archives/CCCR6P2UR/p1616621279239700?thread_ts=1616621279.239700&cid=CCCR6P2UR

Conversation thread

U01SQELJNU9: Hey! Im trying to integrate great expectations but the example is not really clear to me. Is there any other resources where I can check examples?
UULA0R2LV: hi have you checked out the example docs:
UULA0R2LV: we also have a blog post talking about the integration:
U01SQELJNU9: Hey. Yes, I practically copied the example with my own suite and got the error: `'DataContextConfig' object has no attribute 'validation_operators'`. I wanted to check another example to see what I did wrong but theres nothing :white_frowning_face:
UDJ0NL1LY: hi eduardo, what version of great expectations are you on and can we see the full error?
U01SQELJNU9: Im using the ge 0.13.10
U01SQELJNU9: This is the error
U01SQELJNU9: [link to image in Slack](https://files.slack.com/files-pri/TCDGQDUKF-F01SCDQUXPV/screen_shot_2021-03-24_at_5.30.25_pm.png)
```
dagster.core.errors.DagsterExecutionStepExecutionError: Error occurred while executing solid "ge_validation_solid":
[...]
AttributeError: 'DataContextConfig' object has no attribute 'validation_operators'
```
UDJ0NL1LY: it's possible that GE broke us with 0.13 -- can you try downgrading and see if the error persists? `pip uninstall great-expectations && pip install great-expectations==0.12.10` should do it
UDJ0NL1LY: yep i think actually this broke in 0.13.10
UDJ0NL1LY: would you mind +1ing this issue:
U01SQELJNU9: After downgrading the pipeline breaks even faster
U01SQELJNU9: [link to image in Slack](https://files.slack.com/files-pri/TCDGQDUKF-F01T2697LHE/screen_shot_2021-03-24_at_6.51.27_pm.png)
```
great_expectations.exceptions.InvalidDataContextConfigError: ('Error while processing DataContextConfig.', ValidationError({'checkpoint_store_name': ['Unknown field.']}))
```
U01SQELJNU9: Sorry to ask such a dumb question but how do you one up the issue?
UDJ0NL1LY: ah there should be a button to add a reaction but
UDJ0NL1LY: if this is also broken on 0.12.x, can you open a new issue?
UDJ0NL1LY: and do you have any code you feel comfortable sharing that we can use to reproduce it?
U01Q2QJHJFL: fyi dagster-ge==0.11.0 is working for me with great-expectations pinned to 0.12.10
U01SQELJNU9: I will try downgrading to dagster-ge==0.11.0 then. Currently I have the 0.11.1 version
U01SQELJNU9: Do you mind sharing a bit of your code so I can see how you implemented the integration?
U01Q2QJHJFL:
```
from dagster import Field, ModeDefinition, check, composite_solid, lambda_solid, solid
from dagster.utils import file_relative_path
from dagster_ge import ge_validation_solid_factory
from dagster_ge.factory import ge_data_context

# Not shown in this excerpt: DataFrame, load_config, download_table_solid,
# and load_df_to_db_solid are defined elsewhere in the original project.

GE_PROJECT_DIR = file_relative_path(__file__, "../great_expectations")
GE_DATASOURCE_NAME = "pandasdata"

# Point the GE data context resource at the project directory.
ge_data_context_conf = ge_data_context.configured({"ge_root_dir": GE_PROJECT_DIR})

local_mode = ModeDefinition(
    name="local",
    resource_defs={
        "ge_data_context": ge_data_context_conf,
    },
)


@lambda_solid
def continue_if_validated(df: DataFrame, expectation) -> DataFrame:
    # Pass the data through only when the GE validation succeeded.
    if expectation["success"]:
        return df
    else:
        raise ValueError("GE Validation for dataset failed, see previous step")


def load_table_factory(table_name):
    check.str_param(table_name, "table_name")

    @composite_solid(
        name=f"load_table_{table_name}",
        config_schema={
            "table_name": Field(
                str,
                is_required=False,
                default_value=table_name,
            ),
        },
        config_fn=load_config,
    )
    def load_table_solid():
        """Download table, validate if great expectation suite exists, load it to database"""
        ge_suite_name = f"{table_name}.fail"
        validate_solid = ge_validation_solid_factory(
            name=f"validate_{table_name}",
            datasource_name=GE_DATASOURCE_NAME,
            suite_name=ge_suite_name,
        )
        continue_load_solid = continue_if_validated.alias(
            f"continue_if_validated_{table_name}"
        )
        df = download_table_solid()
        return load_df_to_db_solid(continue_load_solid(df, validate_solid(df)))

    return load_table_solid
```
U01Q2QJHJFL: grabbed pieces from a few different pieces of code - but this should give you an idea
U01Q2QJHJFL: that factory makes a composite solid that downloads the data, validates it with GE, then if its valid will load it
U01Q2QJHJFL: cut out some extraneous stuff so not sure if this snippet 100% works
U01SQELJNU9: Thanks a lot! I will try it out
U01SQELJNU9: <@UDJ0NL1LY> I was about to open a new issue, but after trying the code <@U01Q2QJHJFL> provided, I still got the same error. Just to be sure, I downgraded my dagster-ge package even further to the latest 0.10 version. Still, the same error about checkpoint_store_error. Apparently there is something wrong with my code and I'm not sure what it is
U01SQELJNU9:
```
# Path to the great expectations folder in the local directory
ge_project_dir = file_relative_path(__file__, "./great_expectations")

# Data source name in the expectation suite
ge_datasource_name = "test_datasource"

# Data context configuration of the root directory
ge_data_context_conf = ge_data_context.configured({"ge_root_dir": ge_project_dir})

# Basic mode definition for the great expectations data context configuration
basic_mode = ModeDefinition(
    name="basic",
    resource_defs={
        "ge_data_context": ge_data_context_conf,
    },
)

# Definition of the great expectations validation solid
ge_validate_CAMPR3 = ge_validation_solid_factory(
    name="ge_validation_solid",
    datasource_name=ge_datasource_name,
    suite_name="test_suite",
)

# Pipeline definition
@pipeline(
    # The following lines are needed for the great expectations integration
    mode_defs=[basic_mode],
)
def CAMPR3_pipeline():
    ids = retrieve_CAMP_ids()
    df = scrape_CAMPs(ids)
    validate_and_save(df, ge_validate_CAMPR3(df))
```
U01SQELJNU9: This is all the code related to the great expectations integration, but can't seem to find the error related to the "checkpoint_store_error"
UDJ0NL1LY: does the string `checkpoint_store_name` appear in your validation suite?
UDJ0NL1LY: my hunch is this is a version mismatch between the version of GE that created the validation suite and the version that's installed in your dagster environment
U01SQELJNU9: Yep, this string appears in my great_expectations.yml file: "checkpoint_store_name: checkpoint_store"
U01SQELJNU9: I will check if thats the case. brb
U01SQELJNU9: That was exactly the error! Thanks a lot <@UDJ0NL1LY>. The 0.13.4 version seems to be broken, but 0.12.10 works.
U01SQELJNU9: Thanks a lot!
UDJ0NL1LY: happy to help!
UDJ0NL1LY: <@U018K0G2Y85> docs mention that the version of GE used to create the expectations suite must match the version installed in the dagster environment
</details>
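
The diagnosis in the thread came down to spotting `checkpoint_store_name` in `great_expectations.yml`, a key that great-expectations 0.12.10 rejected as an unknown field. A minimal sketch of that check is shown here, assuming the config sits at `<ge_root_dir>/great_expectations.yml`. The function names and the `SUSPECT_KEYS` tuple are hypothetical, and the tuple contains only the one key reported in the conversation, not a complete list of version-specific fields.

```python
import re
from pathlib import Path

# Only the key reported as 'Unknown field.' in the thread; not an exhaustive list.
SUSPECT_KEYS = ("checkpoint_store_name",)


def find_suspect_keys(config_text: str, keys=SUSPECT_KEYS) -> list:
    """Return the suspect keys that appear as top-level keys in the YAML text."""
    found = []
    for key in keys:
        # Match the key at the start of a line (no indentation), followed by a colon.
        if re.search(rf"^{re.escape(key)}\s*:", config_text, flags=re.MULTILINE):
            found.append(key)
    return found


def scan_project(ge_root_dir: str) -> list:
    """Read great_expectations.yml under ge_root_dir and report suspect keys."""
    config_path = Path(ge_root_dir) / "great_expectations.yml"
    return find_suspect_keys(config_path.read_text())
```

A non-empty result from `scan_project` would hint that the project was created with a newer Great Expectations release than an older pinned install can parse. It is a text-level heuristic and does not replace checking the installed version directly.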

---

#### Message from the maintainers:

Are you looking for the same documentation content? Give it a :thumbsup:. We factor engagement into prioritization.

dagsterbot[bot], Mar 26 '21 18:03