
Run checkpoint is not throwing an error when expected.

Open jeduardo90 opened this issue 2 years ago • 3 comments

Description:

When running a checkpoint with the V3 API, I first need to validate table metadata, and I only want to proceed to column validations if all of the table-level results succeed. If any fail, I want validation to stop and produce a results object containing the results gathered up to that point.

To get the desired output, I tried manipulating the catch_exceptions flag in the expectation suite JSON, but to no avail.

The checkpoint is not outputting any traceback info, neither at runtime nor in the results object, regardless of the value of the catch_exceptions flag.

* The most common flaw this causes is an index-out-of-bounds error when validating a column that doesn't exist (validation should already have halted at expect_table_columns_to_match_ordered_list, because that column is not present in the first place).

My workaround is to manipulate the expectation suite at runtime, splitting it into two sets (table expectations and column expectations) and adding conditional logic to validate them separately. This is less than desirable.
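For context, the workaround described above can be sketched roughly like this. The helper name `split_expectations` and the prefix-based partitioning are illustrative assumptions, not part of the Great Expectations API; it only assumes expectation configurations are dicts carrying an `expectation_type` key:

```python
def split_expectations(expectations):
    """Partition expectation configs (dicts with an 'expectation_type' key)
    into (table_expectations, column_expectations).

    Table-level expectation types in Great Expectations conventionally start
    with 'expect_table_'; everything else is treated as column-level here.
    """
    table_exps, column_exps = [], []
    for exp in expectations:
        if exp["expectation_type"].startswith("expect_table_"):
            table_exps.append(exp)
        else:
            column_exps.append(exp)
    return table_exps, column_exps
```

The two resulting groups can then be written into two separate suites, with the column-level suite only run when the checkpoint over the table-level suite reports `success`.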

Replicate the issue:

I've executed the following code with the attached suite and file:

import sys
from great_expectations.checkpoint.types.checkpoint_result import CheckpointResult
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.data_context import DataContext
import great_expectations as ge
import pandas as pd

file = r"testfile.csv"  # file contained in the attached zip file
separator = ";"

df = pd.read_csv(file, sep=separator)

data_context: DataContext = ge.get_context()

batch_request = RuntimeBatchRequest(
    datasource_name="pandas_dataframe",
    data_connector_name="s3_pandas_connector",
    data_asset_name="asset_name",  # this can be anything that identifies this data
    runtime_parameters={"batch_data": df},
    batch_identifiers={
        "filename": "testfile.csv",
        "data_domain": "test",
    },
)

result: CheckpointResult = data_context.run_checkpoint(
    checkpoint_name="cde_datalake_raw_checkpoint",
    batch_request=batch_request,
    run_name="run_name",
    expectation_suite_name="testfile",
    runtime_configuration={"catch_exceptions": False},
)

if not result["success"]:
    print("Validation failed!")
    sys.exit(1)

print("Validation succeeded!")
sys.exit(0)

Expected result: Stop validations if catch_exceptions is false and return a checkpoint result object with the results up to that point.
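As a side note, per-expectation failure details are normally surfaced in each validation result's `exception_info` field. A hedged sketch for pulling those out of a checkpoint result, assuming the V3-era result layout (a `run_results` mapping whose entries hold a `validation_result` with a `results` list); the helper name `collect_exceptions` is illustrative:

```python
def collect_exceptions(run_results):
    """Walk a checkpoint result's run_results mapping and collect any
    exception messages recorded per expectation.

    Assumes each entry looks like:
        {"validation_result": {"results": [{"exception_info": {...}}, ...]}}
    with exception_info carrying 'raised_exception' and 'exception_message'.
    """
    found = []
    for entry in run_results.values():
        for res in entry["validation_result"]["results"]:
            info = res.get("exception_info") or {}
            if info.get("raised_exception"):
                found.append(info.get("exception_message"))
    return found
```

In the reported behavior, this list stays empty even when catch_exceptions is toggled, which is the core of the issue.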

Environment (please complete the following information):

  • Operating System: Linux and Windows
  • Great Expectations Version: [e.g. 0.13.44]

EDIT: Added line beginning with * for more context.

GE Configs.zip

jeduardo90 avatar Dec 16 '21 18:12 jeduardo90

Hey @jeduardo90 thanks for opening up this issue! I'll bring this up with our team and get back to you.

cdkini avatar Dec 17 '21 16:12 cdkini

Howdy @jeduardo90, rather than attaching the configs, can you code block them for us please?

AFineDayFor avatar Jan 24 '22 18:01 AFineDayFor

Is this issue still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity.

It will be closed if no further activity occurs. Thank you for your contributions 🙇

github-actions[bot] avatar Aug 05 '22 02:08 github-actions[bot]