great_expectations expect_column_mean_to_be_between expectation unsuccessful for empty DataFrame

Describe the bug: The expect_column_mean_to_be_between expectation fails when used together with a row_condition that evaluates to False for all rows. This is inconsistent with other expectations.

To Reproduce:

In [1]: import great_expectations as ge
   ...: import pandas as pd
   ...: 
   ...: df = pd.DataFrame({"i": [0, 1, 0, 1], "y": ["a", "a", "a", "a"]})
   ...: 
   ...: expectations = [
   ...:     {
   ...:         "expectation_type": "expect_column_values_to_not_be_null",
   ...:         "kwargs": {
   ...:             "column": "i",
   ...:             "row_condition": "y == 'b'",
   ...:             "condition_parser": "pandas",
   ...:         },
   ...:     },
   ...:     {
   ...:         "expectation_type": "expect_column_values_to_be_null",
   ...:         "kwargs": {
   ...:             "column": "i",
   ...:             "row_condition": "y == 'b'",
   ...:             "condition_parser": "pandas",
   ...:         },
   ...:     },
   ...:     {
   ...:         "expectation_type": "expect_column_values_to_be_in_set",
   ...:         "kwargs": {
   ...:             "column": "i",
   ...:             "value_set": [0, 1],
   ...:             "row_condition": "y == 'b'",
   ...:             "condition_parser": "pandas",
   ...:         },
   ...:     },
   ...:     {
   ...:         "expectation_type": "expect_column_mean_to_be_between",
   ...:         "kwargs": {
   ...:             "column": "i",
   ...:             "min_value": 0,
   ...:             "max_value": 1,
   ...:             "row_condition": "y == 'b'",
   ...:             "condition_parser": "pandas",
   ...:         },
   ...:     },
   ...: ]
   ...: 
   ...: expectation_suite = ge.core.ExpectationSuite(
   ...:     "expectation_suite",
   ...:     expectations=[ge.core.ExpectationConfiguration(**e) for e in expectations],
   ...: )
   ...: validation_results = ge.from_pandas(df).validate(expectation_suite)
   ...: 
   ...: [result.success for result in validation_results.results]
Out[1]: [True, True, True, False]

Expected Behaviour: The expect_column_mean_to_be_between expectation should succeed like other expectations.

Environment:

 ~ $ conda list great-expectations
# packages in environment at /home/mlondschien/anaconda3/envs/quantcore.thek:
#
# Name                    Version                   Build  Channel
great-expectations        0.13.19            pyha770c72_0    conda-forge

Additional context: We apply the same expectation suite to different subsets of a large table. Some expectations do not make sense for a specific subset, so we use the row_condition to exclude that subset's rows from them.
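
A minimal sketch of attaching one row_condition to a shared list of expectation dictionaries, assuming the same dictionary layout as in the reproduction above. The helper `with_row_condition` and the `BASE_EXPECTATIONS` list are hypothetical illustrations, not part of great_expectations.

```python
import copy

# Hypothetical shared expectations, written once without any row_condition.
BASE_EXPECTATIONS = [
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "i"}},
    {"expectation_type": "expect_column_mean_to_be_between",
     "kwargs": {"column": "i", "min_value": 0, "max_value": 1}},
]


def with_row_condition(expectations, row_condition, condition_parser="pandas"):
    """Hypothetical helper: return copies of `expectations` restricted by `row_condition`."""
    restricted = []
    for expectation in expectations:
        expectation = copy.deepcopy(expectation)  # leave the shared base list untouched
        expectation["kwargs"]["row_condition"] = row_condition
        expectation["kwargs"]["condition_parser"] = condition_parser
        restricted.append(expectation)
    return restricted


subset_b = with_row_condition(BASE_EXPECTATIONS, "y == 'b'")
```

Each resulting dictionary could then be turned into a configuration with `ge.core.ExpectationConfiguration(**e)`, as in the reproduction. Deep-copying keeps the base list reusable for the next subset.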

Details:

In [2]: validation_results
Out[2]:
{
  "statistics": {
    "evaluated_expectations": 4,
    "successful_expectations": 3,
    "unsuccessful_expectations": 1,
    "success_percent": 75.0
  },
  "results": [
    {
      "expectation_config": {
        "expectation_type": "expect_column_values_to_not_be_null",
        "kwargs": {
          "column": "i",
          "row_condition": "y == 'b'",
          "condition_parser": "pandas"
        },
        "meta": {}
      },
      "result": {
        "element_count": 0,
        "unexpected_count": 0,
        "unexpected_percent": null,
        "partial_unexpected_list": []
      },
      "success": true,
      "meta": {},
      "exception_info": {
        "raised_exception": false,
        "exception_message": null,
        "exception_traceback": null
      }
    },
    {
      "expectation_config": {
        "expectation_type": "expect_column_values_to_be_null",
        "kwargs": {
          "column": "i",
          "row_condition": "y == 'b'",
          "condition_parser": "pandas"
        },
        "meta": {}
      },
      "result": {
        "element_count": 0,
        "unexpected_count": 0,
        "unexpected_percent": null,
        "partial_unexpected_list": []
      },
      "success": true,
      "meta": {},
      "exception_info": {
        "raised_exception": false,
        "exception_message": null,
        "exception_traceback": null
      }
    },
    {
      "expectation_config": {
        "expectation_type": "expect_column_values_to_be_in_set",
        "kwargs": {
          "column": "i",
          "value_set": [0, 1],
          "row_condition": "y == 'b'",
          "condition_parser": "pandas"
        },
        "meta": {}
      },
      "result": {
        "element_count": 0,
        "missing_count": 0,
        "missing_percent": null,
        "unexpected_count": 0,
        "unexpected_percent": null,
        "unexpected_percent_nonmissing": null,
        "partial_unexpected_list": []
      },
      "success": true,
      "meta": {},
      "exception_info": {
        "raised_exception": false,
        "exception_message": null,
        "exception_traceback": null
      }
    },
    {
      "expectation_config": {
        "expectation_type": "expect_column_mean_to_be_between",
        "kwargs": {
          "column": "i",
          "min_value": 0,
          "max_value": 1,
          "row_condition": "y == 'b'",
          "condition_parser": "pandas"
        },
        "meta": {}
      },
      "result": {
        "observed_value": null,
        "element_count": 0,
        "missing_count": null,
        "missing_percent": null
      },
      "success": false,
      "meta": {},
      "exception_info": {
        "raised_exception": false,
        "exception_message": null,
        "exception_traceback": null
      }
    }
  ],
  "evaluation_parameters": {},
  "success": false,
  "meta": {
    "great_expectations_version": "0.13.2",
    "expectation_suite_name": "expectation_suite",
    "run_id": {
      "run_time": "2021-04-26T08:14:57.037509+00:00",
      "run_name": null
    },
    "batch_kwargs": {
      "ge_batch_id": "7cc86ff4-a667-11eb-9e90-482ae30df8e3"
    },
    "batch_markers": {},
    "batch_parameters": {},
    "validation_time": "20210426T081457.037357Z"
  }
}
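
The `"observed_value": null` in the last result suggests that the mean simply cannot be computed for zero rows. The pandas-only sketch below shows that situation. It assumes that the pandas row_condition selects the same rows as a plain boolean mask; that is an assumption about great_expectations internals, not something this thread confirms.

```python
import math

import pandas as pd

df = pd.DataFrame({"i": [0, 1, 0, 1], "y": ["a", "a", "a", "a"]})

# Same filter as row_condition "y == 'b'": no row matches.
subset = df[df["y"] == "b"]

# pandas returns NaN for the mean of an empty Series.
observed_mean = subset["i"].mean()
print(observed_mean)  # nan

# Every ordering comparison with NaN is False, so no range can contain it.
min_value, max_value = 0, 1
in_range = bool(min_value <= observed_mean <= max_value)
print(in_range)  # False
```

On the unfiltered column, the same computation gives 0.5, which lies inside [0, 1].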

Apr 26 '21 08:04 mlondschien

@mlondschien Thank you for reporting this!

May 03 '21 15:05 eugmandel

I hit this with expect_column_values_to_be_in_type_list.py as well, so I wonder whether it affects many expectations.

Would it make sense for expectations on column values (or aggregates thereof) to exit early with success on empty dataframes?
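
The proposed early exit could look roughly like a wrapper that short-circuits before any aggregate is computed. This is a hypothetical plain-Python sketch. `pass_if_empty` and `mean_between` are illustrative names, not great_expectations APIs.

```python
from functools import wraps


def pass_if_empty(check):
    """Hypothetical decorator: report success when there are no values to check."""
    @wraps(check)
    def wrapper(values, *args, **kwargs):
        if len(values) == 0:
            return True  # early exit: nothing to evaluate
        return check(values, *args, **kwargs)
    return wrapper


@pass_if_empty
def mean_between(values, min_value, max_value):
    """Illustrative aggregate check: is the mean of `values` within the bounds?"""
    observed = sum(values) / len(values)
    return min_value <= observed <= max_value
```

With this wrapper, `mean_between([], 0, 1)` returns True, while non-empty inputs are still checked normally, for example `mean_between([5, 5], 0, 1)` returns False.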

Sep 10 '21 08:09 shearer12345

Hey @mlondschien! After significant discussion of our philosophy in this area, we believe the behavior you're seeing is the correct one, namely:

Aggregate Expectations on Empty Dataframes → Fail
Aggregate Expectations with Row Conditions that return Empty Dataframes → Fail
Map Expectations on Empty Dataframes → If there are no rows, the expectation should pass
Map Expectations with Row Conditions that Return Empty Dataframes → If there are no rows, the expectation should pass
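
These four rules can be modelled with a toy sketch in plain Python. It assumes that a map expectation passes when no row violates its predicate and that an aggregate expectation needs an observed value to compare against its bounds. The function names are hypothetical, and this is not how great_expectations implements expectations internally. It only reproduces the `[True, True, True, False]` pattern from the report.

```python
def select(rows, row_condition=None):
    """Apply an optional row filter (a stand-in for row_condition)."""
    return [r for r in rows if row_condition is None or row_condition(r)]


def map_expectation(values, predicate):
    # Checked row by row: zero rows means zero unexpected values, so it passes.
    return all(predicate(v) for v in values)


def aggregate_mean_between(values, min_value, max_value):
    # Needs an observed aggregate: zero rows means no observed value, so it fails.
    if not values:
        return False
    observed = sum(values) / len(values)
    return min_value <= observed <= max_value


rows = [{"i": 0, "y": "a"}, {"i": 1, "y": "a"}, {"i": 0, "y": "a"}, {"i": 1, "y": "a"}]
values = [r["i"] for r in select(rows, lambda r: r["y"] == "b")]

results = [
    map_expectation(values, lambda v: v is not None),  # like ..._to_not_be_null
    map_expectation(values, lambda v: v is None),      # like ..._to_be_null
    map_expectation(values, lambda v: v in {0, 1}),    # like ..._to_be_in_set
    aggregate_mean_between(values, 0, 1),              # like ..._mean_to_be_between
]
print(results)  # [True, True, True, False]
```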

We believe this should hold true for the majority of extant expectations, and will view behavior outside of this paradigm as unexpected.

Nov 07 '22 18:11 austiezr
