[Feature] Optionally Show (Partial) Unexpected Rows in Data Docs

Open jdimatteo opened this issue 2 years ago • 3 comments

Is your feature request related to a problem? Please describe.

Sometimes it would be useful for Great Expectations to show full rows of unexpected data so that a user has enough context to more quickly follow up on fixing the data problem.

For some kinds of validation, a typical workflow is:

  1. run Great Expectations to produce a Data Docs report
  2. investigate expectation failures by querying the tables to better understand the context

If the data were included directly in the report, this could speed up fixing the problems by eliminating the need to query the tables.

Describe the solution you'd like

It would help if full rows were included in the Data Docs report and the ExpectationValidationResult, using the same partial_unexpected_count to control the limit of how many rows are displayed. This might be named partial_unexpected_rows in the ExpectationValidationResult and might show up in the Data Docs like this:

[image: example of unexpected rows shown in the Data Docs]
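
To make the proposal concrete, here is a minimal sketch in plain Python of how a result might carry full rows alongside the existing partial list, with both truncated by the same partial_unexpected_count. The helper validate_column, the sample data, and the partial_unexpected_rows key are assumptions for illustration only; none of this is existing Great Expectations code.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def validate_column(
    rows: List[Row],
    column: str,
    is_expected: Callable[[Any], bool],
    partial_unexpected_count: int = 20,
) -> Dict[str, Any]:
    """Hypothetical helper (not a Great Expectations API).

    Collects the rows whose value in `column` fails `is_expected` and
    reports at most `partial_unexpected_count` of them, both as bare
    values and as full rows.
    """
    unexpected_rows = [row for row in rows if not is_expected(row[column])]
    limit = max(partial_unexpected_count, 0)
    return {
        "success": not unexpected_rows,
        "result": {
            "element_count": len(rows),
            "unexpected_count": len(unexpected_rows),
            # Values only, similar in spirit to the existing partial list.
            "partial_unexpected_list": [r[column] for r in unexpected_rows[:limit]],
            # Proposed addition: the full rows, capped by the same limit.
            "partial_unexpected_rows": unexpected_rows[:limit],
        },
    }


# Made-up sample data for the sketch.
orders = [
    {"order_id": 1, "customer": "ada", "amount": 25.0},
    {"order_id": 2, "customer": "bob", "amount": -3.0},
    {"order_id": 3, "customer": "cy", "amount": 12.5},
    {"order_id": 4, "customer": "dee", "amount": -7.0},
    {"order_id": 5, "customer": "eli", "amount": -1.0},
]

outcome = validate_column(
    orders, "amount", lambda value: value >= 0, partial_unexpected_count=2
)
```

With the full rows in the result, a reader can see which order and customer each bad amount belongs to without going back to query the table.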

Showing these unexpected rows in the Data Docs and in the ExpectationValidationResult might be configurable separately, e.g. using the same example detailed in #4170:

(venv) jdimatteo@erel:~/dev/great_expectations_fork_2/feature_example$ git diff -U4
diff --git a/feature_example/great_expectations_example.py b/feature_example/great_expectations_example.py
index f8caff28c..3c785ae2d 100644
--- a/feature_example/great_expectations_example.py
+++ b/feature_example/great_expectations_example.py
@@ -22,8 +22,9 @@ data_context_config = DataContextConfig(
     anonymous_usage_statistics={"enabled": False},
     data_docs_sites={
         "local_site": {
             "class_name": "SiteBuilder",
+            "show_partial_unexpected_rows": True,
             "show_how_to_buttons": False,
             "store_backend": {
                 "class_name": "TupleFilesystemStoreBackend",
                 "base_directory": os.path.join(
@@ -87,9 +88,9 @@ checkpoint = context.add_checkpoint(
 results = checkpoint.run(
     result_format={
         "result_format": "COMPLETE",
         "partial_unexpected_count": 20,
-        "include_unexpected_rows": True,
+        "include_partial_unexpected_rows": True,
     }
 )
 print("All expectations succeeded?", results.success)
 suite_result: ExpectationSuiteValidationResult = list(results.run_results.items())[0][
(venv) jdimatteo@erel:~/dev/great_expectations_fork_2/feature_example$ 
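
As a rough illustration of the two switches in the diff working independently, the sketch below gates the validation result on include_partial_unexpected_rows and gates the rendering on show_partial_unexpected_rows. Both option names come from the diff above; the functions apply_result_format and render_unexpected_rows and the sample rows are hypothetical and only stand in for whatever the real implementation would do.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def apply_result_format(
    unexpected_rows: List[Row], result_format: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical: attach rows to the result only when the proposed
    include_partial_unexpected_rows option is enabled."""
    result: Dict[str, Any] = {"unexpected_count": len(unexpected_rows)}
    if result_format.get("include_partial_unexpected_rows", False):
        limit = result_format.get("partial_unexpected_count", 20)
        result["partial_unexpected_rows"] = unexpected_rows[:limit]
    return result


def render_unexpected_rows(
    result: Dict[str, Any], site_config: Dict[str, Any]
) -> Optional[str]:
    """Hypothetical: render rows as a plain-text table only when the
    proposed show_partial_unexpected_rows site option is enabled."""
    rows = result.get("partial_unexpected_rows")
    if not site_config.get("show_partial_unexpected_rows", False) or not rows:
        return None
    columns = list(rows[0].keys())
    lines = [" | ".join(columns)]
    lines += [" | ".join(str(row[c]) for c in columns) for row in rows]
    return "\n".join(lines)


# Made-up rows standing in for failed records.
bad_rows = [{"id": 2, "amount": -3.0}, {"id": 4, "amount": -7.0}]

result = apply_result_format(
    bad_rows,
    {
        "result_format": "COMPLETE",
        "partial_unexpected_count": 1,
        "include_partial_unexpected_rows": True,
    },
)
shown = render_unexpected_rows(result, {"show_partial_unexpected_rows": True})
hidden = render_unexpected_rows(result, {"show_partial_unexpected_rows": False})
```

Keeping the two options separate would let a team store the rows in the validation result for programmatic use while leaving them out of a shared Data Docs site, or the reverse.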

Describe alternatives you've considered

We've also considered #4181, but we think a combination of both would be ideal: a user could see a partial list of unexpected rows while also seeing the full query, which would make follow-up analysis with queries faster.

Additional context

We intend to use this feature in combination with #4170, #4181, and #4186. The example here builds on the example detailed in #4170.

jdimatteo, Feb 11 '22 07:02

Hey @jdimatteo! Thanks for reaching out with these; a lot of super interesting functionality here. We'll review internally over the next week and continue the conversation.

austiezr, Feb 11 '22 16:02

Hey @jdimatteo, we have reviewed this item and added it to our feature roadmap. We do not have an estimated time to start work on this, but we will notify you when we do.

kyleaton, Aug 29 '22 14:08

Various pieces of this have been implemented by the recent ID/PK work that we've done, as well as in the include_unexpected_rows feature, so I'm going to close this for now. Please feel free to open a new Issue if more specific functionality has not been captured.

talagluck, Feb 09 '23 15:02