
Error getting partial unexpected counts using expect_compound_columns_to_be_unique

Open ghost opened this issue 2 years ago • 7 comments

Describe the bug When running the expectation expect_compound_columns_to_be_unique, failed executions save the string "partial_exception_counts requires a hashable type" under the partial_unexpected_counts key in the validation results JSON. This causes the docs building step to fail, because a string does not have a get method. This issue might be related to issue https://github.com/great-expectations/great_expectations/issues/1994 and to PR https://github.com/great-expectations/great_expectations/pull/2074, which fixes that issue.
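A minimal sketch of how this kind of failure can arise, assuming the counts are computed with something like `collections.Counter` (an assumption; the actual Great Expectations code path may differ). The unexpected rows of a multi-column expectation are dictionaries, which are unhashable, so counting them raises `TypeError`. If that error is caught and replaced with a message string, a renderer that calls `.get` on every entry then fails. `count_unexpected` and `render_counts` are hypothetical names used only for illustration.

```python
from collections import Counter

# Unexpected rows for a multi-column expectation arrive as dicts.
unexpected_rows = [
    {"foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
    {"foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
]


def count_unexpected(values):
    """Hypothetical helper: count values, falling back to a message string."""
    try:
        return [{"value": v, "count": c} for v, c in Counter(values).most_common()]
    except TypeError:
        # dicts are unhashable, so Counter cannot use them as keys
        return ["partial_exception_counts requires a hashable type"]


def render_counts(counts):
    """Hypothetical renderer: assumes every entry is a dict."""
    return [(entry.get("value"), entry.get("count")) for entry in counts]


counts = count_unexpected(unexpected_rows)
print(counts)

try:
    render_counts(counts)
except AttributeError as err:
    # The fallback entry is a str, which has no .get method
    print(f"render failed: {err}")
```

With hashable inputs such as plain strings, the same helper returns the usual list of value/count dictionaries, which is why single-column expectations would not hit this path.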

To Reproduce Steps to reproduce the behavior:

  1. Create the following dataset
id  foreign_key_1  foreign_key_2  start_date
1   1              1              2021-01-01
2   1              2              2021-01-01
3   1              2              2021-01-02
4   1              2              2021-01-02
5   1              3              2021-01-01
  2. Create the expectation suite
{
  "data_asset_type": null,
  "expectation_suite_name": "expect_no_duplicate_rows",
  "expectations": [
    {
      "expectation_type": "expect_compound_columns_to_be_unique",
      "kwargs": {
        "column_list": [
          "foreign_key_1",
          "foreign_key_2",
          "start_date"
        ]
      },
      "meta": {}
    }
  ],
  "ge_cloud_id": null,
  "meta": {
    "great_expectations_version": "0.14.1"
  }
}
  3. Create a checkpoint that runs the above expectation suite on the dataset from step 1.
  4. Run the checkpoint. Validation fails, as expected, because rows 3 and 4 are duplicates.
  5. The docs build step of the checkpoint run then fails while trying to render a table of counts of unexpected values, because the string "partial_exception_counts requires a hashable type" is present instead of a dictionary.
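For anyone who wants to confirm which rows collide without going through a checkpoint, the dataset from step 1 can be checked with a small standard-library sketch. This does not use Great Expectations at all; it only groups row ids by the compound key from `column_list`, and the variable names are illustrative.

```python
from collections import defaultdict

# Dataset from step 1, one dict per row.
rows = [
    {"id": 1, "foreign_key_1": 1, "foreign_key_2": 1, "start_date": "2021-01-01"},
    {"id": 2, "foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-01"},
    {"id": 3, "foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
    {"id": 4, "foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
    {"id": 5, "foreign_key_1": 1, "foreign_key_2": 3, "start_date": "2021-01-01"},
]
column_list = ["foreign_key_1", "foreign_key_2", "start_date"]

# Group row ids by the tuple of values in the compound key.
ids_by_key = defaultdict(list)
for row in rows:
    key = tuple(row[col] for col in column_list)
    ids_by_key[key].append(row["id"])

# Keep only the keys that occur more than once.
duplicates = {key: ids for key, ids in ids_by_key.items() if len(ids) > 1}
unexpected_count = sum(len(ids) for ids in duplicates.values())

print(duplicates)        # {(1, 2, '2021-01-02'): [3, 4]}
print(unexpected_count)  # 2
```

Two unexpected rows out of five matches the 40.0 unexpected_percent reported in the validation result.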

Current Output The output in question is shown here, in the validation results for the expectation created above. Specifically, see the value under the key "partial_unexpected_counts".

      "result": {
        "element_count": 5,
        "missing_count": 0,
        "missing_percent": 0.0,
        "partial_unexpected_counts": [
          "partial_exception_counts requires a hashable type"
        ],
        "partial_unexpected_index_list": null,
        "partial_unexpected_list": [
          {
            "foreign_key_1": 1,
            "foreign_key_2": 2,
            "start_date": "2021-01-02"
          },
          {
            "foreign_key_1": 1,
            "foreign_key_2": 2,
            "start_date": "2021-01-02"
          }
        ],
        "unexpected_count": 2,
        "unexpected_percent": 40.0,
        "unexpected_percent_nonmissing": 40.0,
        "unexpected_percent_total": 40.0
      },

Desired Output I would love to see some sort of value that can be rendered into the data docs for failed tests. In this reproduction example, I suppose it would look like

      "result": {
        "element_count": 5,
        "missing_count": 0,
        "missing_percent": 0.0,
        "partial_unexpected_counts": [
          {
            "value": {"foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
            "count": 2
          }
        ],
        "partial_unexpected_index_list": null,
        "partial_unexpected_list": [
          {
            "foreign_key_1": 1,
            "foreign_key_2": 2,
            "start_date": "2021-01-02"
          },
          {
            "foreign_key_1": 1,
            "foreign_key_2": 2,
            "start_date": "2021-01-02"
          }
        ],
        "unexpected_count": 2,
        "unexpected_percent": 40.0,
        "unexpected_percent_nonmissing": 40.0,
        "unexpected_percent_total": 40.0
      },

This would show the list of values that are duplicated, as well as the row count for each set of values.
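A rough sketch of how such counts could be produced, assuming the unexpected rows are available as dictionaries: convert each row to a tuple (which is hashable), count the tuples, then turn each tuple back into a dictionary for display. The function name `compound_unexpected_counts`, the `limit` parameter, and the `count` key are assumptions made for this example, not the library's API.

```python
from collections import Counter


def compound_unexpected_counts(unexpected_rows, column_list, limit=20):
    """Hypothetical helper: count duplicate compound keys in a renderable shape."""
    # Tuples are hashable, so they can be counted where dicts cannot.
    keys = [tuple(row[col] for col in column_list) for row in unexpected_rows]
    return [
        {"value": dict(zip(column_list, key)), "count": count}
        for key, count in Counter(keys).most_common(limit)
    ]


column_list = ["foreign_key_1", "foreign_key_2", "start_date"]
unexpected_rows = [
    {"foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
    {"foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
]

result = compound_unexpected_counts(unexpected_rows, column_list)
print(result)
```

Every entry in the result is a dictionary, so a renderer that calls `.get("value")` and `.get("count")` on each entry would work on it.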

Environment (please complete the following information): This issue was found and tested using:

  • MacOS and Linux (Airflow environment through docker)
  • Great Expectations Version: 0.14.1

Additional context Having this fix would allow us to discover more quickly where duplicates exist in the affected tables.

I hope that this is a new/unique issue. I was only able to find the issues I linked in the first section, but please feel free to close/delete this issue if a related issue is already open.

ghost • Feb 28 '22 21:02

Thanks for opening this issue, @twhitaker-entrata - we will review and be in touch!

talagluck • Mar 02 '22 19:03

hi @twhitaker-entrata -- thanks again for surfacing this issue. We expect this to be resolved when #4336 merges, hopefully in time for this week's release.

joshua-stauffer • Mar 08 '22 21:03

Hello @joshua-stauffer any idea when this will be fixed? thanks a lot

kuhnen • Apr 23 '22 05:04

Hi @joshua-stauffer , any updates on this? I have the same problem on v0.15.13. thx

nicolasaraceni • Jul 08 '22 14:07

Is this issue still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity.

It will be closed if no further activity occurs. Thank you for your contributions 🙇

github-actions[bot] • Aug 08 '22 02:08

Hi @kuhnen, @nicolasaraceni, @YevgeniyaLee - I'm having a hard time replicating the issue described with the latest version of Great Expectations (including @joshua-stauffer's fix in #4336). Here is an exploratory test that tries to reproduce this issue: https://github.com/great-expectations/great_expectations/compare/b/dx-9/testing_if_issue_still_exists If you have a dataset or other instructions to replicate it, please let me know! Or if you would like to contribute a test that replicates the issue, so we can make sure it stays fixed, that would be super helpful 🙇

anthonyburdi • Oct 18 '22 19:10

@anthonyburdi to be honest, I do not remember when I stopped having this issue on my side. Sorry, but I am not able to provide any useful information right now. I will look through my code and tickets for any information about this issue.

kuhnen • Oct 19 '22 08:10