great_expectations
Error getting partial unexpected counts using expect_compound_columns_to_be_unique
Describe the bug
When running the expectation expect_compound_columns_to_be_unique, failed executions save the string "partial_exception_counts requires a hashable type" to the partial_unexpected_counts key in the validation results JSON. This causes a failure in the docs building step, because a string does not have a get method. This issue might be related to issue https://github.com/great-expectations/great_expectations/issues/1994 and PR https://github.com/great-expectations/great_expectations/pull/2074, which fixes it.
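The wording of the message suggests that the unexpected rows, which are dictionaries for a compound-column expectation, cannot be counted because dictionaries are not hashable. The following is a minimal sketch of that failure mode, not the library's actual code: `count_with_fallback` and `render_counts` are hypothetical stand-ins, and the `counts` key follows the desired output described in this issue.

```python
from collections import Counter

# Two unexpected rows, shaped like the entries in partial_unexpected_list.
unexpected_rows = [
    {"foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
    {"foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
]


def count_with_fallback(values):
    """Hypothetical counting step (an assumption, not Great Expectations code)."""
    try:
        return [{"value": v, "counts": c} for v, c in Counter(values).most_common()]
    except TypeError:
        # dicts are unhashable, so Counter cannot use them as keys
        return ["partial_exception_counts requires a hashable type"]


def render_counts(counts):
    """Hypothetical renderer that assumes every entry is a dict."""
    return [(entry.get("value"), entry.get("counts")) for entry in counts]


counts = count_with_fallback(unexpected_rows)
print(counts)

try:
    render_counts(counts)
except AttributeError as err:
    # A plain string has no .get method, which matches the docs build failure.
    print("render failed:", err)
```

With hashable values such as integers or strings, the same counting step succeeds, which would explain why single-column expectations are not affected.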
To Reproduce
Steps to reproduce the behavior:
- Create the following dataset
id | foreign_key_1 | foreign_key_2 | start_date |
---|---|---|---|
1 | 1 | 1 | 2021-01-01 |
2 | 1 | 2 | 2021-01-01 |
3 | 1 | 2 | 2021-01-02 |
4 | 1 | 2 | 2021-01-02 |
5 | 1 | 3 | 2021-01-01 |
- Create the expectation
{
"data_asset_type": null,
"expectation_suite_name": "expect_no_duplicate_rows",
"expectations": [
{
"expectation_type": "expect_compound_columns_to_be_unique",
"kwargs": {
"column_list": [
"foreign_key_1",
"foreign_key_2",
"start_date"
]
},
"meta": {}
}
],
"ge_cloud_id": null,
"meta": {
"great_expectations_version": "0.14.1"
}
}
- Create a checkpoint that runs the above expectation suite against the dataset from step 1.
- Running this checkpoint will result in a failed evaluation as rows 3 and 4 are duplicates.
- The Docs Build step of the checkpoint evaluation fails while trying to render a table of counts of unexpected values, because the string value "partial_exception_counts requires a hashable type" is present instead of a dictionary.
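To confirm which rows violate the compound uniqueness constraint without running a checkpoint, the dataset from step 1 can be checked by hand. This is a plain-Python sketch under the assumption that the table is loaded as a list of dictionaries; `find_compound_duplicates` is a hypothetical helper and does not use the Great Expectations API.

```python
from collections import defaultdict

# The dataset from step 1, one dict per row.
rows = [
    {"id": 1, "foreign_key_1": 1, "foreign_key_2": 1, "start_date": "2021-01-01"},
    {"id": 2, "foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-01"},
    {"id": 3, "foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
    {"id": 4, "foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
    {"id": 5, "foreign_key_1": 1, "foreign_key_2": 3, "start_date": "2021-01-01"},
]
column_list = ["foreign_key_1", "foreign_key_2", "start_date"]


def find_compound_duplicates(rows, column_list):
    """Group row ids by their compound key and keep keys seen more than once."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in column_list)  # tuples are hashable
        groups[key].append(row["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


duplicates = find_compound_duplicates(rows, column_list)
unexpected_count = sum(len(ids) for ids in duplicates.values())
unexpected_percent = 100.0 * unexpected_count / len(rows)

print(duplicates)          # {(1, 2, '2021-01-02'): [3, 4]}
print(unexpected_count)    # 2
print(unexpected_percent)  # 40.0
```

The counts agree with the validation result in this issue: 2 unexpected rows out of 5, or 40 percent.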
Current Output
The output I am calling out is shown here in the validation results for the expectation created above. Specifically, see the value under the key "partial_unexpected_counts".
"result": {
"element_count": 5,
"missing_count": 0,
"missing_percent": 0.0,
"partial_unexpected_counts": [
"partial_exception_counts requires a hashable type"
],
"partial_unexpected_index_list": null,
"partial_unexpected_list": [
{
"foreign_key_1": 1,
"foreign_key_2": 2,
"start_date": "2021-01-02"
},
{
"foreign_key_1": 1,
"foreign_key_2": 2,
"start_date": "2021-01-02"
}
],
"unexpected_count": 2,
"unexpected_percent": 40.0,
"unexpected_percent_nonmissing": 40.0,
"unexpected_percent_total": 40.0
},
Desired Output
I would love to see some sort of value that can be rendered into the data docs for failed tests. In this reproduction example, I suppose it would look like
"result": {
"element_count": 5,
"missing_count": 0,
"missing_percent": 0.0,
"partial_unexpected_counts": [
{
"value": {"foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
"counts": count
}
],
"partial_unexpected_index_list": null,
"partial_unexpected_list": [
{
"foreign_key_1": 1,
"foreign_key_2": 2,
"start_date": "2021-01-02"
},
{
"foreign_key_1": 1,
"foreign_key_2": 2,
"start_date": "2021-01-02"
}
],
"unexpected_count": 2,
"unexpected_percent": 40.0,
"unexpected_percent_nonmissing": 40.0,
"unexpected_percent_total": 40.0
},
This would show the sets of values that are duplicated, as well as the row count for each set of values.
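One way such a value could be produced is to turn each unexpected row into a hashable key before counting. The sketch below is only an illustration of that idea, not a proposed patch: `count_unhashable_rows` and its `limit` parameter are hypothetical, the `counts` key follows the desired output above, and the approach assumes every cell value is JSON-serializable.

```python
import json
from collections import Counter


def count_unhashable_rows(rows, limit=20):
    """Count dict rows by serializing each one to a canonical JSON string.

    sort_keys=True makes equal rows produce equal strings regardless of
    key order, and strings are hashable, so Counter can handle them.
    """
    keyed = Counter(json.dumps(row, sort_keys=True) for row in rows)
    return [
        {"value": json.loads(key), "counts": n}
        for key, n in keyed.most_common(limit)
    ]


partial_unexpected_list = [
    {"foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
    {"foreign_key_1": 1, "foreign_key_2": 2, "start_date": "2021-01-02"},
]

partial_unexpected_counts = count_unhashable_rows(partial_unexpected_list)
print(json.dumps(partial_unexpected_counts, indent=2))
```

Every entry in the result is a dictionary, so a renderer that calls get on each entry would no longer hit a string.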
Environment (please complete the following information):
This issue was found and tested using:
- MacOS and Linux (Airflow environment through docker)
- Great Expectations Version: 0.14.1
Additional context
Having this fix would allow us to more quickly discover where duplicates exist in the affected tables.
I hope that this is a new/unique issue. I was only able to find the issues I linked in the first section, but please feel free to close/delete this issue if a related issue is already open.
Thanks for opening this issue, @twhitaker-entrata - we will review and be in touch!
hi @twhitaker-entrata -- thanks again for surfacing this issue. We expect this to be resolved when #4336 merges, hopefully in time for this week's release.
Hello @joshua-stauffer any idea when this will be fixed? thanks a lot
Hi @joshua-stauffer , any updates on this? I have the same problem on v0.15.13. thx
Is this issue still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity.
It will be closed if no further activity occurs. Thank you for your contributions 🙇
Hi @kuhnen, @nicolasaraceni, @YevgeniyaLee - I'm having a hard time replicating the issue described with the latest version of Great Expectations (including @joshua-stauffer 's fix in #4336). Here is an exploratory test that tries to duplicate this issue: https://github.com/great-expectations/great_expectations/compare/b/dx-9/testing_if_issue_still_exists If you have a dataset or other instructions to replicate please let me know! Or if you would like to contribute a test that would replicate the issue that we are seeing so we can make sure it stays fixed that would be super helpful 🙇
@anthonyburdi to be honest, I do not remember when I stopped having this issue on my side. Sorry, but I am not able to provide any useful information. I will try to find any information about this issue in my code and tickets.