great_expectations

Applying AI and ML to creating and suggesting expectations

Open • carlsonp opened this issue 2 years ago • 3 comments

Is your feature request related to a problem? Please describe.

One benefit of manually identifying and creating expectations for datasets is the human element. We're generally pretty good at identifying what constraints and expectations make sense for columns and the entire dataset. However, there may be cases where the same expectations applied to columns are replicated across tables and datasets. There doesn't appear to be a way of defining expectations at a higher level that span datasets (maybe I'm wrong here?). A downside to manually going in and defining expectations is that it can be slow. If you have lots of datasets at a large institution, it may be impossible to go through all of them and define expectations for every one.

Describe the solution you'd like

One potential avenue would be to use artificial intelligence and machine learning to suggest expectations, which could then be executed or presented to the user to validate and confirm. Or perhaps something that augments the automated data profiler that helps write tests for you?
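As a rough illustration of the "suggest, then let the user confirm" workflow, here is a minimal sketch. It uses simple statistical heuristics as a stand-in for any real ML model, and it does not call the Great Expectations API: `suggest_expectations`, `max_categories`, and `sample_rows` are hypothetical names made up for this example. Only the expectation type strings and their keyword arguments follow Great Expectations naming.

```python
from numbers import Number


def suggest_expectations(rows, max_categories=5):
    """Propose candidate expectations for each column of `rows` (a list of dicts).

    Hypothetical helper: it returns plain dicts for a human to review,
    not Great Expectations objects, and uses heuristics rather than ML.
    """
    suggestions = []
    columns = sorted({key for row in rows for key in row})
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [v for v in values if v is not None]
        if not present:
            continue  # nothing to learn from an all-null column

        # No missing values observed: suggest a not-null constraint.
        if len(present) == len(values):
            suggestions.append({
                "expectation_type": "expect_column_values_to_not_be_null",
                "kwargs": {"column": column},
            })

        # Numeric column (bools excluded): suggest the observed range as bounds.
        if all(isinstance(v, Number) and not isinstance(v, bool) for v in present):
            suggestions.append({
                "expectation_type": "expect_column_values_to_be_between",
                "kwargs": {
                    "column": column,
                    "min_value": min(present),
                    "max_value": max(present),
                },
            })
        # String column with few, repeated values: suggest a fixed value set.
        elif all(isinstance(v, str) for v in present):
            distinct = sorted(set(present))
            if len(distinct) <= max_categories and len(distinct) < len(present):
                suggestions.append({
                    "expectation_type": "expect_column_values_to_be_in_set",
                    "kwargs": {"column": column, "value_set": distinct},
                })
    return suggestions


# Made-up sample data, purely for illustration.
sample_rows = [
    {"order_id": 1, "status": "shipped", "amount": 19.99},
    {"order_id": 2, "status": "pending", "amount": 5.00},
    {"order_id": 3, "status": "shipped", "amount": None},
]

for suggestion in suggest_expectations(sample_rows):
    print(suggestion["expectation_type"], suggestion["kwargs"])
```

The output is a list of candidates only. A person would still accept, loosen, or reject each one before it becomes a real expectation, and a learned model could replace the heuristics without changing that review step.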

Describe alternatives you've considered

I've done some searching around but haven't found much yet. It's difficult to separate this from the similar-sounding question of how poor data quality affects the accuracy of machine learning models deployed to production. In this case, I'm proposing the reverse: using machine learning to help inform the data quality rules themselves. Are there any folks who have separate spin-off projects leveraging Great Expectations as an execution engine to do anything similar?

Additional context

None

carlsonp • Dec 23 '21 23:12

Hi @carlsonp - that's a really interesting suggestion! I think the way to do this would be by subclassing either the Rule-Based Profiler or its components (DomainBuilders, ParameterBuilders, ExpectationConfigurationBuilders). I know we'll be doing more work on Rule-Based Profilers in the coming quarter. I don't know whether we'll be able to prioritize this specifically, but I'll raise it with the team. We would certainly love a contribution in this space, and are happy to offer guidance or support if you are interested.

talagluck • Dec 27 '21 16:12

Hey @carlsonp! We currently have a lot of activity around Rule-Based Profilers. If you take a look at that work and see a place to contribute, we would love to work through a contribution with you.

austiezr • Feb 09 '22 17:02

Is this issue still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity.

It will be closed if no further activity occurs. Thank you for your contributions 🙇

github-actions[bot] • Aug 05 '22 02:08

This feature request has been added for product management's review.

rdodev • Mar 07 '23 19:03