Incorporating Six Sigma Methodology for Data Quality Control in Great Expectations

Open vlasvlasvlas opened this issue 3 months ago • 0 comments

Is your feature request related to a problem? Please describe.
Currently, there is no explicit support or mention of using Six Sigma methodology within Great Expectations for quality assurance purposes. This makes it challenging for users who wish to apply Six Sigma principles to their data quality control processes.

Describe the solution you'd like
I would like to see built-in support or documentation in Great Expectations for implementing Six Sigma methodology to assess and monitor data quality. This could include guidance on defining expectations, calculating defect rates, and interpreting results in terms of Six Sigma levels.

Describe alternatives you've considered
One alternative would be to manually implement Six Sigma calculations outside of Great Expectations, but this would be less integrated and less automated.

Additional context
By incorporating Six Sigma support into Great Expectations, users would have a comprehensive toolset for managing data quality, aligned with industry-standard quality control practices. This would enhance the utility and versatility of Great Expectations for a wider range of users and use cases.

Example
For instance, let's say we have a dataset representing customer orders in an e-commerce platform. We define expectations within Great Expectations to ensure that order timestamps are within a reasonable range, order amounts are non-negative, and customer addresses are valid. After running these expectations, we calculate a Six Sigma value based on the defect rates found in the data.

Suppose the resulting Six Sigma value (sigma level) is 3.5. Under the conventional 1.5-sigma shift, this corresponds to a defect rate of approximately 22,750 defects per million opportunities (DPMO), which is moderate data quality with clear room for improvement. For comparison, about 233 DPMO corresponds to a sigma level of 5, and 3.4 DPMO to a sigma level of 6. Over time, as we continue to refine our data pipelines and improve data quality, we aim to see the sigma level increase, indicating fewer defects and higher data quality. By monitoring this value regularly, we can track the effectiveness of our data quality improvement efforts and ensure that our data processes are meeting the desired quality standards.
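To make the arithmetic concrete, here is a minimal sketch of the calculation done outside Great Expectations, using only the Python standard library. It is not an existing Great Expectations feature. The `results` list is hypothetical sample data; the field names `element_count` and `unexpected_count` are assumed to mirror the counts reported in a validation result, each checked row is treated as one defect opportunity, and the 1.5-sigma shift is the usual Six Sigma convention rather than anything the library prescribes.

```python
import math
from statistics import NormalDist

# Conventional long-term shift used in Six Sigma tables (an assumption here).
SIGMA_SHIFT = 1.5


def dpmo(defects: int, opportunities: int) -> float:
    """Defects per million opportunities."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if not 0 <= defects <= opportunities:
        raise ValueError("defects must be between 0 and opportunities")
    return defects / opportunities * 1_000_000


def sigma_level(dpmo_value: float, shift: float = SIGMA_SHIFT) -> float:
    """Convert a DPMO value to a sigma level via the inverse normal CDF."""
    if not 0 <= dpmo_value < 1_000_000:
        raise ValueError("dpmo_value must be in [0, 1_000_000)")
    if dpmo_value == 0:
        return math.inf  # no observed defects: sigma level is unbounded
    process_yield = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(process_yield) + shift


# Hypothetical per-expectation counts for the customer-orders example.
results = [
    {"expectation": "order timestamp in range", "element_count": 100_000, "unexpected_count": 3_000},
    {"expectation": "order amount non-negative", "element_count": 100_000, "unexpected_count": 825},
    {"expectation": "customer address valid", "element_count": 100_000, "unexpected_count": 3_000},
]

total_defects = sum(r["unexpected_count"] for r in results)
total_opportunities = sum(r["element_count"] for r in results)

overall_dpmo = dpmo(total_defects, total_opportunities)
overall_sigma = sigma_level(overall_dpmo)
print(f"DPMO: {overall_dpmo:,.0f}  sigma level: {overall_sigma:.2f}")
```

With these sample counts (6,825 defects over 300,000 opportunities), the DPMO is about 22,750 and the sigma level comes out at about 3.5. Pooling all expectations into one DPMO is only one possible design; tracking a sigma level per expectation may be more useful when some checks matter more than others.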

Related links: https://docs.oracle.com/cd/B31080_01/doc/owb.102/b28223/concept_data_quality.htm

vlasvlasvlas, Mar 28 '24 14:03