
Data quality rule joining multiple data sources.

Open upenbendre opened this issue 2 years ago • 3 comments

Is your feature request related to a problem? Please describe.
If I have two different data services, e.g. two Postgres DBs, is it possible to have a data quality test case or rule that checks the relation between fields of these two DBs? E.g. the value of column X in table 1 of service A should not exceed the value of column Y in table 2 of service B. The sources could also be of different types, e.g. Postgres joined with Kafka.

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
I may have to create a data source using something like Presto that joins multiple data sources, and create such a rule on it. But since Presto may not properly support some data sources, such as Db2, this may not work. Another option is dbt, but that is overkill if the data quality rule joining two sources is fairly simple, which will mostly be the case.

Additional context
Thanks. The context here is that we are looking to keep all our data quality rules central. That makes it easier to govern them and to spot whether any group or service has deviated from the overarching business rule or broken a pattern.

Sep 01 '22 08:09 upenbendre

@upenbendre you can build a SQL query to compare this, right? We already support a custom SQL test.

Sep 11 '22 20:09 harshach
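
For readers following along, here is a minimal sketch of the kind of comparison query suggested above. It is not OpenMetadata's API: it uses Python's built-in `sqlite3` module only to make the query runnable, and the `orders` and `accounts` tables, their columns, and the sample rows are all hypothetical. It also assumes the usual convention that the query selects the rows that break the rule, so that an empty result means the rule holds. Both tables sit in one database, which is the single-service case a plain SQL query can cover.

```python
import sqlite3

# Hypothetical schema and data. Both tables live in ONE database,
# i.e. a single service, so one SQL query can join them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, credit_limit REAL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER, amount REAL);
    INSERT INTO accounts VALUES (1, 100.0), (2, 50.0);
    INSERT INTO orders VALUES (10, 1, 80.0), (11, 2, 75.0);
""")

# Rule: orders.amount must not exceed accounts.credit_limit.
# The query returns the rows that BREAK the rule.
VIOLATIONS_SQL = """
    SELECT o.id, o.amount, a.credit_limit
    FROM orders AS o
    JOIN accounts AS a ON a.id = o.account_id
    WHERE o.amount > a.credit_limit
    ORDER BY o.id
"""

violations = conn.execute(VIOLATIONS_SQL).fetchall()

# Assumed convention: the rule passes only when no violating rows come back.
rule_passed = len(violations) == 0
print(violations, rule_passed)
```

Writing the query so that it returns offenders rather than a boolean has the side benefit that a failing run already lists the rows to investigate.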

I haven't found documentation on how to create a custom SQL test.

Sep 18 '22 17:09 upenbendre

Looking for documentation on how to create a custom SQL test that joins multiple data sources.

Sep 21 '22 07:09 upenbendre

@upenbendre here is the link to the documentation for the custom SQL query test. https://docs.open-metadata.org/openmetadata/ingestion/workflows/data-quality/tests#table-custom-sql-test.

We'll try to add native support for this test in 0.13, though we won't support federated data quality (DQ) checks across different services (i.e. the value of column A in BigQuery compared to the value of column B in Postgres).

Sep 30 '22 06:09 TeddyCr
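
Since a check across services is out of scope for the built-in test, one possible workaround is to run the comparison outside the tool. The sketch below is only an illustration under stated assumptions: two separate in-memory `sqlite3` connections stand in for service A and service B, the tables `table_1` and `table_2`, the shared `id` key, and the helper names `fetch_by_key` and `find_violations` are all hypothetical, and the rule is the one from the original request (X must not exceed Y).

```python
import sqlite3


def fetch_by_key(conn, sql):
    """Run a query that returns (key, value) rows and return them as a dict."""
    return dict(conn.execute(sql).fetchall())


def find_violations(left, right):
    """Return the sorted keys whose left value exceeds the right value.

    Keys present on only one side are skipped here; a real rule would
    have to decide whether a missing counterpart is itself a failure.
    """
    return sorted(k for k, v in left.items() if k in right and v > right[k])


# Two independent connections stand in for service A and service B.
service_a = sqlite3.connect(":memory:")
service_a.executescript("""
    CREATE TABLE table_1 (id INTEGER PRIMARY KEY, x REAL);
    INSERT INTO table_1 VALUES (1, 10.0), (2, 30.0), (3, 5.0);
""")
service_b = sqlite3.connect(":memory:")
service_b.executescript("""
    CREATE TABLE table_2 (id INTEGER PRIMARY KEY, y REAL);
    INSERT INTO table_2 VALUES (1, 20.0), (2, 25.0);
""")

# Each side is queried on its own connection; the join happens in Python.
x_values = fetch_by_key(service_a, "SELECT id, x FROM table_1")
y_values = fetch_by_key(service_b, "SELECT id, y FROM table_2")

violating_ids = find_violations(x_values, y_values)
print(violating_ids)
```

This pulls both key/value sets into memory, so it only suits small tables or pre-aggregated values; for larger data a federated query engine, as mentioned in the original request, is the more likely fit.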

I got a "Page not Found" on https://docs.open-metadata.org/openmetadata/ingestion/workflows/data-quality/tests#table-custom-sql-test

Oct 26 '22 08:10 upenbendre

@upenbendre the link has moved. Here it is 👉 https://docs.open-metadata.org/connectors/ingestion/workflows/data-quality/tests#table-custom-sql-test

Oct 26 '22 08:10 TeddyCr