OpenMetadata
dbt Enhancements
[1.3.2] - JSON Schema & Parsing Improvements
- [x] Add `type` fields in the JSON schema of each individual dbt config (local, http, s3, gcs, azure, cloud)
- [x] Handle the appropriate error in case of type mismatches
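A minimal sketch of how a `type` discriminator could drive validation and produce a clear error on mismatch. The required key names (`manifestFilePath`, `bucketName`, and so on) and the `DbtConfigTypeError` class are hypothetical placeholders for illustration, not the actual OpenMetadata schema or code:

```python
from typing import Any, Dict, Set

# Hypothetical required keys per config type; the real JSON schemas may differ.
REQUIRED_KEYS: Dict[str, Set[str]] = {
    "local": {"manifestFilePath"},
    "http": {"manifestHttpPath"},
    "s3": {"bucketName"},
    "gcs": {"bucketName"},
    "azure": {"containerName"},
    "cloud": {"accountId", "authToken"},
}


class DbtConfigTypeError(ValueError):
    """Raised when 'type' is missing, unknown, or does not match the payload."""


def validate_dbt_config(config: Dict[str, Any]) -> str:
    """Return the declared config type, or raise if it does not fit the payload."""
    config_type = config.get("type")
    if not isinstance(config_type, str):
        raise DbtConfigTypeError("dbt config is missing a string 'type' field")
    if config_type not in REQUIRED_KEYS:
        raise DbtConfigTypeError(f"unknown dbt config type: {config_type!r}")
    missing = REQUIRED_KEYS[config_type] - set(config)
    if missing:
        raise DbtConfigTypeError(
            f"config declared as {config_type!r} is missing keys: {sorted(missing)}"
        )
    return config_type
```

With an explicit discriminator, a payload that declares `s3` but carries local-file keys fails immediately with a message naming the declared type, instead of being matched against every schema in turn.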
[1.4] - Tags & Glossaries
- https://github.com/open-metadata/OpenMetadata/issues/9031
- https://github.com/open-metadata/OpenMetadata/issues/13293
We currently sync dbt tags into OM by creating a new classification, `DBTTags`, and adding all dbt tags inside it.
What we need to figure out is a way to link dbt tags directly to existing tags/tiers/glossaries in OpenMetadata. Example:
tables:
  - name: DATA_TABLE
    description: Data ,
    columns:
      - name: gross_revenue
        description: column description
        meta:
          openmetadata:
            # DO NOT create anything new in OM, just link to existing items
            - type: GlossaryTerm
              name: BusinessGlossary.GrossRevenue
            - type: Classification
              name: Tier.Tier1
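One way the `meta.openmetadata` entries from the example could be read, as a rough sketch. It assumes the column has already been parsed into a plain dict and that only `GlossaryTerm` and `Classification` links are accepted. The function name `extract_om_links` is hypothetical, and looking up the existing entities in OM is left out:

```python
from typing import Any, Dict, List, Tuple

# Link types taken from the example above; any other value is rejected.
SUPPORTED_TYPES = {"GlossaryTerm", "Classification"}


def extract_om_links(column: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Collect (type, fully qualified name) pairs declared under meta.openmetadata."""
    entries = (column.get("meta") or {}).get("openmetadata") or []
    links: List[Tuple[str, str]] = []
    for entry in entries:
        link_type, name = entry.get("type"), entry.get("name")
        if link_type not in SUPPORTED_TYPES or not name:
            raise ValueError(f"invalid openmetadata link: {entry!r}")
        links.append((link_type, name))
    return links


column = {
    "name": "gross_revenue",
    "description": "column description",
    "meta": {
        "openmetadata": [
            {"type": "GlossaryTerm", "name": "BusinessGlossary.GrossRevenue"},
            {"type": "Classification", "name": "Tier.Tier1"},
        ]
    },
}

print(extract_om_links(column))
# [('GlossaryTerm', 'BusinessGlossary.GrossRevenue'), ('Classification', 'Tier.Tier1')]
```

Because the goal is to link rather than create, a name that does not resolve to an existing entity in OM would presumably be reported as an error instead of being created on the fly.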
[1.4.1] - dbt run details
- https://github.com/open-metadata/OpenMetadata/issues/14065
We need to figure out how to link datamodels with relevant information like:
- past executions & status
- last time it was refreshed
We have 2 different topics here:
- How to integrate dbt cloud metadata (schedule, runs, jobs, etc.) -> This can become a new Pipeline Service
- How to figure out - when using dbt core - when each model was refreshed etc. (https://docs.getdbt.com/reference/artifacts/run-results-json) -> These are extra properties to add to a `DataModel` Entity, be it a dbt model or `DashboardDataModel`: add `lastRefreshed`, `executions` (similar to `Pipeline` Entity executions)
Create this as a Pipeline, show its status, link the latest status in the table, and use the Incident Manager to track these pipeline statuses.
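For the dbt core case, the per-model status and refresh time could be derived from `run_results.json` roughly as sketched below. The sketch relies on the `results[].unique_id`, `results[].status`, and `results[].timing[].completed_at` fields of that artifact. The output keys `status` and `lastRefreshed` mirror the property names proposed above and are not an existing OM schema, and the sample data is made up:

```python
from typing import Any, Dict, Optional


def summarize_run_results(run_results: Dict[str, Any]) -> Dict[str, Dict[str, Optional[str]]]:
    """Map each node's unique_id to its status and, on success, its completion time."""
    summary: Dict[str, Dict[str, Optional[str]]] = {}
    for result in run_results.get("results", []):
        completed = [
            step["completed_at"]
            for step in result.get("timing", [])
            if step.get("completed_at")
        ]
        status = result.get("status")
        # Assumes timestamps share one ISO-8601 UTC format, so string max is the latest.
        refreshed = max(completed) if completed and status == "success" else None
        summary[result["unique_id"]] = {"status": status, "lastRefreshed": refreshed}
    return summary


# Trimmed, made-up sample in the shape of a run_results.json artifact.
sample = {
    "results": [
        {
            "unique_id": "model.shop.orders",
            "status": "success",
            "timing": [
                {"name": "compile", "completed_at": "2024-01-01T10:00:01.000000Z"},
                {"name": "execute", "completed_at": "2024-01-01T10:00:05.000000Z"},
            ],
        },
        {"unique_id": "model.shop.customers", "status": "error", "timing": []},
    ]
}

for unique_id, info in summarize_run_results(sample).items():
    print(unique_id, info)
```

Each run would then contribute one entry to the model's `executions`, while `lastRefreshed` only moves forward on a successful run.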
[1.5] - Semantic Layer
- https://github.com/open-metadata/OpenMetadata/issues/12911
How to integrate "Semantic Layer" data in general, be it from dbt metrics/exposures, Tableau Metrics, etc.
- Would it make sense to add a "Metric" field to the `GlossaryTerm` Entity that can be an SQL expression, Python code that computes the metric, etc.?
[1.5] - dbt Hooks
- Can we build a GitHub action that validates changes in dbt vs. metadata in OM?
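The core check such an action might run can be sketched as a comparison of descriptions. This assumes both sides have already been flattened into dicts keyed by a hypothetical `table.column` string. Reading the dbt manifest and fetching metadata from OM are out of scope here, and `find_description_drift` is an illustrative name:

```python
from typing import Dict, List


def find_description_drift(
    dbt_descriptions: Dict[str, str], om_descriptions: Dict[str, str]
) -> List[str]:
    """List the keys whose dbt description is missing from or differs in OM."""
    drift: List[str] = []
    for key, dbt_desc in sorted(dbt_descriptions.items()):
        om_desc = om_descriptions.get(key)
        if om_desc is None:
            drift.append(f"{key}: missing in OM")
        elif om_desc.strip() != dbt_desc.strip():
            drift.append(f"{key}: description differs")
    return drift


dbt_side = {"orders.gross_revenue": "Gross revenue per order", "orders.id": "Primary key"}
om_side = {"orders.gross_revenue": "Revenue before discounts"}

for problem in find_description_drift(dbt_side, om_side):
    print(problem)
# orders.gross_revenue: description differs
# orders.id: missing in OM
```

A CI step could exit with a non-zero status whenever the returned list is non-empty, which would block the pull request until dbt and OM agree.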
[1.3.2] - Documentation
- https://github.com/open-metadata/OpenMetadata/issues/7113
- [x] remove https://docs.open-metadata.org/v1.3.x/sdk/python/ingestion/dbt
- [x] Follow the approach as with any connector (UI vs. run externally with new step-wise components)
Backlog
- [ ] P2 - For dbt cloud, the required metadata can be gathered using a metadata API from dbt cloud. Check whether it is feasible to use it instead of the current approach of fetching dbt artifacts.
- [ ] P2 https://github.com/open-metadata/OpenMetadata/issues/12910
- [ ] P2 https://github.com/open-metadata/OpenMetadata/issues/12095
- [ ] P1 https://github.com/open-metadata/OpenMetadata/issues/16050
- [ ] (Collate only) Reverse metadata: Open a PR to the repo of the dbt project to update descriptions based on OM descriptions