dbt Enhancements

Open OnkarVO7 opened this issue 1 year ago • 0 comments

[1.3.2] - JSON Schema & Parsing Improvements

  • [x] Add type fields in the json schema of each individual config of dbt (local, http, s3, gcs, azure, cloud)
  • [x] Handle the appropriate error in case of type mismatches
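A minimal sketch of how an explicit type field could drive error handling, assuming each dbt config carries a string discriminator. The key name `type`, the `DbtConfigTypeError` class, and `resolve_config_type` are hypothetical names used for illustration and are not taken from the OpenMetadata schema:

```python
# Config types listed in this issue; the discriminator key "type" is an assumption.
KNOWN_TYPES = ("local", "http", "s3", "gcs", "azure", "cloud")


class DbtConfigTypeError(ValueError):
    """Raised when a dbt config has a missing or unknown type (hypothetical class)."""


def resolve_config_type(config: dict) -> str:
    """Return the declared config type, or fail with a message naming the valid options."""
    declared = config.get("type")
    expected = ", ".join(KNOWN_TYPES)
    if declared is None:
        raise DbtConfigTypeError(
            f"dbt config is missing the 'type' field; expected one of: {expected}"
        )
    if declared not in KNOWN_TYPES:
        raise DbtConfigTypeError(
            f"unknown dbt config type {declared!r}; expected one of: {expected}"
        )
    return declared


print(resolve_config_type({"type": "s3"}))
```

With an explicit discriminator, a mismatch can be reported against the declared type instead of surfacing as a generic validation failure across all six schemas.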

[1.4] - Tags & Glossaries

  • https://github.com/open-metadata/OpenMetadata/issues/9031
  • https://github.com/open-metadata/OpenMetadata/issues/13293

We currently sync dbt tags into OM by creating a new classification, DBTTags, and adding every dbt tag under it. What we need to figure out is a way to link dbt tags directly to existing tags/tiers/glossaries in OpenMetadata. Example:

tables:
- name: DATA_TABLE
  description: Data ,
  columns:
  - name: gross_revenue
    description: column description
    meta:
      openmetadata:
        # DO NOT create anything new in OM, just link to existing items
        - type: GlossaryTerm
          name: BusinessGlossary.GrossRevenue
        - type: Classification
          name: Tier.Tier1

[1.4.1] - dbt run details

  • https://github.com/open-metadata/OpenMetadata/issues/14065

We need to figure out how to link datamodels with relevant information like:

  • past executions & status
  • last time it was refreshed

We have 2 different topics here:

  • How to integrate dbt cloud metadata (schedule, runs, jobs, etc.) -> This can become a new Pipeline Service
  • How to figure out, when using dbt core, when each model was last refreshed (https://docs.getdbt.com/reference/artifacts/run-results-json) -> These are extra properties to add to a DataModel Entity, whether a dbt model or a DashboardDataModel: add lastRefreshed and executions (similar to Pipeline Entity executions)

Create this as a Pipeline, show its status, link the latest status in the table, and use the Incident Manager to track these pipeline statuses.
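For the dbt core case, a rough sketch of deriving per-model status and a last-refreshed timestamp from a parsed run_results.json. It assumes the artifact exposes `results[].unique_id`, `results[].status`, and `results[].timing[].completed_at` with UTC timestamps ending in `Z`; `summarize_run_results` and the output keys are illustrative only, and the sample data is made up:

```python
from datetime import datetime


def _parse_ts(value: str) -> datetime:
    # Assumed format: ISO 8601 with a trailing "Z"; normalise it for fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_run_results(run_results: dict) -> dict:
    """Map each node's unique_id to its status and, on success, its latest completion time."""
    summary = {}
    for result in run_results.get("results", []):
        completed = [
            _parse_ts(step["completed_at"])
            for step in result.get("timing", [])
            if step.get("completed_at")
        ]
        status = result.get("status")
        refreshed = max(completed) if completed and status == "success" else None
        summary[result["unique_id"]] = {"status": status, "lastRefreshed": refreshed}
    return summary


# Made-up sample shaped like the assumed artifact layout.
sample = {
    "results": [
        {
            "unique_id": "model.demo.orders",
            "status": "success",
            "timing": [
                {"name": "compile", "completed_at": "2024-02-08T07:00:01.000000Z"},
                {"name": "execute", "completed_at": "2024-02-08T07:00:05.000000Z"},
            ],
        },
        {"unique_id": "model.demo.customers", "status": "error", "timing": []},
    ]
}
summary = summarize_run_results(sample)
print(summary)
```

Only successful runs set `lastRefreshed` here, on the assumption that a failed run should not count as a refresh; each entry could also be appended to an executions history on the DataModel.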

[1.5] - Semantic Layer

  • https://github.com/open-metadata/OpenMetadata/issues/12911

How to integrate "Semantic Layer" data in general, be it from dbt metrics/exposures, Tableau Metrics, etc.

  • Would it make sense to add a "Metric" field to the GlossaryTerm Entity that can hold an SQL expression, Python code that computes the metric, etc.?

[1.5] - dbt Hooks

  • Can we build a GitHub action that validates changes in dbt vs. metadata in OM?

[1.3.2] - Documentation

  • https://github.com/open-metadata/OpenMetadata/issues/7113
  • [x] remove https://docs.open-metadata.org/v1.3.x/sdk/python/ingestion/dbt
  • [x] Follow the approach as with any connector (UI vs. run externally with new step-wise components)

Backlog

  • [ ] P2 - For dbt cloud, the required metadata can be gathered through dbt cloud's metadata API. Check whether it is feasible to implement this instead of the current approach of fetching dbt artifacts.
  • [ ] P2 https://github.com/open-metadata/OpenMetadata/issues/12910
  • [ ] P2 https://github.com/open-metadata/OpenMetadata/issues/12095
  • [ ] P1 https://github.com/open-metadata/OpenMetadata/issues/16050
  • [ ] (Collate only) Reverse metadata: Open a PR to the repo of the dbt project to update descriptions based on OM descriptions

OnkarVO7, Feb 08 '24 07:02