feat(ingestion/dbt): add support for ingesting DBT contracts as Data Contracts

Open ppiont opened this issue 1 month ago • 5 comments

Summary

Add support for ingesting DBT model contracts (introduced in dbt 1.5) as DataHub Data Contract entities with schema and data quality assertions.

Closes #11927

What problem does this solve?

DBT introduced model contracts in v1.5, allowing teams to define and enforce schema guarantees on models. However, DataHub had no way to ingest this contract metadata - it was lost during ingestion, and users couldn't see which models had contractual guarantees.

This PR bridges DBT's data governance (contracts) with DataHub's data governance (Data Contracts, Assertions), giving teams a unified view of their data quality guarantees.

What changes are being made?

New Configuration Options

| Option | Default | Description |
| --- | --- | --- |
| ingest_contracts | false | Enable Data Contract creation from DBT contracts |
| contract_test_tag | "contract" | Tag for tests to include in contract |
| ingest_column_constraints_as_assertions | true | Create assertions from not_null, unique, primary_key constraints |
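
To make the interaction between these options concrete, here is a minimal sketch that models them as a plain dataclass with the defaults listed above. The class name `ContractOptions` and the helper `should_emit_constraint_assertions` are hypothetical and only illustrate the intended gating; the config class in the PR itself may be structured differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractOptions:
    """Hypothetical stand-in for the three new options (defaults as listed above)."""

    ingest_contracts: bool = False
    contract_test_tag: str = "contract"
    ingest_column_constraints_as_assertions: bool = True


def should_emit_constraint_assertions(opts: ContractOptions) -> bool:
    # Constraint assertions are part of contract ingestion, so the
    # master switch has to be on as well.
    return opts.ingest_contracts and opts.ingest_column_constraints_as_assertions


defaults = ContractOptions()
enabled = ContractOptions(ingest_contracts=True)
```

With the defaults nothing contract-related is emitted; setting only `ingest_contracts` to true is enough to also get constraint assertions, since that option defaults to true.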

New Data Structures

  • DBTContract dataclass - captures enforced, alias_types, checksum from manifest
  • DBTConstraint dataclass - captures column/model-level constraints (not_null, unique, primary_key, etc.)
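
A rough sketch of what these two dataclasses and their extraction from a manifest node could look like follows. Only `enforced`, `alias_types`, and `checksum` are named in this PR description; the remaining fields, the default values, the parser function names, and the assumed node layout (a `contract` block, per-column `constraints`, and model-level `constraints`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DBTContract:
    enforced: bool = False
    alias_types: bool = True  # assumed default
    checksum: Optional[str] = None


@dataclass
class DBTConstraint:
    type: str  # e.g. "not_null", "unique", "primary_key"
    columns: List[str] = field(default_factory=list)  # assumed field
    name: Optional[str] = None  # assumed field


def parse_contract(node: Dict[str, Any]) -> DBTContract:
    """Read the contract block of a manifest node, tolerating its absence."""
    raw = node.get("contract") or {}
    return DBTContract(
        enforced=bool(raw.get("enforced", False)),
        alias_types=bool(raw.get("alias_types", True)),
        checksum=raw.get("checksum"),
    )


def parse_constraints(node: Dict[str, Any]) -> List[DBTConstraint]:
    """Collect column-level constraints first, then model-level ones."""
    found: List[DBTConstraint] = []
    for col_name, col in (node.get("columns") or {}).items():
        for c in col.get("constraints") or []:
            found.append(DBTConstraint(c["type"], [col_name], c.get("name")))
    for c in node.get("constraints") or []:
        found.append(DBTConstraint(c["type"], list(c.get("columns") or []), c.get("name")))
    return found


# Hand-written sample node, not taken from a real manifest.
sample_node = {
    "contract": {"enforced": True, "alias_types": True, "checksum": "abc123"},
    "columns": {
        "id": {"constraints": [{"type": "not_null"}]},
        "email": {"constraints": [{"type": "unique"}]},
    },
    "constraints": [{"type": "primary_key", "columns": ["id"]}],
}
contract = parse_contract(sample_node)
constraints = parse_constraints(sample_node)
```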

Contract Ingestion Flow

When ingest_contracts: true and a model has contract.enforced: true:

  1. Schema Assertion - Created from contracted model columns with exact match compatibility
  2. Constraint Assertions (optional) - Created for not_null, unique, primary_key constraints
  3. Tagged Test Assertions - Existing DBT tests tagged with contract_test_tag are linked
  4. Data Contract Entity - Bundles all assertions into a DataContractPropertiesClass
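
The four steps above can be sketched as a single planning function. This is a simplified illustration, not the PR's implementation: it works on plain dicts, returns lightweight descriptors instead of real DataHub assertion or contract objects, and the function name and descriptor shape are hypothetical.

```python
from typing import Any, Dict, List

SUPPORTED_CONSTRAINTS = {"not_null", "unique", "primary_key"}


def plan_contract_assertions(
    model: Dict[str, Any],
    tests: List[Dict[str, Any]],
    *,
    ingest_contracts: bool = True,
    contract_test_tag: str = "contract",
    ingest_column_constraints_as_assertions: bool = True,
) -> List[Dict[str, str]]:
    """Return the assertions that would be bundled into one Data Contract."""
    enforced = (model.get("contract") or {}).get("enforced", False)
    if not ingest_contracts or not enforced:
        return []  # no contract entity for this model

    columns = model.get("columns") or {}
    planned: List[Dict[str, str]] = []

    # Step 1: one schema assertion covering all contracted columns.
    planned.append({"kind": "schema", "target": ",".join(columns)})

    # Step 2 (optional): one assertion per supported column constraint.
    if ingest_column_constraints_as_assertions:
        for col_name, col in columns.items():
            for c in col.get("constraints") or []:
                if c.get("type") in SUPPORTED_CONSTRAINTS:
                    planned.append({"kind": c["type"], "target": col_name})

    # Step 3: link existing tests that carry the contract tag.
    for t in tests:
        if contract_test_tag in (t.get("tags") or []):
            planned.append({"kind": "test", "target": t["name"]})

    # Step 4: the caller would bundle `planned` into the contract entity.
    return planned


model = {
    "contract": {"enforced": True},
    "columns": {
        "id": {"constraints": [{"type": "not_null"}, {"type": "primary_key"}]},
        "name": {},
    },
}
tests = [
    {"name": "row_count_positive", "tags": ["contract"]},
    {"name": "adhoc_check", "tags": []},
]
plan = plan_contract_assertions(model, tests)
```

Returning an empty list for non-enforced models keeps the caller simple: a contract entity is only created when there is something to bundle.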

Platform Support

| Platform | Contract Support | Notes |
| --- | --- | --- |
| dbt Core | Full | Extracts from manifest.json |
| dbt Cloud | Best-effort | Reads from meta.contract or meta.datahub_contract (API doesn't expose contracts directly) |
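
For the dbt Cloud path, the best-effort lookup could look roughly like the sketch below. The function name, the precedence of `contract` over `datahub_contract`, and the assumption that the meta value is a mapping with an `enforced` key are all illustrative guesses, not confirmed details of the PR.

```python
from typing import Any, Dict, Optional


def contract_from_meta(meta: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the first usable contract mapping found in a model's meta, if any."""
    if not meta:
        return None
    # Assumed precedence: `contract` first, then `datahub_contract`.
    for key in ("contract", "datahub_contract"):
        value = meta.get(key)
        if isinstance(value, dict):
            return value
    return None


primary = contract_from_meta({"contract": {"enforced": True}})
fallback = contract_from_meta({"datahub_contract": {"enforced": True}})
missing = contract_from_meta({"owner": "data-team"})
```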

How was this tested?

  • Added unit tests for new dataclasses (DBTContract, DBTConstraint)
  • Added unit tests for configuration options
  • Added integration test for contract extraction from manifest
  • Tested locally with sample manifests containing contract.enforced: true

Checklist

  • [x] The PR conforms to DataHub's Contributing Guideline
  • [x] Links to related issues (Closes #11927)
  • [x] Tests for the changes have been added
  • [x] Docs have been added/updated (if needed - docs PR can follow)

Screenshots/Demo

After ingestion with ingest_contracts: true, models with contract.enforced: true will have:

  • A Data Contract entity linked to the dataset
  • Schema assertions validating the contracted columns
  • (Optionally) Constraint assertions for not_null/unique/primary_key

ppiont avatar Nov 26 '25 20:11 ppiont

Bundle Report

Changes will decrease total bundle size by 14.13kB (-0.05%). This is within the configured threshold.

Detailed changes

| Bundle name | Size | Change |
| --- | --- | --- |
| datahub-react-web-esm | 28.7MB | -14.13kB (-0.05%) |

Affected Assets, Files, and Routes:

Assets Changed (bundle: datahub-react-web-esm):

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| assets/index-*.js | -14.13kB | 19.08MB | -0.07% |

codecov[bot] avatar Nov 26 '25 21:11 codecov[bot]

Codecov Report

All modified and coverable lines are covered by tests.


codecov[bot] avatar Nov 26 '25 22:11 codecov[bot]

@ppiont Thanks for the contribution. I'll take a look at this.

deepgarg760 avatar Dec 05 '25 09:12 deepgarg760

@ppiont, please take care of the prettier and linting errors.

deepgarg760 avatar Dec 05 '25 09:12 deepgarg760

@ppiont, has any benchmarking been done for this, e.g. ingesting a large number of DBT contracts?

deepgarg760 avatar Dec 05 '25 11:12 deepgarg760