feat(ingestion/dbt): add support for ingesting DBT contracts as Data Contracts
Summary
Add support for ingesting DBT model contracts (introduced in dbt 1.5) as DataHub Data Contract entities with schema and data quality assertions.
Closes #11927
What problem does this solve?
DBT introduced model contracts in v1.5, allowing teams to define and enforce schema guarantees on models. However, DataHub had no way to ingest this contract metadata: it was dropped during ingestion, so users couldn't see which models carried contractual guarantees.
This PR bridges DBT's data governance (contracts) with DataHub's data governance (Data Contracts, Assertions), giving teams a unified view of their data quality guarantees.
What changes are being made?
New Configuration Options
| Option | Default | Description |
|---|---|---|
| `ingest_contracts` | `false` | Enable Data Contract creation from DBT contracts |
| `contract_test_tag` | `"contract"` | Tag marking DBT tests to include in the contract |
| `ingest_column_constraints_as_assertions` | `true` | Create assertions from `not_null`, `unique`, and `primary_key` constraints |

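
For illustration, the defaults can be mirrored in a small Python dataclass. The class name `ContractIngestionOptions` and the helper `enabled_outputs` are hypothetical (the real source config class is not shown in this PR), and the gating simply follows the ingestion flow described in this PR: nothing contract-related is emitted unless `ingest_contracts` is on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractIngestionOptions:
    # Hypothetical stand-in for the dbt source config; defaults copied from the table.
    ingest_contracts: bool = False
    contract_test_tag: str = "contract"
    ingest_column_constraints_as_assertions: bool = True


def enabled_outputs(opts: ContractIngestionOptions) -> list:
    """Return which contract-related outputs a run would emit (illustrative only)."""
    if not opts.ingest_contracts:
        # Contract ingestion is opt-in: nothing contract-related is produced.
        return []
    outputs = ["schema_assertion", "tagged_test_assertions", "data_contract"]
    if opts.ingest_column_constraints_as_assertions:
        # Constraint assertions are on by default once contracts are enabled.
        outputs.insert(1, "constraint_assertions")
    return outputs
```

With the defaults, `enabled_outputs` returns an empty list; setting only `ingest_contracts=True` turns on all four outputs, because constraint assertions already default to `true`.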
New Data Structures
- `DBTContract` dataclass: captures `enforced`, `alias_types`, and `checksum` from the manifest
- `DBTConstraint` dataclass: captures column- and model-level constraints (`not_null`, `unique`, `primary_key`, etc.)

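
A minimal sketch of how these two dataclasses might be populated from a `manifest.json` model node. The `DBTConstraint` field names, the `parse_contract`/`parse_constraints` helpers, and the `alias_types` default of `True` are assumptions for illustration, not the PR's actual code. The node layout follows dbt's manifest: a `contract` block, model-level `constraints` that carry a `columns` list, and per-column `constraints`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DBTContract:
    # Mirrors node["contract"] in manifest.json.
    enforced: bool = False
    alias_types: bool = True  # assumed default when the key is absent
    checksum: Optional[str] = None


@dataclass
class DBTConstraint:
    type: str  # e.g. "not_null", "unique", "primary_key", "check"
    columns: List[str] = field(default_factory=list)
    expression: Optional[str] = None
    name: Optional[str] = None


def parse_contract(node: Dict[str, Any]) -> DBTContract:
    """Read the contract block of a model node (hypothetical helper)."""
    raw = node.get("contract") or {}
    return DBTContract(
        enforced=bool(raw.get("enforced", False)),
        alias_types=bool(raw.get("alias_types", True)),
        checksum=raw.get("checksum"),
    )


def parse_constraints(node: Dict[str, Any]) -> List[DBTConstraint]:
    """Collect model-level and column-level constraints (hypothetical helper)."""
    result: List[DBTConstraint] = []
    # Model-level constraints list the columns they cover.
    for c in node.get("constraints") or []:
        result.append(
            DBTConstraint(c["type"], list(c.get("columns") or []), c.get("expression"), c.get("name"))
        )
    # Column-level constraints apply to the column they are declared on.
    for col_name, col in (node.get("columns") or {}).items():
        for c in col.get("constraints") or []:
            result.append(DBTConstraint(c["type"], [col_name], c.get("expression"), c.get("name")))
    return result


# Usage with a trimmed-down model node:
node = {
    "contract": {"enforced": True, "alias_types": True, "checksum": "abc123"},
    "constraints": [{"type": "primary_key", "columns": ["id"]}],
    "columns": {"id": {"name": "id", "data_type": "int", "constraints": [{"type": "not_null"}]}},
}
contract = parse_contract(node)
constraints = parse_constraints(node)
```

Normalizing both constraint levels into one `columns` list keeps downstream assertion creation uniform, whether a constraint was declared on a column or on the model.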
Contract Ingestion Flow
When ingest_contracts: true and a model has contract.enforced: true:
- Schema Assertion: created from the contracted model's columns with exact-match compatibility
- Constraint Assertions (optional): created for `not_null`, `unique`, and `primary_key` constraints
- Tagged Test Assertions: existing DBT tests tagged with `contract_test_tag` are linked
- Data Contract Entity: bundles all assertions into a `DataContractPropertiesClass`

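
The flow can be sketched in plain Python as a planner that returns the assertions a contract would bundle. Everything here is hypothetical: the function name, the dict shapes standing in for DataHub's assertion and `DataContractPropertiesClass` aspects, and the `"EXACT_MATCH"` label for exact-match compatibility. Tagged tests are matched to the model through the `tags` and `depends_on.nodes` fields that dbt test nodes carry in the manifest.

```python
from typing import Any, Dict, List, Optional

# Only these constraint types become assertions, per the flow above.
CONSTRAINT_TYPES = {"not_null", "unique", "primary_key"}


def plan_contract(
    node: Dict[str, Any],
    tests: List[Dict[str, Any]],
    ingest_contracts: bool = False,
    contract_test_tag: str = "contract",
    constraints_as_assertions: bool = True,
) -> Optional[Dict[str, Any]]:
    """Hypothetical planner: return the contract's assertions, or None if skipped."""
    if not ingest_contracts or not (node.get("contract") or {}).get("enforced"):
        return None

    columns = node.get("columns") or {}
    # 1. Schema assertion over the contracted columns, exact-match compatibility.
    assertions: List[Dict[str, Any]] = [{
        "kind": "schema",
        "compatibility": "EXACT_MATCH",  # placeholder label
        "fields": {name: col.get("data_type") for name, col in columns.items()},
    }]

    # 2. Optional constraint assertions (column-level first, then model-level).
    if constraints_as_assertions:
        for name, col in columns.items():
            for c in col.get("constraints") or []:
                if c.get("type") in CONSTRAINT_TYPES:
                    assertions.append({"kind": "constraint", "type": c["type"], "columns": [name]})
        for c in node.get("constraints") or []:
            if c.get("type") in CONSTRAINT_TYPES:
                assertions.append(
                    {"kind": "constraint", "type": c["type"], "columns": list(c.get("columns") or [])}
                )

    # 3. Existing dbt tests on this model that carry the contract tag.
    for t in tests:
        targets = (t.get("depends_on") or {}).get("nodes", [])
        if contract_test_tag in (t.get("tags") or []) and node["unique_id"] in targets:
            assertions.append({"kind": "test", "test_id": t["unique_id"]})

    # 4. The Data Contract bundles all assertions for this dataset.
    return {"dataset": node["unique_id"], "assertions": assertions}
```

A `check` constraint on a contracted column is skipped here, since only `not_null`, `unique`, and `primary_key` are listed as sources of constraint assertions.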
Platform Support
| Platform | Contract Support | Notes |
|---|---|---|
| dbt Core | Full | Extracts from manifest.json |
| dbt Cloud | Best-effort | Reads from meta.contract or meta.datahub_contract (API doesn't expose contracts directly) |
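
A best-effort lookup covering both sources might look like this. The function name `resolve_contract` and the precedence of `meta.contract` over `meta.datahub_contract` are assumptions; the table only states that both keys are read for dbt Cloud.

```python
from typing import Any, Dict, Optional


def resolve_contract(node: Dict[str, Any], from_cloud_api: bool) -> Optional[Dict[str, Any]]:
    """Return a model's contract block, or None if none is found (hypothetical helper)."""
    if not from_cloud_api:
        # dbt Core: the contract lives directly on the manifest.json node.
        return node.get("contract") or None
    # dbt Cloud: the API does not expose contracts, so fall back to meta keys.
    meta = node.get("meta") or {}
    for key in ("contract", "datahub_contract"):  # precedence is an assumption
        value = meta.get(key)
        if isinstance(value, dict):
            return value
    return None
```

Under this reading, dbt Cloud users would presumably need to mirror the contract into the model's `meta` for it to be picked up.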
How was this tested?
- Added unit tests for the new dataclasses (`DBTContract`, `DBTConstraint`)
- Added unit tests for the configuration options
- Added an integration test for contract extraction from the manifest
- Tested locally with sample manifests containing `contract.enforced: true`

Checklist
- [x] The PR conforms to DataHub's Contributing Guideline
- [x] Links to related issues (Closes #11927)
- [x] Tests for the changes have been added
- [x] Docs have been added/updated (if needed - docs PR can follow)
Screenshots/Demo
After ingestion with ingest_contracts: true, models with contract.enforced: true will have:
- A Data Contract entity linked to the dataset
- Schema assertions validating the contracted columns
- (Optionally) Constraint assertions for `not_null`/`unique`/`primary_key`

Bundle Report
Changes will decrease total bundle size by 14.13kB (-0.05%) :arrow_down:. This is within the configured threshold :white_check_mark:
Detailed changes
| Bundle name | Size | Change |
|---|---|---|
| datahub-react-web-esm | 28.7MB | -14.13kB (-0.05%) :arrow_down: |
Affected Assets, Files, and Routes:
Assets Changed:
| Asset Name | Size Change | Total Size | Change (%) |
|---|---|---|---|
| `assets/index-*.js` | -14.13kB | 19.08MB | -0.07% |

Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
@ppiont Thanks for the contribution. Will take a look at this
@ppiont, please fix the prettier and linting errors.
@ppiont, has any benchmarking been done for this, e.g. ingesting a large number of DBT contracts?