feat(wren-launcher): Expanded dbt<>Wren MDL conversions, added BigQuery support
This PR directly builds off #1827 to provide key updates and expansions around parsing dbt information into Wren MDL, as well as establishing bigquery-dbt support.
Initial Features & BigQuery Support
1. BigQuery Integration
- Added full support for converting dbt projects that use a BigQuery data source.
- The converter correctly parses `profiles.yml` for BigQuery-specific connection details, including `project`, `dataset`, `keyfile`, and `method`.
- Includes robust validation for BigQuery connections, ensuring required properties are present and supported authentication methods (`service-account`, `service-account-json`) are correctly configured. `oauth`-related methods are screened out with the assumption live testing may come in a later iteration.
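The validation rules described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `bigQueryConn` type and `validateBigQuery` function are hypothetical names, and requiring `keyfile_json` for the `service-account-json` method is an assumption based on the usual dbt profile layout.

```go
package main

import (
	"errors"
	"fmt"
)

// bigQueryConn is a hypothetical stand-in for the parsed profiles.yml entry;
// the fields mirror dbt profile keys, not the PR's real struct.
type bigQueryConn struct {
	Project     string
	Dataset     string
	Method      string
	Keyfile     string // path to a key file, used by "service-account"
	KeyfileJSON string // inline credentials, assumed for "service-account-json"
}

// validateBigQuery rejects connections that lack required properties or that
// use an authentication method other than the two listed in the description.
func validateBigQuery(c bigQueryConn) error {
	if c.Project == "" {
		return errors.New("bigquery: project is required")
	}
	if c.Dataset == "" {
		return errors.New("bigquery: dataset is required")
	}
	switch c.Method {
	case "service-account":
		if c.Keyfile == "" {
			return errors.New("bigquery: keyfile is required for service-account")
		}
	case "service-account-json":
		if c.KeyfileJSON == "" {
			return errors.New("bigquery: keyfile_json is required for service-account-json")
		}
	default:
		// oauth and every other method are screened out.
		return fmt.Errorf("bigquery: unsupported auth method %q", c.Method)
	}
	return nil
}

func main() {
	err := validateBigQuery(bigQueryConn{Project: "p", Dataset: "d", Method: "oauth"})
	fmt.Println(err)
}
```

Returning an error rather than silently falling back keeps a misconfigured profile from producing a data source that only fails later at query time.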
2. Metadata Mapping
- Reads a column's `meta.label` field from the dbt project and maps it to the `displayName` property in the Wren MDL.
3. Configurable Staging Model Inclusion
- Added an `--include-staging-models` command-line flag to the `dbt-auto-convert` command.
- When this flag is used, the converter will include models with `stg_` or `staging_` in their names; otherwise, they are skipped by default.
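The staging filter described above amounts to a substring check gated by the flag. The sketch below is illustrative only; `isStagingModel` and `filterModels` are hypothetical helper names, and the real converter may match names differently (for example by prefix).

```go
package main

import (
	"fmt"
	"strings"
)

// isStagingModel reports whether a model name contains "stg_" or "staging_".
func isStagingModel(name string) bool {
	return strings.Contains(name, "stg_") || strings.Contains(name, "staging_")
}

// filterModels drops staging models unless includeStaging is true.
func filterModels(names []string, includeStaging bool) []string {
	kept := make([]string, 0, len(names))
	for _, n := range names {
		if !includeStaging && isStagingModel(n) {
			continue
		}
		kept = append(kept, n)
	}
	return kept
}

func main() {
	models := []string{"stg_orders", "orders", "staging_customers", "customers"}
	fmt.Println(filterModels(models, false)) // default: staging models skipped
	fmt.Println(filterModels(models, true))  // with --include-staging-models
}
```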
Semantic Layer & Data Integrity Features
1. Relationship Generation from dbt Tests
- The converter now automatically generates Wren `Relationship` objects from dbt `relationships` tests.
- Robust Parsing: The logic correctly parses relationship tests from both possible locations in the `manifest.json`: embedded directly on `struct` fields and as top-level, compiled `test` nodes for simple columns.
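As a rough illustration of turning one `relationships` test into a relationship entry, consider the sketch below. All type and function names are hypothetical, the `MANY_TO_ONE` join type is an assumption (a foreign key usually points at a unique column), and only the single-argument `ref('model')` form of the `to` argument is handled.

```go
package main

import (
	"fmt"
	"regexp"
)

// relationshipTest holds the parts of a dbt `relationships` test used here.
// The field names are illustrative, not the PR's actual types.
type relationshipTest struct {
	FromModel  string // model the test is attached to
	FromColumn string // column under test
	To         string // raw `to` argument, e.g. "ref('customers')"
	Field      string // referenced column on the target model
}

// relationship loosely follows the shape of a Wren MDL relationship entry.
type relationship struct {
	Name      string
	Models    []string
	JoinType  string
	Condition string
}

// refPattern extracts the model name from a single-argument ref(...) call.
var refPattern = regexp.MustCompile(`ref\(\s*['"]([^'"]+)['"]\s*\)`)

// toRelationship converts one test into a relationship. It returns false when
// the `to` argument is not a single-argument ref(...) call.
func toRelationship(t relationshipTest) (relationship, bool) {
	m := refPattern.FindStringSubmatch(t.To)
	if m == nil {
		return relationship{}, false
	}
	target := m[1]
	return relationship{
		Name:      fmt.Sprintf("%s_%s_%s", t.FromModel, t.FromColumn, target),
		Models:    []string{t.FromModel, target},
		JoinType:  "MANY_TO_ONE", // assumption, see lead-in
		Condition: fmt.Sprintf("%s.%s = %s.%s", t.FromModel, t.FromColumn, target, t.Field),
	}, true
}

func main() {
	rel, ok := toRelationship(relationshipTest{
		FromModel:  "orders",
		FromColumn: "customer_id",
		To:         "ref('customers')",
		Field:      "id",
	})
	fmt.Println(rel.Condition, ok)
}
```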
2. Metric Conversion from dbt Semantic Layer
- Added support for parsing `semantic_manifest.json` to translate dbt `metrics` and their underlying `semantic_models` into Wren `Metric` objects.
- Correctly handles `simple`, `ratio`, and `derived` metric types.
- This feature is optional; the script runs without error if `semantic_manifest.json` is not found.
3. Enum Definition Generation from accepted_values Tests
- Generates Wren `EnumDefinition` objects from dbt `accepted_values` tests.
- De-duplication Logic: If multiple columns share the exact same set of accepted values, only one `EnumDefinition` is created, and all relevant columns are linked to it.
- Supports tests on both simple columns and nested `struct` fields.
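One way to implement the de-duplication described above is to key each enum by a canonical form of its value set. The sketch below is a minimal illustration under that assumption: `enumRegistry`, `enumFor`, and the `<column>_enum` naming scheme are hypothetical, and treating value order as irrelevant is an interpretation of "the exact same set".

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// enumDef is a simplified stand-in for a Wren enum definition.
type enumDef struct {
	Name   string
	Values []string
}

// enumRegistry hands out one enumDef per distinct set of accepted values.
type enumRegistry struct {
	byKey map[string]string // canonical key -> enum name
	defs  []enumDef
}

func newEnumRegistry() *enumRegistry {
	return &enumRegistry{byKey: make(map[string]string)}
}

// canonicalKey sorts a copy of the values so that order does not matter.
func canonicalKey(values []string) string {
	sorted := append([]string(nil), values...)
	sort.Strings(sorted)
	return strings.Join(sorted, "\x00")
}

// enumFor returns the enum name for the given values, creating a definition
// only the first time a value set is seen.
func (r *enumRegistry) enumFor(column string, values []string) string {
	key := canonicalKey(values)
	if name, ok := r.byKey[key]; ok {
		return name
	}
	name := column + "_enum" // hypothetical naming scheme
	r.byKey[key] = name
	r.defs = append(r.defs, enumDef{Name: name, Values: values})
	return name
}

func main() {
	r := newEnumRegistry()
	a := r.enumFor("order_status", []string{"open", "closed"})
	b := r.enumFor("ticket_status", []string{"closed", "open"})
	fmt.Println(a == b, len(r.defs))
}
```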
4. Primary Key Identification from Semantic Models
- Reads the `entities` list within dbt `semantic_models` to identify the primary entity (`type: "primary"`).
- Uses the `node_relation.alias` to correctly map the identified primary key to the corresponding dbt model and populate the `primaryKey` field in the Wren `model`.
5. Not Null Constraint Conversion
- Identifies dbt `not_null` tests from all possible locations in the manifest.
- When a `not_null` test is found, the `NotNull` field on the corresponding Wren `WrenColumn` is set to `true`.
Miscellaneous notes
Made Semantic Layer Parsing Optional & Robust
- The logic for parsing `semantic_manifest.json` was wrapped in checks to ensure that the converter runs successfully without errors even if the file is missing or malformed, simply skipping the dependent features.
Corrected Struct Definitions
- The `wren_mdl.go` file was updated to include the necessary structs and fields for `Metric` and `EnumDefinition` to support the new features.
Summary by CodeRabbit
- New Features
  - BigQuery is supported as a data source with credential handling and validation.
  - dbt conversion can read semantic manifests and now exports metrics, enums, primary keys, inferred relationships, and column display names into the MDL output.
  - New CLI flag and interactive prompt to include or exclude staging models during conversion.
- Tests
  - Added tests covering BigQuery data source creation, credential resolution, validation, and type mappings.
- Chores
  - Minor formatting and consistency tweaks.
Walkthrough
Adds an --include-staging-models flag and interactive prompt; threads IncludeStagingModels through CLI → DbtConvertProject; extends converter to parse optional semantic_manifest.json (enums, metrics, primary keys, relationships); adds BigQuery data source support and tests; extends Wren MDL with EnumDefinitions and Metrics. Duplicate prompt helper present.
Changes
| Cohort / File(s) | Change Summary |
|---|---|
| CLI & Call Sites (`wren-launcher/commands/dbt.go`, `wren-launcher/commands/launch.go`) | Added `--include-staging-models` flag and an interactive prompt; passed `IncludeStagingModels` into `DbtConvertProject`; duplicate `askForIncludeStagingModels` helper added; updated call sites to include the new boolean parameter. |
| DBT Conversion Core (`wren-launcher/commands/dbt/converter.go`) | Added `semantic_manifest.json` handling and staging-aware conversion; `ConvertDbtCatalogToWrenMDL` and helpers accept semantic manifest path and `includeStagingModels`; added enum extraction, not-null/primary-key mappings, relationship generation, metric conversion, and conversion counts; added `IncludeStagingModels` to `ConvertOptions`. |
| Data Source & Profiles (`wren-launcher/commands/dbt/data_source.go`, `wren-launcher/commands/dbt/profiles.go`, `wren-launcher/commands/dbt/profiles_analyzer.go`) | Added `WrenBigQueryDataSource` and `convertToBigQueryDataSource` with credential handling (service-account-json, keyfile resolution, oauth warning); `convertConnectionToDataSource` handles `"bigquery"`; added `DbtConnection.Method` field and parse support; removed MySQL type; changed `WrenPostgresDataSource.Port` from string to int (default 5432). |
| Wren MDL Schema (`wren-launcher/commands/dbt/wren_mdl.go`) | Added `EnumDefinition` and `Metric` structs; added `EnumDefinitions` and `Metrics` to `WrenMDLManifest`; added `DisplayName` and `Enum` fields to `WrenColumn`. |
| Tests (`wren-launcher/commands/dbt/data_source_test.go`) | Added BigQuery tests (service-account-json, absolute/relative keyfile paths, validation), `MapType` tests; updated Postgres tests to reflect numeric default port and `Database` field name; removed legacy validator scaffolding. |
Sequence Diagram(s)
sequenceDiagram
autonumber
participant User as CLI User
participant CLI as DbtAutoConvert (CLI)
participant Conv as DbtConvertProject / Converter
participant DS as DataSource Builder
participant MDL as WrenMDLManifest
User->>CLI: run convert (flag or prompt)
CLI->>Conv: invoke conversion (IncludeStagingModels)
Conv->>DS: parse profiles → build DataSource (BigQuery path if detected)
Conv->>Conv: read manifest.json & catalog.json
alt semantic_manifest present
Conv->>Conv: read semantic_manifest.json → extract enums, metrics, PKs
end
Conv->>Conv: apply staging filter (based on IncludeStagingModels)
Conv->>Conv: generate relationships & metrics
Conv->>MDL: assemble manifest (models, enums, metrics, relationships, datasources)
MDL-->>CLI: return ConvertResult
CLI-->>User: write/output MDL
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
- Canner/WrenAI#1827 — Overlaps with dbt conversion flow and data-source conversion (semantic-manifest handling, ConvertOptions/DbtConvertProject edits).
Suggested reviewers
- douenergy
- wwwy3y3
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 72.97% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly and accurately summarizes the primary changes in the PR: expanded dbt<>Wren MDL conversion capabilities and added BigQuery support, which align with the modified files and stated objectives. It is concise, specific, and meaningful for a teammate scanning the commit history. |
Thanks @cougrimes for the contribution. I'll take a look 👍
Hi @cougrimes, after https://github.com/Canner/WrenAI/pull/1877, we added some formatter and lint tools for the wren-launcher module. This causes some conflicts with this PR. If you don't mind, I can help resolve the conflicts by pushing to your branch directly.
By the way, I left some comments on this PR. If you don't have time to address them, we can merge this PR first and I'll address them in a follow-up PR.
Anyway, thanks for your contribution.
@goldmedal Happy to make the changes requested, but I actually wound up hitting an unexpected issue when processing the MDL on my end—despite declaring BigQuery, I keep getting the MDL trying to parse as Trino rather than BQ. Do you have any other documentation on the MDL schema? I've been trying to work backward to figure out why this is happening to no avail.
> @goldmedal Happy to make the changes requested, but I actually wound up hitting an unexpected issue when processing the MDL on my end—despite declaring BigQuery, I keep getting the MDL trying to parse as Trino rather than BQ.

What do you mean by the MDL being parsed as Trino? Which step does that? 🤔

> Do you have any other documentation on the MDL schema? I've been trying to work backward to figure out why this is happening to no avail.

You can check the doc for WrenMDL or the JSON schema. Although some features (e.g. Metric, Enum) are not covered in the doc, we can still put them in the MDL as context for SQL generation.
Most of the issues I had been running into around Trino-esque errors are addressed in Canner/wren-engine#1290; issues with keys and auths resolved.
Hi @cougrimes, there are some lint and format check failures. Could you take a look? You can run the following command locally to check them:
make check
Hi @cougrimes, are you still working on this? I’d be happy to help resolve the conflicts and get all the tests passing.
@cougrimes I'm sorry, I overwrote your branch with Wren AI's main branch. I meant to clean up the changes, but I force-pushed main to your branch, which closed this PR automatically, and I don't have permission to recover it by pushing the correct commits again. I created another PR, #1965, which is based on the latest main and cherry-picks your commits. Sorry again, and thanks for your contribution. We will merge the change as soon as possible.