WrenAI feat(wren-launcher): Expanded dbt<>Wren MDL conversions, added BigQuery support

This PR builds directly on #1827, with key updates and expansions to how dbt information is parsed into Wren MDL, and it adds BigQuery support for dbt projects.

Initial Features & BigQuery Support

1. BigQuery Integration

Added full support for converting dbt projects that use a BigQuery data source.
The converter parses profiles.yml for BigQuery-specific connection details, including project, dataset, keyfile, and method.
Includes validation for BigQuery connections, ensuring that required properties are present and that the supported authentication methods (service-account, service-account-json) are correctly configured. OAuth-related methods are screened out for now, on the assumption that live testing for them may come in a later iteration.
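A minimal sketch of that validation in Go, assuming a hypothetical `bigQueryConn` struct whose fields mirror the profiles.yml keys (`method`, `project`, `dataset`, `keyfile`, `keyfile_json`). It is not the PR's `convertToBigQueryDataSource`, and the exact checks and messages there may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// bigQueryConn is a hypothetical, trimmed-down view of a dbt BigQuery
// profile output; field names mirror profiles.yml keys, not the PR's structs.
type bigQueryConn struct {
	Method      string         // "service-account", "service-account-json", "oauth", ...
	Project     string
	Dataset     string
	Keyfile     string         // path, used by method: service-account
	KeyfileJSON map[string]any // inline key, used by method: service-account-json
}

// validateBigQueryConn checks that required properties are present and that
// only key-based authentication methods are accepted.
func validateBigQueryConn(c bigQueryConn) error {
	if c.Project == "" {
		return errors.New("bigquery: project is required")
	}
	if c.Dataset == "" {
		return errors.New("bigquery: dataset is required")
	}
	switch c.Method {
	case "service-account":
		if c.Keyfile == "" {
			return errors.New("bigquery: keyfile is required for method service-account")
		}
	case "service-account-json":
		if len(c.KeyfileJSON) == 0 {
			return errors.New("bigquery: keyfile_json is required for method service-account-json")
		}
	case "oauth", "oauth-secrets":
		// OAuth flows are screened out until they can be live-tested.
		return fmt.Errorf("bigquery: method %q is not supported yet", c.Method)
	default:
		return fmt.Errorf("bigquery: unknown method %q", c.Method)
	}
	return nil
}
```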

2. Metadata Mapping

Reads a column's meta.label field from the dbt project and maps it to the displayName property in the Wren MDL.
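As a rough illustration, assuming the column's `meta` block has been decoded into a `map[string]any` (the `dbtColumn` and `wrenColumn` types below are hypothetical), the mapping only needs to accept a non-empty string label. Trimming whitespace is an extra choice made for this sketch.

```go
package main

import "strings"

// dbtColumn is a hypothetical subset of a manifest.json column entry.
type dbtColumn struct {
	Name string
	Meta map[string]any // the column's meta block, decoded from JSON
}

// wrenColumn is a hypothetical subset of a Wren MDL column.
type wrenColumn struct {
	Name        string
	DisplayName string
}

// toWrenColumn copies meta.label into DisplayName when it is a non-empty string.
func toWrenColumn(c dbtColumn) wrenColumn {
	col := wrenColumn{Name: c.Name}
	// Indexing a nil map is safe in Go; the type assertion rejects non-strings.
	if label, ok := c.Meta["label"].(string); ok && strings.TrimSpace(label) != "" {
		col.DisplayName = strings.TrimSpace(label)
	}
	return col
}
```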

3. Configurable Staging Model Inclusion

Added an --include-staging-models command-line flag to the dbt-auto-convert command.
By default, models whose names contain stg_ or staging_ are skipped; when this flag is set, the converter includes them.
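The filter can be sketched as a single predicate. `shouldIncludeModel` is a hypothetical name, and the case-insensitive match is an assumption; the PR may compare names exactly.

```go
package main

import "strings"

// shouldIncludeModel skips models whose names contain "stg_" or "staging_"
// unless includeStaging (the --include-staging-models flag) is true.
func shouldIncludeModel(name string, includeStaging bool) bool {
	if includeStaging {
		return true
	}
	lower := strings.ToLower(name)
	return !strings.Contains(lower, "stg_") && !strings.Contains(lower, "staging_")
}
```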

Semantic Layer & Data Integrity Features

1. Relationship Generation from dbt Tests

The converter now automatically generates Wren Relationship objects from dbt relationships tests.
Robust parsing: relationship tests are read from both places they can appear in manifest.json, namely embedded directly on struct fields and as top-level compiled test nodes for simple columns.
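A sketch of turning one `relationships` test into a relationship, assuming the test's `test_metadata` object (with `kwargs.to`, `kwargs.field`, and `kwargs.column_name`) has already been located in either part of the manifest. The `relationship` type, the naming scheme, and the choice of `MANY_TO_ONE` as the join type are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// testMetadata mirrors the test_metadata object on dbt generic test nodes
// (only the fields used here).
type testMetadata struct {
	Name   string         `json:"name"`
	Kwargs map[string]any `json:"kwargs"`
}

// relationship is a hypothetical stand-in for a Wren MDL relationship.
type relationship struct {
	Name      string
	Models    []string
	JoinType  string
	Condition string
}

// refPattern extracts the model name from a single-argument ref('model').
var refPattern = regexp.MustCompile(`ref\(\s*['"]([^'"]+)['"]\s*\)`)

// relationshipFromTest converts a relationships test on fromModel into a
// many-to-one relationship; ok is false for other tests or unusable kwargs.
func relationshipFromTest(fromModel string, raw []byte) (relationship, bool) {
	var md testMetadata
	if err := json.Unmarshal(raw, &md); err != nil || md.Name != "relationships" {
		return relationship{}, false
	}
	column, _ := md.Kwargs["column_name"].(string)
	field, _ := md.Kwargs["field"].(string)
	to, _ := md.Kwargs["to"].(string)
	m := refPattern.FindStringSubmatch(to)
	if column == "" || field == "" || m == nil {
		return relationship{}, false
	}
	toModel := m[1]
	return relationship{
		Name:      fmt.Sprintf("%s_%s_%s_%s", fromModel, column, toModel, field),
		Models:    []string{fromModel, toModel},
		JoinType:  "MANY_TO_ONE",
		Condition: fmt.Sprintf("%s.%s = %s.%s", fromModel, column, toModel, field),
	}, true
}
```

Only the single-argument `ref('model')` form is handled here; a two-argument ref with a package name would need a broader pattern.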

2. Metric Conversion from dbt Semantic Layer

Added support for parsing semantic_manifest.json to translate dbt metrics and their underlying semantic_models into Wren Metric objects.
Correctly handles simple, ratio, and derived metric types.
This feature is optional; the script runs without error if semantic_manifest.json is not found.
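One way to picture the three supported types, assuming metrics have already been flattened into a hypothetical `metricDef` and each measure's aggregate SQL is known: a simple metric reuses its measure's aggregation, a ratio divides two metrics, and a derived metric keeps its expression. This is a sketch, not the converter's actual `Metric` output format; rejecting every other type simply reflects that only these three are listed as handled.

```go
package main

import "fmt"

// metricDef is a hypothetical, flattened view of a metric in semantic_manifest.json.
type metricDef struct {
	Name        string
	Type        string // "simple", "ratio", or "derived"
	Measure     string // simple: the measure it aggregates
	Numerator   string // ratio: numerator metric
	Denominator string // ratio: denominator metric
	Expr        string // derived: expression over other metrics
}

// metricExpression reduces a metric to one expression; measureAgg maps
// measure names to their aggregate SQL.
func metricExpression(m metricDef, measureAgg map[string]string) (string, error) {
	switch m.Type {
	case "simple":
		agg, ok := measureAgg[m.Measure]
		if !ok {
			return "", fmt.Errorf("metric %s: unknown measure %q", m.Name, m.Measure)
		}
		return agg, nil
	case "ratio":
		if m.Numerator == "" || m.Denominator == "" {
			return "", fmt.Errorf("metric %s: ratio needs numerator and denominator", m.Name)
		}
		// NULLIF guards against division by zero.
		return fmt.Sprintf("%s / NULLIF(%s, 0)", m.Numerator, m.Denominator), nil
	case "derived":
		if m.Expr == "" {
			return "", fmt.Errorf("metric %s: derived metric has no expr", m.Name)
		}
		return m.Expr, nil
	default:
		return "", fmt.Errorf("metric %s: unsupported type %q", m.Name, m.Type)
	}
}
```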

3. Enum Definition Generation from `accepted_values` Tests

Generates Wren EnumDefinition objects from dbt accepted_values tests.
De-duplication Logic: If multiple columns share the exact same set of accepted values, only one EnumDefinition is created, and all relevant columns are linked to it.
Supports tests on both simple columns and nested struct fields.
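The de-duplication can be sketched by keying enums on their sorted value set. `enumRegistry` and the `_2` suffix for name clashes are hypothetical; the sketch only shows why identical sets collapse into one EnumDefinition while different sets do not.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// enumRegistry maps canonical value sets to enum names so that columns with
// identical accepted_values (in any order) share one enum.
type enumRegistry struct {
	byKey map[string]string   // canonical value set -> enum name
	enums map[string][]string // enum name -> sorted values
}

func newEnumRegistry() *enumRegistry {
	return &enumRegistry{byKey: map[string]string{}, enums: map[string][]string{}}
}

// enumFor returns the existing enum for this value set, or registers a new
// one named after candidate (suffixed if that name is already taken).
func (r *enumRegistry) enumFor(candidate string, values []string) string {
	sorted := append([]string(nil), values...)
	sort.Strings(sorted)
	// A NUL separator keeps a value containing a comma from colliding with two separate values.
	key := strings.Join(sorted, "\x00")
	if name, ok := r.byKey[key]; ok {
		return name
	}
	name := candidate
	for i := 2; ; i++ {
		if _, taken := r.enums[name]; !taken {
			break
		}
		name = fmt.Sprintf("%s_%d", candidate, i)
	}
	r.byKey[key] = name
	r.enums[name] = sorted
	return name
}
```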

4. Primary Key Identification from Semantic Models

Reads the entities list within dbt semantic_models to identify the primary entity (type: "primary").
Uses the node_relation.alias to correctly map the identified primary key to the corresponding dbt model and populate the primaryKey field in the Wren model.
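A sketch of that lookup against `semantic_manifest.json`, decoding only the fields it needs. Falling back to the entity `name` when `expr` is empty follows dbt's default for entities; the `primaryKeys` function and its return shape are assumptions.

```go
package main

import "encoding/json"

// semanticModel is a hypothetical subset of a semantic model entry.
type semanticModel struct {
	NodeRelation struct {
		Alias string `json:"alias"`
	} `json:"node_relation"`
	Entities []struct {
		Name string `json:"name"`
		Type string `json:"type"`
		Expr string `json:"expr"`
	} `json:"entities"`
}

// primaryKeys maps each model alias to the column of its primary entity.
func primaryKeys(raw []byte) (map[string]string, error) {
	var doc struct {
		SemanticModels []semanticModel `json:"semantic_models"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	pks := map[string]string{}
	for _, sm := range doc.SemanticModels {
		for _, e := range sm.Entities {
			if e.Type != "primary" {
				continue
			}
			col := e.Expr
			if col == "" {
				col = e.Name // an entity's expr defaults to its name
			}
			pks[sm.NodeRelation.Alias] = col
			break // use the first primary entity per semantic model
		}
	}
	return pks, nil
}
```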

5. Not Null Constraint Conversion

Identifies dbt not_null tests in every location where they can appear in the manifest.
When a not_null test is found, the NotNull field of the corresponding WrenColumn is set to true.
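Once not_null tests have been collected from every location, applying them reduces to a lookup by model and column. The `notNullTest` and `column` types here are hypothetical stand-ins for the parsed tests and for `WrenColumn`.

```go
package main

// notNullTest is a hypothetical, already-parsed not_null test: the model it
// is attached to and the column it checks.
type notNullTest struct {
	Model  string
	Column string
}

// column is a hypothetical stand-in for WrenColumn.
type column struct {
	Name    string
	NotNull bool
}

// applyNotNull sets NotNull on every column of model covered by a not_null
// test, regardless of where in the manifest the test was found.
func applyNotNull(model string, cols []column, tests []notNullTest) {
	constrained := map[string]bool{}
	for _, t := range tests {
		if t.Model == model {
			constrained[t.Column] = true
		}
	}
	for i := range cols {
		if constrained[cols[i].Name] {
			cols[i].NotNull = true
		}
	}
}
```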

Miscellaneous notes

Made Semantic Layer Parsing Optional & Robust

The logic for parsing semantic_manifest.json is wrapped in checks so that the converter still runs successfully when the file is missing or malformed, simply skipping the dependent features.

Corrected Struct Definitions

The wren_mdl.go file was updated to include the necessary structs and fields for Metric and EnumDefinition to support the new features.

Summary by CodeRabbit

New Features
- BigQuery is supported as a data source with credential handling and validation.
- dbt conversion can read semantic manifests and now exports metrics, enums, primary keys, inferred relationships, and column display names into the MDL output.
- New CLI flag and interactive prompt to include or exclude staging models during conversion.
Tests
- Added tests covering BigQuery data source creation, credential resolution, validation, and type mappings.
Chores
- Minor formatting and consistency tweaks.

Aug 03 '25 21:08 cougrimes

Walkthrough

Adds an --include-staging-models flag and interactive prompt; threads IncludeStagingModels through CLI → DbtConvertProject; extends converter to parse optional semantic_manifest.json (enums, metrics, primary keys, relationships); adds BigQuery data source support and tests; extends Wren MDL with EnumDefinitions and Metrics. Duplicate prompt helper present.

Changes

Cohort / File(s)	Change Summary
CLI & Call Sites `wren-launcher/commands/dbt.go`, `wren-launcher/commands/launch.go`	Added `--include-staging-models` flag and an interactive prompt; passed IncludeStagingModels into `DbtConvertProject`; duplicate `askForIncludeStagingModels` helper added; updated call sites to include the new boolean parameter.
DBT Conversion Core `wren-launcher/commands/dbt/converter.go`	Added semantic_manifest.json handling and staging-aware conversion; `ConvertDbtCatalogToWrenMDL` and helpers accept semantic manifest path and includeStagingModels; added enum extraction, not-null/primary-key mappings, relationship generation, metric conversion, and conversion counts; added `IncludeStagingModels` to `ConvertOptions`.
Data Source & Profiles `wren-launcher/commands/dbt/data_source.go`, `wren-launcher/commands/dbt/profiles.go`, `wren-launcher/commands/dbt/profiles_analyzer.go`	Added `WrenBigQueryDataSource` and `convertToBigQueryDataSource` with credential handling (service-account-json, keyfile resolution, oauth warning); `convertConnectionToDataSource` handles "bigquery"; added `DbtConnection.Method` field and parse support; removed MySQL type; changed `WrenPostgresDataSource.Port` from string to int (default 5432).
Wren MDL Schema `wren-launcher/commands/dbt/wren_mdl.go`	Added `EnumDefinition` and `Metric` structs; added `EnumDefinitions` and `Metrics` to `WrenMDLManifest`; added `DisplayName` and `Enum` fields to `WrenColumn`.
Tests `wren-launcher/commands/dbt/data_source_test.go`	Added BigQuery tests (service-account-json, absolute/relative keyfile paths, validation), MapType tests; updated Postgres tests to reflect numeric default port and `Database` field name; removed legacy validator scaffolding.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant User as CLI User
    participant CLI as DbtAutoConvert (CLI)
    participant Conv as DbtConvertProject / Converter
    participant DS as DataSource Builder
    participant MDL as WrenMDLManifest

    User->>CLI: run convert (flag or prompt)
    CLI->>Conv: invoke conversion (IncludeStagingModels)
    Conv->>DS: parse profiles → build DataSource (BigQuery path if detected)
    Conv->>Conv: read manifest.json & catalog.json
    alt semantic_manifest present
        Conv->>Conv: read semantic_manifest.json → extract enums, metrics, PKs
    end
    Conv->>Conv: apply staging filter (based on IncludeStagingModels)
    Conv->>Conv: generate relationships & metrics
    Conv->>MDL: assemble manifest (models, enums, metrics, relationships, datasources)
    MDL-->>CLI: return ConvertResult
    CLI-->>User: write/output MDL

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Canner/WrenAI#1827 — Overlaps with dbt conversion flow and data-source conversion (semantic-manifest handling, ConvertOptions/DbtConvertProject edits).

Suggested reviewers

douenergy
wwwy3y3

Poem

"I nibble manifests and chase each key,
BigQuery crumbs and enums follow me,
Metrics hum softly, relationships sing,
Staging gates open — conversion takes wing,
Hop, hop, the MDL blossoms for me. 🐇"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 72.97%, which is below the required threshold of 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title succinctly and accurately summarizes the primary changes in the PR: expanded dbt<>Wren MDL conversion capabilities and added BigQuery support, which align with the modified files and stated objectives. It is concise, specific, and meaningful for a teammate scanning the commit history.


Aug 03 '25 21:08 coderabbitai[bot]

Thanks @cougrimes for the contribution. I'll take a look 👍

Aug 04 '25 07:08 goldmedal

Hi @cougrimes, after https://github.com/Canner/WrenAI/pull/1877, we added some formatter and lint tools for the wren-launcher module, which cause some conflicts with this PR. If you don't mind, I can help resolve the conflicts by pushing to your branch directly. I also left some comments on this PR. If you don't have time to address them, we can merge this PR first and I'll handle them in a follow-up PR. Either way, thanks for your contribution.

Aug 12 '25 08:08 goldmedal

@goldmedal Happy to make the requested changes, but I ran into an unexpected issue when processing the MDL on my end: despite declaring BigQuery, the MDL keeps getting parsed as Trino rather than BigQuery. Do you have any other documentation on the MDL schema? I've been trying to work backward to figure out why this is happening, to no avail.

Aug 13 '25 21:08 cougrimes

> @goldmedal Happy to make the requested changes, but I ran into an unexpected issue when processing the MDL on my end: despite declaring BigQuery, the MDL keeps getting parsed as Trino rather than BigQuery.

What do you mean by the MDL being parsed as Trino? Which step does that? 🤔

> Do you have any other documentation on the MDL schema? I've been trying to work backward to figure out why this is happening, to no avail.

You can check the WrenMDL documentation or the JSON schema. Although some features (e.g., Metric, Enum) are not covered in the documentation, we can still put them in the MDL to provide context for SQL generation.

Aug 14 '25 06:08 goldmedal

Most of the Trino-like errors I had been running into are addressed in Canner/wren-engine#1290, and the issues with keys and authentication are resolved.

Aug 18 '25 21:08 cougrimes

Hi @cougrimes, there are some lint and format check failures. Could you take a look? You can run the following command locally to check them:

make check

Aug 21 '25 03:08 goldmedal

Hi @cougrimes, are you still working on this? I’d be happy to help resolve the conflicts and get all the tests passing.

Sep 11 '25 07:09 douenergy

@cougrimes I'm sorry for overwriting your branch with Wren AI's main branch. I meant to clear the changes, but instead I force-pushed the main branch to your branch, which caused this PR to be closed automatically, and I don't have permission to recover it by pushing the correct commits again. I've created another PR, #1965, which is based on the latest main and cherry-picks your commits. Sorry again, and thanks for your contribution. We will merge the change as soon as possible.

Sep 22 '25 04:09 goldmedal
