feat(wren-launcher): Expanded dbt<>Wren MDL conversions, added BigQuery support

Open cougrimes opened this issue 4 months ago • 7 comments

This PR builds directly on #1827. It expands how dbt project information is parsed into Wren MDL and adds dbt-bigquery support.

Initial Features & BigQuery Support

1. BigQuery Integration

  • Added full support for converting dbt projects that use a BigQuery data source.
  • The converter correctly parses profiles.yml for BigQuery-specific connection details, including project, dataset, keyfile, and method.
  • Includes robust validation for BigQuery connections, ensuring required properties are present and supported authentication methods (service-account, service-account-json) are correctly configured. OAuth-related methods are screened out for now, on the assumption that live testing may come in a later iteration.
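
As a rough illustration of the validation described above, here is a minimal sketch. The struct `bigQueryProfile`, its field names, and the function `validateBigQuery` are hypothetical and are not taken from the PR; the sketch also assumes that `service-account` needs a keyfile path and `service-account-json` needs inline credentials.

```go
package main

import (
	"errors"
	"fmt"
)

// bigQueryProfile is a hypothetical, simplified view of the BigQuery fields
// read from profiles.yml; the struct used in the PR may look different.
type bigQueryProfile struct {
	Project     string
	Dataset     string
	Method      string
	Keyfile     string // path to a key file (assumed for "service-account")
	KeyfileJSON string // inline credentials (assumed for "service-account-json")
}

// validateBigQuery checks that required properties are present and that the
// authentication method is one of the two supported ones. Anything else,
// including oauth variants, is reported as unsupported.
func validateBigQuery(p bigQueryProfile) error {
	if p.Project == "" {
		return errors.New("project is required")
	}
	if p.Dataset == "" {
		return errors.New("dataset is required")
	}
	switch p.Method {
	case "service-account":
		if p.Keyfile == "" {
			return errors.New("keyfile is required for service-account")
		}
	case "service-account-json":
		if p.KeyfileJSON == "" {
			return errors.New("inline key JSON is required for service-account-json")
		}
	default:
		return fmt.Errorf("unsupported authentication method %q", p.Method)
	}
	return nil
}

func main() {
	ok := bigQueryProfile{Project: "my-project", Dataset: "analytics", Method: "service-account", Keyfile: "/path/to/key.json"}
	fmt.Println("service-account:", validateBigQuery(ok))

	oauth := bigQueryProfile{Project: "my-project", Dataset: "analytics", Method: "oauth"}
	fmt.Println("oauth:", validateBigQuery(oauth))
}
```

Returning an error for unsupported methods is only one possible reading of "screened out"; a converter could equally log a warning and skip the data source.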

2. Metadata Mapping

  • Reads a column's meta.label field from the dbt project and maps it to the displayName property in the Wren MDL.
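
The mapping can be pictured with a small sketch. The types `dbtColumn` and `wrenColumn` below are simplified stand-ins, not the structs from the PR, and the JSON key `displayName` is assumed from the property name mentioned above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dbtColumn is a simplified stand-in for a column entry in the dbt manifest.
type dbtColumn struct {
	Name string                 `json:"name"`
	Meta map[string]interface{} `json:"meta"`
}

// wrenColumn is a simplified stand-in for the Wren MDL column; only the
// fields needed for this illustration are included.
type wrenColumn struct {
	Name        string `json:"name"`
	DisplayName string `json:"displayName,omitempty"`
}

// toWrenColumn copies meta.label into DisplayName when it is a string and
// leaves DisplayName empty otherwise.
func toWrenColumn(c dbtColumn) wrenColumn {
	out := wrenColumn{Name: c.Name}
	if label, ok := c.Meta["label"].(string); ok {
		out.DisplayName = label
	}
	return out
}

func main() {
	raw := []byte(`{"name": "order_ts", "meta": {"label": "Order Timestamp"}}`)
	var col dbtColumn
	if err := json.Unmarshal(raw, &col); err != nil {
		panic(err)
	}
	out, err := json.Marshal(toWrenColumn(col))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```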

3. Configurable Staging Model Inclusion

  • Added an --include-staging-models command-line flag to the dbt-auto-convert command.
  • When this flag is set, the converter includes models with stg_ or staging_ in their names; by default, they are skipped.
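
A minimal sketch of this filter follows. The helper names `isStagingModel` and `filterModels` are hypothetical, and matching the markers anywhere in the name (rather than only as a prefix) is an assumption based on the wording above.

```go
package main

import (
	"fmt"
	"strings"
)

// isStagingModel reports whether a model name contains one of the staging
// markers. Substring matching is an assumption; the PR may match prefixes.
func isStagingModel(name string) bool {
	return strings.Contains(name, "stg_") || strings.Contains(name, "staging_")
}

// filterModels drops staging models unless includeStaging is true.
func filterModels(names []string, includeStaging bool) []string {
	kept := make([]string, 0, len(names))
	for _, n := range names {
		if !includeStaging && isStagingModel(n) {
			continue
		}
		kept = append(kept, n)
	}
	return kept
}

func main() {
	models := []string{"stg_orders", "staging_customers", "orders", "customers"}
	fmt.Println("default:", filterModels(models, false))
	fmt.Println("with --include-staging-models:", filterModels(models, true))
}
```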

Semantic Layer & Data Integrity Features

1. Relationship Generation from dbt Tests

  • The converter now automatically generates Wren Relationship objects from dbt relationships tests.
  • Robust Parsing: The logic correctly parses relationship tests from both possible locations in the manifest.json: embedded directly on struct fields and as top-level, compiled test nodes for simple columns.
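
To make the idea concrete, the sketch below turns one already-extracted relationships test into a relationship record. The types, the naming scheme, the `MANY_TO_ONE` join type, and the unquoted join condition are all assumptions for illustration; the PR's actual output may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// relationshipTest holds the pieces of a dbt relationships test after they
// have been pulled out of manifest.json (hypothetical, simplified shape).
type relationshipTest struct {
	Model  string // model that owns the tested column
	Column string // tested (foreign key) column
	To     string // target expression, e.g. "ref('customers')"
	Field  string // referenced column on the target model
}

// relationship is a simplified stand-in for a Wren relationship object.
type relationship struct {
	Name      string
	Models    []string
	JoinType  string
	Condition string
}

// refPattern extracts the model name from a ref('...') expression.
var refPattern = regexp.MustCompile(`ref\(\s*['"]([^'"]+)['"]\s*\)`)

// toRelationship builds a relationship from a test, or returns false when
// the target expression is not a simple ref().
func toRelationship(t relationshipTest) (relationship, bool) {
	m := refPattern.FindStringSubmatch(t.To)
	if m == nil {
		return relationship{}, false
	}
	target := m[1]
	return relationship{
		Name:      fmt.Sprintf("%s_%s_%s", t.Model, t.Column, target),
		Models:    []string{t.Model, target},
		JoinType:  "MANY_TO_ONE", // assumed: many child rows point at one parent row
		Condition: fmt.Sprintf("%s.%s = %s.%s", t.Model, t.Column, target, t.Field),
	}, true
}

func main() {
	rel, ok := toRelationship(relationshipTest{
		Model: "orders", Column: "customer_id", To: "ref('customers')", Field: "id",
	})
	fmt.Println(ok, rel.Name, rel.Condition)
}
```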

2. Metric Conversion from dbt Semantic Layer

  • Added support for parsing semantic_manifest.json to translate dbt metrics and their underlying semantic_models into Wren Metric objects.
  • Correctly handles simple, ratio, and derived metric types.
  • This feature is optional; the script runs without error if semantic_manifest.json is not found.

3. Enum Definition Generation from accepted_values Tests

  • Generates Wren EnumDefinition objects from dbt accepted_values tests.
  • De-duplication Logic: If multiple columns share the exact same set of accepted values, only one EnumDefinition is created, and all relevant columns are linked to it.
  • Supports tests on both simple columns and nested struct fields.
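
One way to implement the de-duplication is to key each enum by its value set, as in the sketch below. The `enumRegistry` type and the `<column>_enum` naming are hypothetical, and treating value sets as order-insensitive is an assumption about what "exact same set" means.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// enumDefinition is a simplified stand-in for the Wren EnumDefinition.
type enumDefinition struct {
	Name   string
	Values []string
}

// enumRegistry creates at most one definition per distinct set of values.
type enumRegistry struct {
	byKey map[string]*enumDefinition
	defs  []*enumDefinition
}

func newEnumRegistry() *enumRegistry {
	return &enumRegistry{byKey: map[string]*enumDefinition{}}
}

// enumFor returns the name of the enum a column should link to, creating a
// new definition only when this set of values has not been seen before.
func (r *enumRegistry) enumFor(column string, values []string) string {
	sorted := append([]string(nil), values...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	key := strings.Join(sorted, "\x00")
	if def, ok := r.byKey[key]; ok {
		return def.Name
	}
	def := &enumDefinition{Name: column + "_enum", Values: sorted}
	r.byKey[key] = def
	r.defs = append(r.defs, def)
	return def.Name
}

func main() {
	r := newEnumRegistry()
	fmt.Println(r.enumFor("status", []string{"open", "closed"}))
	fmt.Println(r.enumFor("state", []string{"closed", "open"})) // reuses the first enum
	fmt.Println(r.enumFor("tier", []string{"free", "paid"}))
	fmt.Println("definitions:", len(r.defs))
}
```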

4. Primary Key Identification from Semantic Models

  • Reads the entities list within dbt semantic_models to identify the primary entity (type: "primary").
  • Uses the node_relation.alias to correctly map the identified primary key to the corresponding dbt model and populate the primaryKey field in the Wren model.
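
The lookup can be sketched as follows. The struct shapes cover only the fields named above, the function `primaryKeys` is hypothetical, and using the entity's name as the key column is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entity, nodeRelation and semanticModel model only the parts of
// semantic_manifest.json that this illustration needs.
type entity struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

type nodeRelation struct {
	Alias string `json:"alias"`
}

type semanticModel struct {
	Name         string       `json:"name"`
	NodeRelation nodeRelation `json:"node_relation"`
	Entities     []entity     `json:"entities"`
}

// primaryKeys maps each dbt model alias to the name of its primary entity.
// Semantic models without a primary entity are left out of the result.
func primaryKeys(models []semanticModel) map[string]string {
	out := make(map[string]string)
	for _, sm := range models {
		for _, e := range sm.Entities {
			if e.Type == "primary" {
				out[sm.NodeRelation.Alias] = e.Name
				break
			}
		}
	}
	return out
}

func main() {
	raw := []byte(`{"semantic_models": [{
		"name": "orders",
		"node_relation": {"alias": "fct_orders"},
		"entities": [
			{"name": "order_id", "type": "primary"},
			{"name": "customer_id", "type": "foreign"}
		]
	}]}`)
	var manifest struct {
		SemanticModels []semanticModel `json:"semantic_models"`
	}
	if err := json.Unmarshal(raw, &manifest); err != nil {
		panic(err)
	}
	fmt.Println(primaryKeys(manifest.SemanticModels))
}
```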

5. Not Null Constraint Conversion

  • Identifies dbt not_null tests from all possible locations in the manifest.
  • When a not_null test is found, the NotNull field on the corresponding Wren WrenColumn is set to true.
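
A minimal sketch of this step, assuming the tests have already been collected into a flat list. The `columnTest` type and `applyNotNull` function are hypothetical; only the `NotNull` field name comes from the description above.

```go
package main

import "fmt"

// wrenColumn is a simplified stand-in for the Wren column struct.
type wrenColumn struct {
	Name    string
	NotNull bool
}

// columnTest is a hypothetical flattened view of one dbt test on a column.
type columnTest struct {
	Model  string
	Column string
	Kind   string // e.g. "not_null", "unique", "accepted_values"
}

// applyNotNull sets NotNull on every column that has a not_null test.
// Columns are updated in place through the slices stored in the map.
func applyNotNull(models map[string][]wrenColumn, tests []columnTest) {
	for _, t := range tests {
		if t.Kind != "not_null" {
			continue
		}
		cols := models[t.Model]
		for i := range cols {
			if cols[i].Name == t.Column {
				cols[i].NotNull = true
			}
		}
	}
}

func main() {
	models := map[string][]wrenColumn{
		"orders": {{Name: "order_id"}, {Name: "note"}},
	}
	tests := []columnTest{
		{Model: "orders", Column: "order_id", Kind: "not_null"},
		{Model: "orders", Column: "note", Kind: "unique"},
	}
	applyNotNull(models, tests)
	fmt.Printf("%+v\n", models["orders"])
}
```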

Miscellaneous notes

Made Semantic Layer Parsing Optional & Robust

  • The logic for parsing semantic_manifest.json was wrapped in checks to ensure that the converter runs successfully without errors even if the file is missing or malformed, simply skipping the dependent features.
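
The "skip on missing or malformed file" behavior might look like the sketch below. The function `loadSemanticManifest` and the trimmed-down `semanticManifest` struct are hypothetical; the sketch only assumes the file is JSON with `semantic_models` and `metrics` arrays.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// semanticManifest keeps the two top-level sections as raw JSON so that the
// dependent features can decode them later (simplified for illustration).
type semanticManifest struct {
	SemanticModels []json.RawMessage `json:"semantic_models"`
	Metrics        []json.RawMessage `json:"metrics"`
}

// loadSemanticManifest returns nil when the file is missing or cannot be
// parsed, so callers can skip metric and primary-key extraction instead of
// failing the whole conversion.
func loadSemanticManifest(path string) *semanticManifest {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil // file missing or unreadable: feature is simply skipped
	}
	var m semanticManifest
	if err := json.Unmarshal(data, &m); err != nil {
		return nil // malformed JSON: feature is simply skipped
	}
	return &m
}

func main() {
	m := loadSemanticManifest("target/semantic_manifest.json")
	if m == nil {
		fmt.Println("semantic_manifest.json not usable; skipping metrics and primary keys")
		return
	}
	fmt.Printf("found %d semantic models and %d metrics\n", len(m.SemanticModels), len(m.Metrics))
}
```

Returning nil keeps the sketch short; a real converter would probably also log why the file was skipped.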

Corrected Struct Definitions

  • The wren_mdl.go file was updated to include the necessary structs and fields for Metric and EnumDefinition to support the new features.

Summary by CodeRabbit

  • New Features

    • BigQuery is supported as a data source with credential handling and validation.
    • dbt conversion can read semantic manifests and now exports metrics, enums, primary keys, inferred relationships, and column display names into the MDL output.
    • New CLI flag and interactive prompt to include or exclude staging models during conversion.
  • Tests

    • Added tests covering BigQuery data source creation, credential resolution, validation, and type mappings.
  • Chores

    • Minor formatting and consistency tweaks.

cougrimes avatar Aug 03 '25 21:08 cougrimes

Walkthrough

Adds an --include-staging-models flag and interactive prompt; threads IncludeStagingModels through CLI → DbtConvertProject; extends converter to parse optional semantic_manifest.json (enums, metrics, primary keys, relationships); adds BigQuery data source support and tests; extends Wren MDL with EnumDefinitions and Metrics. Duplicate prompt helper present.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| CLI & Call Sites: wren-launcher/commands/dbt.go, wren-launcher/commands/launch.go | Added --include-staging-models flag and an interactive prompt; passed IncludeStagingModels into DbtConvertProject; duplicate askForIncludeStagingModels helper added; updated call sites to include the new boolean parameter. |
| DBT Conversion Core: wren-launcher/commands/dbt/converter.go | Added semantic_manifest.json handling and staging-aware conversion; ConvertDbtCatalogToWrenMDL and helpers accept semantic manifest path and includeStagingModels; added enum extraction, not-null/primary-key mappings, relationship generation, metric conversion, and conversion counts; added IncludeStagingModels to ConvertOptions. |
| Data Source & Profiles: wren-launcher/commands/dbt/data_source.go, wren-launcher/commands/dbt/profiles.go, wren-launcher/commands/dbt/profiles_analyzer.go | Added WrenBigQueryDataSource and convertToBigQueryDataSource with credential handling (service-account-json, keyfile resolution, oauth warning); convertConnectionToDataSource handles "bigquery"; added DbtConnection.Method field and parse support; removed MySQL type; changed WrenPostgresDataSource.Port from string to int (default 5432). |
| Wren MDL Schema: wren-launcher/commands/dbt/wren_mdl.go | Added EnumDefinition and Metric structs; added EnumDefinitions and Metrics to WrenMDLManifest; added DisplayName and Enum fields to WrenColumn. |
| Tests: wren-launcher/commands/dbt/data_source_test.go | Added BigQuery tests (service-account-json, absolute/relative keyfile paths, validation), MapType tests; updated Postgres tests to reflect numeric default port and Database field name; removed legacy validator scaffolding. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant User as CLI User
    participant CLI as DbtAutoConvert (CLI)
    participant Conv as DbtConvertProject / Converter
    participant DS as DataSource Builder
    participant MDL as WrenMDLManifest

    User->>CLI: run convert (flag or prompt)
    CLI->>Conv: invoke conversion (IncludeStagingModels)
    Conv->>DS: parse profiles → build DataSource (BigQuery path if detected)
    Conv->>Conv: read manifest.json & catalog.json
    alt semantic_manifest present
        Conv->>Conv: read semantic_manifest.json → extract enums, metrics, PKs
    end
    Conv->>Conv: apply staging filter (based on IncludeStagingModels)
    Conv->>Conv: generate relationships & metrics
    Conv->>MDL: assemble manifest (models, enums, metrics, relationships, datasources)
    MDL-->>CLI: return ConvertResult
    CLI-->>User: write/output MDL

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Canner/WrenAI#1827 — Overlaps with dbt conversion flow and data-source conversion (semantic-manifest handling, ConvertOptions/DbtConvertProject edits).

Suggested reviewers

  • douenergy
  • wwwy3y3

Poem

"I nibble manifests and chase each key,
BigQuery crumbs and enums follow me,
Metrics hum softly, relationships sing,
Staging gates open — conversion takes wing,
Hop, hop, the MDL blossoms for me. 🐇"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 72.97% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly and accurately summarizes the primary changes in the PR: expanded dbt<>Wren MDL conversion capabilities and added BigQuery support, which align with the modified files and stated objectives. It is concise, specific, and meaningful for a teammate scanning the commit history. |
coderabbitai[bot] avatar Aug 03 '25 21:08 coderabbitai[bot]

Thanks @cougrimes for the contribution. I'll take a look 👍

goldmedal avatar Aug 04 '25 07:08 goldmedal

Hi @cougrimes, after https://github.com/Canner/WrenAI/pull/1877, we added formatter and lint tools for the wren-launcher module, which causes some conflicts with this PR. If you don't mind, I can help resolve the conflicts by pushing to your branch directly. I also left some comments on this PR. If you don't have time to address them, we can merge this PR first and I'll address them in a follow-up PR. Either way, thanks for your contribution.

goldmedal avatar Aug 12 '25 08:08 goldmedal

@goldmedal Happy to make the changes requested, but I actually wound up hitting an unexpected issue when processing the MDL on my end—despite declaring BigQuery, I keep getting the MDL trying to parse as Trino rather than BQ. Do you have any other documentation on the MDL schema? I've been trying to work backward to figure out why this is happening to no avail.

cougrimes avatar Aug 13 '25 21:08 cougrimes

@goldmedal Happy to make the changes requested, but I actually wound up hitting an unexpected issue when processing the MDL on my end—despite declaring BigQuery, I keep getting the MDL trying to parse as Trino rather than BQ.

What do you mean by the MDL being parsed as Trino? Which step does that? 🤔

Do you have any other documentation on the MDL schema? I've been trying to work backward to figure out why this is happening to no avail.

You can check the WrenMDL docs or the JSON schema. Although some features (e.g. Metric, Enum) are not covered in the docs, we can still put them in the MDL as context for SQL generation.

goldmedal avatar Aug 14 '25 06:08 goldmedal

Most of the issues I had been running into around Trino-esque errors are addressed in Canner/wren-engine#1290; issues with keys and auths resolved.

cougrimes avatar Aug 18 '25 21:08 cougrimes

Hi @cougrimes, there are some lint and format check failures. Could you take a look? You can run the following command locally to check them.

make check

goldmedal avatar Aug 21 '25 03:08 goldmedal

Hi @cougrimes, are you still working on this? I’d be happy to help resolve the conflicts and get all the tests passing.

douenergy avatar Sep 11 '25 07:09 douenergy

@cougrimes I'm sorry, I overwrote your branch with Wren AI's main branch. I wanted to clean up the changes, but I force-pushed main to your branch, which caused this PR to be closed automatically, and I don't have permission to recover it by pushing the correct commits again. I created another PR, #1965, which is based on the latest main and cherry-picks your commits. Sorry again, and thanks for your contribution. We will merge the change as soon as possible.

goldmedal avatar Sep 22 '25 04:09 goldmedal