sqlmesh icon indicating copy to clipboard operation
sqlmesh copied to clipboard

In a multi repo the create_external_models command has problems with models being defined in one project and selected from in another project

Open blecourt-private opened this issue 4 months ago • 1 comments

Problem statement

In this multi repo setup the model silver.c from repo_2 selects from the model bronze.a which is defined in repo_1.

It seems that sqlmesh create_external_models treats bronze.a as an external model for repo_2 and fails to discover that the model is in fact defined in repo_1.

SQLMesh version: 0.216.0

Reproducible example

Using forementioned multi repo setup:

cd  sqlmesh/examples/multi

Trying to create external models:

sqlmesh -p repo_1 -p repo_2 create_external_models

Output:

Initializing new project state...
[WARNING] Unable to get schema for '"db"."bronze"."a"': 'Catalog Error: Table with name a does not exist!
Did you mean "information_schema.tables"?

LINE 1: DESCRIBE "db"."bronze"."a"

Despite including both projects SQLMesh doesn't infer that bronze.a is defined within the multi-repo.

Planning first:

sqlmesh -p repo_1 -p repo_2 plan

Output:

[WARNING] Linter warnings for /home/statsdb/projects/sqlmesh/examples/multi/repo_1/models/b.sql:
 - nomissingdescription: All models should be documented.
 - Line 1: nomissingaudits - Model `audits` must be configured to test data quality.
 - Line 7: noselectstar - Query should not contain SELECT * on its outer most projections, even if it can be expanded.
[WARNING] Linter warnings for /home/statsdb/projects/sqlmesh/examples/multi/repo_1/models/a.sql:
 - nomissingdescription: All models should be documented.
 - Line 1: nomissingaudits - Model `audits` must be configured to test data quality.

`prod` environment will be initialized

Environment statements:

before_all:                                                                                                                                                                                   
  + CREATE TABLE IF NOT EXISTS before_1 AS select @one()                                                                                                                                      
  + CREATE TABLE IF NOT EXISTS before_2 AS select @two()                                                                                                                                      
                                                                                                                                                                                              
after_all:                                                                                                                                                                                    
  + CREATE TABLE IF NOT EXISTS after_1 AS select @dup()                                                                                                                                       
  + CREATE TABLE IF NOT EXISTS after_2 AS select @dup()                                                                                                                                       
                                                                                                                                                                                              
Models:
??? Added:
    ??? bronze.a
    ??? bronze.b
    ??? silver.c
    ??? silver.d
    ??? silver.e
Models needing backfill:
??? bronze.a: [full refresh]
??? bronze.b: [full refresh]
??? silver.c: [full refresh]
??? silver.d: [full refresh]
??? silver.e: [full refresh]
Apply - Backfill Tables [y/n]: y
[1/1] bronze.a   [full refresh]                            0.10s   
[1/1] bronze.b   [full refresh]                            0.10s   
[1/1] silver.c   [full refresh]                            0.10s   
[1/1] silver.e   [full refresh]                            0.11s   
[1/1] silver.d   [full refresh]                            0.10s   
Executing model batches ???????????????????????????????????????? 100.0% ? 5/5 ? 0:00:00                                                                                                       
                                                                                                                                                                                              
? Model batches executed

Updating virtual layer  ???????????????????????????????????????? 100.0% ? 5/5 ? 0:00:00

? Virtual layer updated

Now creating external models:

sqlmesh -p repo_1 -p repo_2 create_external_models

Output:

[WARNING] The following models already exist and can't be converted to external: "db"."bronze"."a". Perhaps these models have been removed, while downstream models that reference them 
weren't updated accordingly.

Again - despite including both projects SQLMesh doesn't infer that bronze.a is defined within the multi-repo.

Creating external models for repo_2 only:

sqlmesh -p repo_2 create_external_models

There is no terminal output. The file external_models.yaml just contains []. So in this case SQLMesh correctly decides that bronze.a is not an external model, presumably because bronze.a is present in the state of the prod environment - but shouldn't the command that includes both projects be able to do the same?

blecourt-private avatar Sep 09 '25 08:09 blecourt-private

@crericha

blecourt-private avatar Sep 09 '25 09:09 blecourt-private