In a multi repo the create_external_models command has problems with models being defined in one project and selected from in another project
Problem statement
In this multi repo setup the model silver.c from repo_2 selects from the model bronze.a which is defined in repo_1.
It seems that sqlmesh create_external_models treats bronze.a as an external model for repo_2 and fails to discover that the model is in fact defined in repo_1.
SQLMesh version: 0.216.0
Reproducible example
Using forementioned multi repo setup:
cd sqlmesh/examples/multi
Trying to create external models:
sqlmesh -p repo_1 -p repo_2 create_external_models
Output:
Initializing new project state...
[WARNING] Unable to get schema for '"db"."bronze"."a"': 'Catalog Error: Table with name a does not exist!
Did you mean "information_schema.tables"?
LINE 1: DESCRIBE "db"."bronze"."a"
Despite including both projects SQLMesh doesn't infer that bronze.a is defined within the multi-repo.
Planning first:
sqlmesh -p repo_1 -p repo_2 plan
Output:
[WARNING] Linter warnings for /home/statsdb/projects/sqlmesh/examples/multi/repo_1/models/b.sql:
- nomissingdescription: All models should be documented.
- Line 1: nomissingaudits - Model `audits` must be configured to test data quality.
- Line 7: noselectstar - Query should not contain SELECT * on its outer most projections, even if it can be expanded.
[WARNING] Linter warnings for /home/statsdb/projects/sqlmesh/examples/multi/repo_1/models/a.sql:
- nomissingdescription: All models should be documented.
- Line 1: nomissingaudits - Model `audits` must be configured to test data quality.
`prod` environment will be initialized
Environment statements:
before_all:
+ CREATE TABLE IF NOT EXISTS before_1 AS select @one()
+ CREATE TABLE IF NOT EXISTS before_2 AS select @two()
after_all:
+ CREATE TABLE IF NOT EXISTS after_1 AS select @dup()
+ CREATE TABLE IF NOT EXISTS after_2 AS select @dup()
Models:
??? Added:
??? bronze.a
??? bronze.b
??? silver.c
??? silver.d
??? silver.e
Models needing backfill:
??? bronze.a: [full refresh]
??? bronze.b: [full refresh]
??? silver.c: [full refresh]
??? silver.d: [full refresh]
??? silver.e: [full refresh]
Apply - Backfill Tables [y/n]: y
[1/1] bronze.a [full refresh] 0.10s
[1/1] bronze.b [full refresh] 0.10s
[1/1] silver.c [full refresh] 0.10s
[1/1] silver.e [full refresh] 0.11s
[1/1] silver.d [full refresh] 0.10s
Executing model batches ???????????????????????????????????????? 100.0% ? 5/5 ? 0:00:00
? Model batches executed
Updating virtual layer ???????????????????????????????????????? 100.0% ? 5/5 ? 0:00:00
? Virtual layer updated
Now creating external models:
sqlmesh -p repo_1 -p repo_2 create_external_models
Output:
[WARNING] The following models already exist and can't be converted to external: "db"."bronze"."a". Perhaps these models have been removed, while downstream models that reference them
weren't updated accordingly.
Again - despite including both projects SQLMesh doesn't infer that bronze.a is defined within the multi-repo.
Creating external models for repo_2 only:
sqlmesh -p repo_2 create_external_models
There is no terminal output. The file external_models.yaml just contains []. So in this case SQLMesh correctly decides that bronze.a is not an external model, presumably because bronze.a is present in the state of the prod environment - but shouldn't the command that includes both projects be able to do the same?
@crericha