schema_changes_from_baseline test does not detect extra columns on Databricks source table
Describe the bug
The `elementary.schema_changes_from_baseline` test applied to a Databricks source table does not detect undeclared columns: it always compiles to a dummy `select * from nothing` query, so it always passes.
To Reproduce
Declare a source using Unity Catalog with the following schema and test setup:

```yaml
version: 2

sources:
  - name: clearance
    database: bronze
    schema: googlesheets_clearance
    tables:
      - name: clearance
        description: [..]
        data_tests:
          - unique_combination_of_columns:
              combination_of_columns:
                - _airbyte_generation_id
                - _airbyte_raw_id
                - reference
                - import_timestamp
              escape_nulls: true
          - elementary.schema_changes_from_baseline:
              fail_on_added: true
              tags: ["elementary"]
              config:
                severity: warn
        columns: [..]
```
The actual schema in Unity Catalog includes undeclared columns such as:

- `_airbyte_raw_id`
- `_airbyte_extracted_at`
- `_airbyte_generation_id`
- `_airbyte_meta`
- `test`
Still, `dbt test -s source:clearance` passes:

```
1 of 2 PASS elementary_source_schema_changes_from_baseline_clearance_clearance_True
```
Looking at the compiled SQL:

```sql
with nothing as (select 1 as num)
select * from nothing where num = 2
```
Expected behavior
The test should compare the declared baseline columns with the actual table columns and fail when extra columns are found (`_airbyte_raw_id`, etc.), since `fail_on_added` is set to `true`.
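For illustration, the comparison could look something like the sketch below. This is not Elementary's actual compiled SQL, just a minimal sketch against Unity Catalog's `information_schema`; the baseline column names are assumptions, since the `columns:` list is elided above.

```sql
-- Minimal sketch of the expected comparison (illustrative only).
-- The baseline names below are assumptions standing in for the
-- elided columns: list in the source YAML.
with baseline as (
    select 'reference' as column_name
    union all select 'import_timestamp'
),
actual as (
    select lower(column_name) as column_name
    from bronze.information_schema.columns
    where table_schema = 'googlesheets_clearance'
      and table_name = 'clearance'
)
-- Any column present in the table but absent from the baseline
-- should fail the test when fail_on_added is true.
select column_name
from actual
where column_name not in (select column_name from baseline)
```

Instead, the compiled query above never touches the table or its metadata at all.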
Environment
- dbt-core: 1.9.4
- dbt-databricks: 1.10.0
- dbt-spark: 1.9.2
- elementary: 0.18.2
- Warehouse: Databricks Unity Catalog (Delta Lake)
Additional context
This might be related to Elementary not supporting `information_schema.columns` or schema introspection correctly with Unity Catalog on Databricks.
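One way to narrow this down is to check whether the metadata is reachable at all. A manual probe (a sketch, using the catalog/schema/table names from the source config above):

```sql
-- Manual probe: does Unity Catalog expose the table's columns via
-- information_schema? If this returns the _airbyte_* columns, the
-- metadata is available and the gap is in how the test reads it.
select column_name, data_type
from bronze.information_schema.columns
where table_schema = 'googlesheets_clearance'
  and table_name = 'clearance'
order by ordinal_position;
```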
Would you be willing to contribute a fix for this issue? Yes, if guidance is provided.
Any news?