dbt-databricks
Sources ignore default catalog in 1.7.4 (previously working)
Describe the bug
We use dbt Cloud. With dbt-databricks adapter version 1.5, we did not provide a catalog in the connection settings; the catalog defaulted to the one set on the SQL warehouse. While testing the update to 1.7.4, we found that for sources the warehouse default is ignored.
Models are building in the correct default catalog, but sources always look in hive_metastore.
Setting the catalog in the connection corrects this behavior, but dbt should respect the default catalog configured in Databricks when one is set.
Steps To Reproduce
1. Define a source with a schema but no database/catalog.
2. Create a downstream model that uses that source.
3. Leave the catalog null in the connection.
4. Set the warehouse default catalog to something other than hive_metastore, where the data lives.
Result: the source will attempt to access hive_metastore.
Expected behavior
Sources without an explicit database/catalog should resolve against the SQL warehouse's default catalog, the same way models do.
Screenshots and log output
Dev environment: warehouse default catalog = dev_products, null catalog in DBT Databricks connection
Logs:
select current_catalog()
use catalog dev_products
show table extended in dev_products.default_core like '*'
use catalog hive_metastore
describe extended hive_metastore.raw.mrcgroupdim
After updating the dbt connection to set the catalog to dev_products:
select current_catalog()
show table extended in dev_products.default_core like '*'
describe extended dev_products.raw.mrcgroupdim
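A minimal Python sketch of the catalog resolution the logs above suggest, useful for comparing the null-catalog and explicit-catalog cases side by side. The functions and the `OBSERVED_FALLBACK` constant are hypothetical: they model only the observed behavior, not dbt-databricks internals.

```python
from typing import Optional

# Assumption: the catalog a source ends up in when neither the source nor the
# connection names one. Inferred from the logs above, not from adapter code.
OBSERVED_FALLBACK = "hive_metastore"


def model_catalog(connection_catalog: Optional[str], warehouse_default: str) -> str:
    """Models: with a null connection catalog, the logs show the warehouse default in use."""
    return connection_catalog or warehouse_default


def source_catalog_observed(source_database: Optional[str],
                            connection_catalog: Optional[str]) -> str:
    """Sources as observed on 1.7.4: the warehouse default is never consulted."""
    return source_database or connection_catalog or OBSERVED_FALLBACK


def source_catalog_expected(source_database: Optional[str],
                            connection_catalog: Optional[str],
                            warehouse_default: str) -> str:
    """Sources as the report expects: fall back to the warehouse default, like models."""
    return source_database or connection_catalog or warehouse_default


def qualify(catalog: str, schema: str, table: str) -> str:
    """Render a three-part name as it appears in the `describe extended` log lines."""
    return f"{catalog}.{schema}.{table}"


if __name__ == "__main__":
    warehouse_default = "dev_products"
    # Null connection catalog: models use dev_products, sources do not.
    print(model_catalog(None, warehouse_default))  # dev_products
    print(qualify(source_catalog_observed(None, None), "raw", "mrcgroupdim"))
    # -> hive_metastore.raw.mrcgroupdim
    print(qualify(source_catalog_expected(None, None, warehouse_default), "raw", "mrcgroupdim"))
    # -> dev_products.raw.mrcgroupdim
    # Workaround: set the connection catalog explicitly.
    print(qualify(source_catalog_observed(None, "dev_products"), "raw", "mrcgroupdim"))
    # -> dev_products.raw.mrcgroupdim
```

The only difference between the observed and expected source functions is the last fallback, which is exactly where the two `describe extended` lines diverge.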
System information
14:27:07 Running with dbt=1.7.6
14:27:09 Registered adapter: databricks=1.7.4
dbt Cloud
Thanks for the report, will investigate.
Can you check whether you have the same problem with 1.7.3? I think I know the cause, but that additional data point would help significantly. I'm fairly sure this is related to changes we had to make in 1.7.0 when catalog and schema became non-optional on Credentials; if that is the issue (in which case it would also reproduce on 1.7.3), the fix might be a little involved.
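A hypothetical Python illustration of why making the catalog non-optional could cause this: once the field is a required string with a default, "not configured" and "explicitly hive_metastore" look identical, so there is no signal left to defer to the warehouse default. The class names and the `hive_metastore` default are assumptions for illustration, not the adapter's actual Credentials class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OptionalCatalogCredentials:
    """Hypothetical 1.5-style credentials: the catalog may be left unset (None)."""
    catalog: Optional[str] = None


@dataclass
class RequiredCatalogCredentials:
    """Hypothetical 1.7-style credentials: a catalog value always exists.
    The "hive_metastore" default is an assumption based on the reported behavior."""
    catalog: str = "hive_metastore"


def effective_catalog(configured: Optional[str], warehouse_default: str) -> str:
    """Defer to the warehouse default only when nothing was configured (None)."""
    return configured if configured is not None else warehouse_default


warehouse_default = "dev_products"
print(effective_catalog(OptionalCatalogCredentials().catalog, warehouse_default))  # dev_products
print(effective_catalog(RequiredCatalogCredentials().catalog, warehouse_default))  # hive_metastore
```

Restoring the old behavior under this model would mean reintroducing some way to represent "unset", which is one plausible reason the fix could be involved.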
@benc-db Unfortunately, it looks like dbt Cloud only lets me select the minor version (1.7) and won't let me pin a patch version such as 1.7.3. I'm not seeing anything in the docs that I can use to override that. We have a workaround in place now (specifying the catalog explicitly), but you may want to make sure this scenario is noted in the upgrade notes. Also, I see that 1.7.4 of dbt-databricks is now marked as a pre-release, but dbt Cloud still appears to be pulling it in. I'm not sure whether something is required between Databricks and dbt to coordinate rollbacks like this. From a run I just triggered to confirm:
16:49:13 Registered adapter: databricks=1.7.4
Yeah, unfortunately there is required coordination. 1.7.4 has been pulled because it does not play nicely with cold-starting non-serverless SQL Warehouses. Working on fixing that at high priority so that next week dbt Cloud can pick up a good version.