dbt-databricks
dbt seed command throws 'dict object' has no attribute '...' when seed files have some columns described
Describe the bug
I'm upgrading from dbt-databricks version 1.6.5 to version 1.8.3 and ran into an issue when running dbt seed.
I didn't find an open issue for this, so I'm creating a new one.
dbt seed fails with an error when some, but not all, of the seed file's columns are defined in the config file.
Steps To Reproduce
- Create a simple seed file in your seed-paths folder (eg test_seed.csv).
id,some_string
1,abc
2,cde
3,fgh
- Create a config file in the same folder (eg data.yml) and add the following config for the seed:
seeds:
  - name: test_seed
    columns:
      - name: id
        description: the unique identifier
      # - name: some_string
      #   description: some super important text
- Run dbt seed -s test_seed. This will result in an error for this seed file (terminal output below)
- The seed command does not fail if you uncomment the last two lines in the config file, or if the column descriptions are removed completely
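For convenience, the two files from the steps above can be generated with a small script. This is only a sketch: the helper name write_repro_files is made up for illustration, and the folder name data is an assumption taken from the path in the error log (data/test_seed.csv); use whatever your seed-paths setting points at.

```python
from pathlib import Path
import tempfile

# Seed file contents, copied from the reproduction steps.
SEED_CSV = "id,some_string\n1,abc\n2,cde\n3,fgh\n"

# Config that documents only the id column; some_string stays commented out.
SEED_YML = """\
seeds:
  - name: test_seed
    columns:
      - name: id
        description: the unique identifier
      # - name: some_string
      #   description: some super important text
"""


def write_repro_files(seed_dir: Path) -> None:
    """Write test_seed.csv and data.yml into seed_dir (hypothetical helper)."""
    seed_dir.mkdir(parents=True, exist_ok=True)
    (seed_dir / "test_seed.csv").write_text(SEED_CSV)
    (seed_dir / "data.yml").write_text(SEED_YML)


if __name__ == "__main__":
    # "data" is assumed from the log output; adjust to your project's seed path.
    target = Path(tempfile.mkdtemp()) / "data"
    write_repro_files(target)
    print(f"Wrote repro files to {target}")
```

Copy the two generated files into the seed folder of a dbt project and run dbt seed -s test_seed to trigger the error.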
Expected behavior
The dbt seed command should not return an error when only some columns for the seed file are described.
Screenshots and log output
$ dbt seed -s test_seed
14:12:53 Running with dbt=1.8.3
14:12:53 Registered adapter: databricks=1.8.3
14:12:55 Found xxx models, xx seeds, xxx data tests, xxx sources, xxx macros
14:12:55
14:13:07 Concurrency: 16 threads (target='dev')
14:13:07
14:13:07 1 of 1 START seed file dwh_seed_data.test_seed ................................. [RUN]
14:13:07 1 of 1 ERROR loading seed file dwh_seed_data.test_seed ......................... [ERROR in 0.03s]
14:13:09
14:13:09 Finished running 1 seed in 0 hours 0 minutes and 13.84 seconds (13.84s).
14:13:09
14:13:09 Completed with 1 error and 0 warnings:
14:13:09
14:13:09 Compilation Error in seed test_seed (data/test_seed.csv)
'dict object' has no attribute 'some_string'
> in macro databricks__create_csv_table (macros/materializations/seeds/helpers.sql)
> called by macro create_csv_table (macros/materializations/seeds/helpers.sql)
> called by macro materialization_seed_databricks (macros/materializations/seeds/seeds.sql)
> called by seed test_seed (data/test_seed.csv)
System information
The output of dbt --version:
dbt --version
Core:
- installed: 1.8.3
- latest: 1.8.3 - Up to date!
Plugins:
- spark: 1.8.0 - Up to date!
- databricks: 1.8.3 - Up to date!
The operating system you're using:
The output of python --version:
Python 3.11.9
Additional context
I've traced the issue down to the dbt/include/databricks/macros/materializations/seeds/helpers.sql file (the error log is helpful!) and to three lines specifically:
{%- for col_name in agate_table.column_names -%}
...
{%- if column_comment -%}
{%- set comment = model.columns[col_name]['description'] | replace("'", "\\'") -%}
These lines expect every column in the table to also be defined in the properties.yml file. When a column is missing there, the lookup fails because the column name is not present in the model.columns dictionary.
The easiest fix should be to check if the column is also defined in the properties file:
{%- if column_comment and col_name in model.columns.keys() -%}
or, if we want to be extra safe, we could also change the lookup to:
{%- set comment = model.columns.get(col_name, {}).get('description', '') | replace("'", "\\'") -%}
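To illustrate the difference between the two lookups outside of Jinja, here is a plain-Python analogue. It is a sketch, not the macro itself: the function names strict_comment and safe_comment are made up, and the columns dictionary only mimics the shape I'd expect model.columns to have for the seed above. Note that plain Python raises a KeyError here, whereas Jinja reports the 'dict object' has no attribute error shown in the log.

```python
def strict_comment(columns, col_name):
    """Analogue of the current lookup: assumes col_name is documented."""
    return columns[col_name]["description"].replace("'", "\\'")


def safe_comment(columns, col_name):
    """Analogue of the proposed lookup: falls back to '' when undocumented."""
    return columns.get(col_name, {}).get("description", "").replace("'", "\\'")


# Assumed shape of model.columns: only "id" is described in the config file.
columns = {"id": {"description": "the unique identifier"}}
csv_columns = ["id", "some_string"]  # column names found in test_seed.csv

# The safe lookup handles every CSV column, documented or not.
comments = {name: safe_comment(columns, name) for name in csv_columns}

# The strict lookup fails on the undocumented column.
try:
    strict_comment(columns, "some_string")
    strict_failed = False
except KeyError:
    strict_failed = True
```

With the fallback, an undocumented column simply gets an empty comment, which the existing if column_comment check can combine with a truthiness test on the comment if empty COMMENT clauses should be skipped.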