dbt-databricks dbt seed command throws 'dict object' has no attribute '...' when seed files have some columns described

Open kass-artur opened this issue 1 year ago • 0 comments

Describe the bug

I'm upgrading from dbt-databricks version 1.6.5 to version 1.8.3 and stumbled upon an issue when trying to use dbt seed. As I didn't find any open issues, I'm creating a new one. dbt seed results in an error when the seed file has some, but not all, columns defined in the config file.

Steps To Reproduce

Create a simple seed file in your seed-paths folder (eg test_seed.csv).

id,some_string
1,abc
2,cde
3,fgh

Create a config file in the same folder (e.g. data.yml) and add the following config for the seed file:

seeds:
  - name: test_seed
    columns:
      - name: id
        description: the unique identifier
      # - name: some_string
      #   description: some super important text

Run dbt seed -s test_seed
This will result in an error for this seed file (terminal output below)
The seed command does not fail if you uncomment the last two lines in the config file or if the descriptions are removed completely.

Expected behavior

The dbt seed command should not return an error when only some columns for the seed file are described.

Screenshots and log output

$ dbt seed -s test_seed
14:12:53  Running with dbt=1.8.3
14:12:53  Registered adapter: databricks=1.8.3
14:12:55  Found xxx models, xx seeds, xxx data tests, xxx sources, xxx macros
14:12:55  
14:13:07  Concurrency: 16 threads (target='dev')
14:13:07  
14:13:07  1 of 1 START seed file dwh_seed_data.test_seed ................................. [RUN]
14:13:07  1 of 1 ERROR loading seed file dwh_seed_data.test_seed ......................... [ERROR in 0.03s]
14:13:09  
14:13:09  Finished running 1 seed in 0 hours 0 minutes and 13.84 seconds (13.84s).
14:13:09  
14:13:09  Completed with 1 error and 0 warnings:
14:13:09  
14:13:09    Compilation Error in seed test_seed (data/test_seed.csv)
  'dict object' has no attribute 'some_string'
  
  > in macro databricks__create_csv_table (macros/materializations/seeds/helpers.sql)
  > called by macro create_csv_table (macros/materializations/seeds/helpers.sql)
  > called by macro materialization_seed_databricks (macros/materializations/seeds/seeds.sql)
  > called by seed test_seed (data/test_seed.csv)

System information

The output of dbt --version:

dbt --version
Core:
  - installed: 1.8.3
  - latest:    1.8.3 - Up to date!

Plugins:
  - spark:      1.8.0 - Up to date!
  - databricks: 1.8.3 - Up to date!

The operating system you're using:

The output of python --version:

Python 3.11.9

Additional context

I've traced the issue down to the dbt/include/databricks/macros/materializations/seeds/helpers.sql file (the error log is helpful!) and to three rows specifically:

        {%- for col_name in agate_table.column_names -%}
        ...
            {%- if column_comment -%}       
              {%- set comment = model.columns[col_name]['description'] | replace("'", "\\'") -%}

These lines expect every column in the table to also be defined in the properties.yml file and will otherwise result in a key error, as the column name will not be present in the model.columns dictionary.

The easiest fix should be to check whether the column is also defined in the properties file: {%- if column_comment and col_name in model.columns.keys() -%}. Or, if we want to be extra safe, we could also add: {%- set comment = model.columns.get(col_name, {}).get('description', '') | replace("'", "\\'") -%}
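
To illustrate the difference between the unguarded lookup and the `.get`-based one, here is a minimal plain-Python sketch. It is only an analogue of the macro, not the adapter's code: `csv_column_names`, `model_columns`, and the three helper functions are hypothetical stand-ins for `agate_table.column_names`, `model.columns`, and the Jinja expressions. In plain Python the unguarded lookup raises `KeyError`, whereas in the macro the same situation surfaces as the compilation error shown in the log.

```python
# Hypothetical stand-ins for what the macro sees (names are illustrative only).
csv_column_names = ["id", "some_string"]  # role of agate_table.column_names
model_columns = {"id": {"description": "the unique identifier"}}  # role of model.columns


def escape_quotes(text: str) -> str:
    """Mirror the macro's replace("'", "\\'") filter."""
    return text.replace("'", "\\'")


def comments_unguarded(column_names, columns):
    """Analogue of the current lookup: assumes every CSV column is described."""
    return {name: escape_quotes(columns[name]["description"]) for name in column_names}


def comments_guarded(column_names, columns):
    """Analogue of the proposed lookup: undescribed columns get an empty comment."""
    return {
        name: escape_quotes(columns.get(name, {}).get("description", ""))
        for name in column_names
    }


try:
    comments_unguarded(csv_column_names, model_columns)
except KeyError as exc:
    print(f"unguarded lookup failed for column {exc}")

print(comments_guarded(csv_column_names, model_columns))
```

With the `.get` variant an undescribed column ends up with an empty comment string, while the `col_name in model.columns.keys()` guard skips the comment branch for that column altogether; which of the two is preferable is a design choice for the macro.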

Jul 04 '24 15:07 kass-artur
