dbt-codegen icon indicating copy to clipboard operation
dbt-codegen copied to clipboard

Generate Source Yaml doesn't preserve capitalization on schema/table names, creates unusable yaml for BQ (probably others)

Open ryantimjohn opened this issue 1 year ago • 6 comments

Describe the bug

Generate Source Yaml doesn't preserve capitalization on schema/table names, creates unusable yaml for BQ (probably others)

Steps to reproduce

Generate source yaml against a table with capitalized schema/table names, with generate a source yaml with lower-case schema and table names.

Expected results

Preserve capitalization, preferable as an true/false in the function call.

Actual results

Screenshot 2023-09-19 at 11 48 41 AM

Screenshot 2023-09-19 at 11 48 54 AM

System information

The contents of your packages.yml file:

packages:
  - package: dbt-labs/codegen
    version: 0.10.0

Which database are you using dbt with?

  • [x] bigquery

The output of dbt --version: 1.3

The operating system you're using: Linux (dbt Cloud)

The output of python --version: dbt Cloud

Are you interested in contributing the fix?

Yes! Point me to the lines of code that do this and I'll happily contribute to a fix!

ryantimjohn avatar Sep 19 '23 15:09 ryantimjohn

+1 to this for Snowflake with dbt 1.5.0.

If I have a column name in my source called firstName (note the capitalisation) generate_source produces YAML with the column firstname, which in Snowflake will be interpreted as FIRSTNAME since I did not give (nor is there an option to quote the column names for the source YAML).

As a result if I add a not_null test on the firstname column, e.g config

- name: firstname
  data_type: varchar
  description: ""
  tests: 
    - not_null

the test will fail with: 000904 (42000): SQL compilation error: ... invalid identifier 'FIRSTNAME'.

That's obviously because the Snowflake column is firstName and neither selecting FIRSTNAME nor firstname will work.

Perhaps by default the generated YAML should be with lower case column names, but have an option to quote things / preserve capitalisation , e.g. for my case would produce

- name: firstName
  data_type: varchar
  description: ""
  quote: true

Then I can manually add my not null test to the above config and everything will work.

I'm working with a Snowflake source table with around 150 columns in which the capitalisation is all over the place (all lower, all upper, camel case, title case and combinations of the aforementioned) and I can't control the case of the source column names.

In this case codegen doesn't really help a lot since I would have to go through 150 columns and make sure the case is ok and manually add quote: true

az-hienas avatar Oct 02 '23 08:10 az-hienas

Makes sense that this should be a top-level option you can apply from codegen -- I'm going to be attempting some work to catch this package up based on the open issues -- thanks for sharing this! I'll keep you posted.

gwenwindflower avatar Feb 22 '24 18:02 gwenwindflower

Adding my vote that this would be a very useful parameter for have - both for generate_source_yaml and generate_model_yaml. Thank you @gwenwindflower for digging into this!

jenna-jordan avatar Feb 22 '24 22:02 jenna-jordan

there ended up being an open PR to do this, so i'm working with author of that to get these changes in, ideally covering the entire database -> schema -> table -> column chain in sources.

gwenwindflower avatar Feb 29 '24 16:02 gwenwindflower

@gwenwindflower Joining the party here - any plan/progress with supporting quote param per column? All our columns in SF are quotes hence we need to add quote: true to all generated source columns Thanks :)

Currently I am using sed to add that param when generating: sed -E "s/([[:space:]]+)data_type:/\1quote: true\n\1data_type:/g"

ronlut avatar Jun 02 '24 11:06 ronlut

This also affects the generate_model_yaml when the model file names have mixed-case. It generates model yaml output with all lower-cased table and field names.

urkle avatar Jul 12 '24 16:07 urkle