dbt-invoke icon indicating copy to clipboard operation
dbt-invoke copied to clipboard

Generate model properties with types

Open Esya opened this issue 1 year ago • 3 comments

On top of generating the list of all columns, it would be great if dbt-invoke could also pull the data_type and write it to the properties, so that one might use model contracts

Esya avatar Nov 09 '23 13:11 Esya

I plan to take this issue.

Currently the column information is from dbt-invoke/dbt_invoke/internal/_utils.py with below MACRO _log_columns_list

MACROS = {
    '_log_columns_list': (
        "\n{# This macro is intended for use by dbt-invoke #}"
        "\n{% macro _log_columns_list(sql=none, resource_name=none) %}"
        "\n    {% if sql is none %}"
        "\n        {% set sql = 'select * from ' ~ ref(resource_name) %}"
        "\n    {% endif %}"
        "\n    {% if execute %}"
        "\n        {{ log(get_columns_in_query(sql), info=True) }}"
        "\n    {% endif %}"
        "\n{% endmacro %}\n"
    )
} 

Based on information https://discourse.getdbt.com/t/is-there-a-way-to-get-the-column-data-types-from-sql-in-v1-4-4/7834, there is a new macro get_column_schema_from_query which is able to get column data_type

Then this information is passed to dbt-invoke/dbt_invoke/properties.py function _get_property_column as input to populate the column properties. Need to add data_type when populating the column_dict.

  def _get_property_column(column_name):
      """
      Create a dictionary representing column properties
  
      :param column_name: Name of column
      :return: A dictionary representing column properties
      """
      column_dict = {'name': column_name, 'description': ""}
      return column_dict

Current proposal is:

  1. Add a flag (--add_data_type) to indicate where data_type should be included in properties file.
  2. If add_data_type is true (default is false for compatibility), use macro get_column_schema_from_query to get column info with data_type
  3. populate this info _get_property_column with data_type

NUSTemple avatar Apr 13 '24 02:04 NUSTemple

@NUSTemple thanks for your evaluation of the issue! I appreciate your willingness to take it on.

Here are some of my initial thoughts:

    1. Add a flag (--add_data_type) to indicate where data_type should be included in properties file.
    • What exactly do you mean by where? I'm asking because my understanding is that when a contract is enforced, data_type must be defined for every column in the model.
  • Do you plan to also include the enforced: true setting under the model's contract configuration? (https://docs.getdbt.com/docs/collaborate/govern/model-contracts#how-to-define-a-contract)

  • Another consideration is how to make sure dbt-invoke avoids applying contract logic to ephemeral models and other resource types that do not support contracts.

robastel avatar Apr 16 '24 18:04 robastel

This would be useful for us too.

I'd expect a flag called --add-data-type to simply add the data_type in without setting up model contracts. If the goal is to set up model contracts then something like --model-contracts = True|False

I would use it to

  • Select the consumption aka gold tables that I want to enforce
  • Enable to enforce flag manually if required

So since i select the consumption tables anyway, ephemeral limitations doesn't bother me

mahazza avatar Apr 26 '24 01:04 mahazza