dbt-invoke
dbt-invoke copied to clipboard
Generate model properties with types
On top of generating the list of all columns, it would be great if dbt-invoke could also pull the data_type and write it to the properties, so that one might use model contracts
I plan to take this issue.
Currently the column information is from dbt-invoke/dbt_invoke/internal/_utils.py
with below MACRO _log_columns_list
MACROS = {
'_log_columns_list': (
"\n{# This macro is intended for use by dbt-invoke #}"
"\n{% macro _log_columns_list(sql=none, resource_name=none) %}"
"\n {% if sql is none %}"
"\n {% set sql = 'select * from ' ~ ref(resource_name) %}"
"\n {% endif %}"
"\n {% if execute %}"
"\n {{ log(get_columns_in_query(sql), info=True) }}"
"\n {% endif %}"
"\n{% endmacro %}\n"
)
}
Based on information https://discourse.getdbt.com/t/is-there-a-way-to-get-the-column-data-types-from-sql-in-v1-4-4/7834, there is a new macro get_column_schema_from_query
which is able to get column data_type
Then this information is passed to dbt-invoke/dbt_invoke/properties.py
function _get_property_column
as input to populate the column properties. Need to add data_type when populating the column_dict
.
def _get_property_column(column_name):
"""
Create a dictionary representing column properties
:param column_name: Name of column
:return: A dictionary representing column properties
"""
column_dict = {'name': column_name, 'description': ""}
return column_dict
Current proposal is:
- Add a flag (
--add_data_type
) to indicate where data_type should be included in properties file. - If
add_data_type
is true (default is false for compatibility), use macroget_column_schema_from_query
to get column info with data_type - populate this info
_get_property_column
withdata_type
@NUSTemple thanks for your evaluation of the issue! I appreciate your willingness to take it on.
Here are some of my initial thoughts:
-
- Add a flag (--add_data_type) to indicate where data_type should be included in properties file.
- What exactly do you mean by where? I'm asking because my understanding is that when a contract is enforced,
data_type
must be defined for every column in the model.
-
Do you plan to also include the
enforced: true
setting under the model'scontract
configuration? (https://docs.getdbt.com/docs/collaborate/govern/model-contracts#how-to-define-a-contract) -
Another consideration is how to make sure
dbt-invoke
avoids applying contract logic to ephemeral models and other resource types that do not support contracts.
This would be useful for us too.
I'd expect a flag called --add-data-type to simply add the data_type in without setting up model contracts. If the goal is to set up model contracts then something like --model-contracts = True|False
I would use it to
- Select the consumption aka gold tables that I want to enforce
- Enable to enforce flag manually if required
So since i select the consumption tables anyway, ephemeral limitations doesn't bother me