dbt-core
[Feature] Allow mapping to be used in addition to sequence in YAML to define model columns
Is this your first time submitting a feature request?
- [X] I have read the expectations for open source contributors
- [X] I have searched the existing issues, and I could not find an existing issue for this feature
- [X] I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Currently, columns within models have to be defined as a sequence, as follows:
```yaml
models:
  - name: model_one
    columns:
      - name: first_column
      - name: second_column
        description: This is a column
  - name: model_two
    columns:
      - name: first_column
      - name: second_column
```
My proposal is to allow columns to be defined as a mapping, in addition to the already supported sequence:
```yaml
models:
  - name: model_one
    columns:
      first_column: null
      second_column:
        description: This is a column
  - name: model_two
    columns:
      first_column: null
      second_column: null
```
This feature request is driven by a few factors.
- Flexibility: The mapping format allows for more flexibility in defining columns. We can use YAML's native mapping features, such as merge keys (https://yaml.org/type/merge.html), but we are not forced to use them: columns can still be defined as a simple sequence.
- Readability: Because it lets us apply the DRY principle, the mapping format is more readable and less bloated than the sequence format, since columns don't have to be repeated. In our case, we produce two types of marts: the latest "state" and the "history". The "history" mart has the same columns as the "state" mart, plus some additional ones. The mapping format would allow us to define the common columns once and then add the extra columns for the "history" mart:
```yaml
columns_mart__loans: &columns_mart__loans
  source_system:
    description: asdf2
    tests:
      - not_null
  source_system_id:
    tests:
      - unique
      - not_null

models:
  - name: mart__loans_history
    columns:
      <<: *columns_mart__loans
      valid_from:
        description: asdf5
        tests:
          - not_null
      valid_to:
        description: asdf6
  - name: mart__loans
    columns:
      <<: *columns_mart__loans
```
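To make the merge semantics concrete, here is a rough Python equivalent of what the `columns` mappings in the example above resolve to once a YAML loader has applied the `<<` merge key. Per the YAML merge-key spec, merged keys are only inserted when the key is not already present in the current mapping, so locally written keys win; `{**base, **local}` models that. The variable names are illustrative only and are not part of dbt.

```python
from typing import Any, Dict

# Shared column definitions: what the &columns_mart__loans anchor holds after parsing.
columns_mart__loans: Dict[str, Any] = {
    "source_system": {"description": "asdf2", "tests": ["not_null"]},
    "source_system_id": {"tests": ["unique", "not_null"]},
}

# `<<: *columns_mart__loans` plus extra keys: shared keys come in,
# and any key written locally would take precedence on conflict.
mart__loans_history_columns: Dict[str, Any] = {
    **columns_mart__loans,
    "valid_from": {"description": "asdf5", "tests": ["not_null"]},
    "valid_to": {"description": "asdf6"},
}

# The "state" mart only reuses the shared columns.
mart__loans_columns: Dict[str, Any] = {**columns_mart__loans}

print(list(mart__loans_history_columns))
# ['source_system', 'source_system_id', 'valid_from', 'valid_to']
```

The shared columns are written once, and each mart only lists what differs. A sequence-based `columns` key cannot be combined this way, because the merge key applies to mappings only.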
Overall, I believe that allowing columns to be defined as a mapping in addition to a sequence would make dbt's YAML files easier to read and maintain.
I am not aware of any internal design decisions within dbt that would make it impossible to implement this feature. The change itself should be relatively simple: check the data type of the `columns` key and, if it is a mapping, process it in a generator that yields items in the existing sequence format.
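A minimal sketch of that generator, assuming the YAML has already been parsed into plain Python lists and dicts. `iter_columns` is a hypothetical helper, not an existing dbt-core function. It passes the current sequence format through unchanged and turns each `name: properties` pair of the proposed mapping into a `{"name": ..., **properties}` entry, treating `null` as "no properties".

```python
from typing import Any, Dict, Iterator, List, Mapping, Optional, Union

# Hypothetical type aliases for the two accepted shapes of `columns`.
ColumnProps = Optional[Mapping[str, Any]]
ColumnsSpec = Union[List[Dict[str, Any]], Mapping[str, ColumnProps]]


def iter_columns(columns: ColumnsSpec) -> Iterator[Dict[str, Any]]:
    """Yield column entries in the existing sequence format (hypothetical helper)."""
    if isinstance(columns, Mapping):
        # Proposed format: column name -> properties (or null).
        for name, props in columns.items():
            if props is None:
                props = {}
            elif not isinstance(props, Mapping):
                raise TypeError(f"properties of column {name!r} must be a mapping or null")
            if "name" in props:
                # The mapping key already is the name; a second one would be ambiguous.
                raise ValueError(f"column {name!r} must not also define 'name'")
            yield {"name": name, **props}
    elif isinstance(columns, list):
        # Currently supported format: pass entries through unchanged.
        yield from columns
    else:
        raise TypeError("columns must be a sequence or a mapping")


mapping_form = {
    "first_column": None,
    "second_column": {"description": "This is a column"},
}
sequence_form = [
    {"name": "first_column"},
    {"name": "second_column", "description": "This is a column"},
]

# Both spellings normalize to the same list of column entries.
assert list(iter_columns(mapping_form)) == sequence_form
assert list(iter_columns(sequence_form)) == sequence_form
```

Normalizing at this single point would let the rest of the parsing code keep working on sequences, which is why the change should stay small.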
Describe alternatives you've considered
YAML Limitations
As we know, YAML doesn't support merging (flattening) sequences, which makes merge keys unusable with the current sequence-based definition of columns. (Reference: YAML Issue #35)
Additionally, YAMLScript is still in its early stages of development, so it may not be suitable for immediate use. (Reference: YAML Issue #48)
dbt's Built-in Feature
I believe dbt should avoid implementing too many YAML-specific features of its own, so as not to reinvent the wheel. Delegating such features to YAML itself lets dbt focus on data transformation.
Custom Solution
The same reasoning applies here.
Who will this benefit?
This feature will benefit all dbt users who deal with large models that are exposed in multiple flavours, like the "state" and "history" models in our case. It will also benefit users who want to define columns more flexibly, using YAML's native features such as merge keys.
Are you interested in contributing this feature?
Yes
Anything else?
No response