automate-dv
automate-dv copied to clipboard
[FEATURE] Arbitrary aliasing in macros
At the moment you can alias a hashdiff in dbtvault satellite macros. This is so that you may have a standardised name for your hashdiff in every satellite, even if they share the same primed staging area.
This use case is not isolated to hashdiffs however, for example for a Link Primary key we may want to always call them LINK_HK
fo simplicity of joins later on, or just so we have uniform links.
Being able to do this for any parameter in any macro would be ideal, and would make it much easier to standardise across our raw vault.
Something like the following would be very useful:
{%- set yaml_metadata -%}
source_model: stg_web_customer_hashed
src_pk: CUSTOMER_HK
src_nk:
- source_column: HUB_CUSTOMER_CK
alias: COLLISION_KEY
- CUSTOMER_ID
src_ldts: LOAD_DATETIME
src_source: RECORD_SOURCE
{%- endset -%}
{% set metadata_dict = fromyaml(yaml_metadata) %}
{{ dbtvault.hub(src_pk=metadata_dict["src_pk"],
src_nk=metadata_dict["src_nk"],
src_ldts=metadata_dict["src_ldts"],
src_source=metadata_dict["src_source"],
source_model=metadata_dict["source_model"]) }}
Resulting in a table like:
CUSTOMER_HK | CUSTOMER_ID | COLLISION_KEY | LOAD_DATETIME | RECORD_SOURCE |
---|---|---|---|---|
b8c37e33defde51cf91e1e03e51657da | 1001 | CUSTOMER_CRM | 1993-01-01 00:00:00.000 | CUSTOMER_CRM |
The future
On the topic of standardised naming, we plan to make this configuration centralised. Some day wuldn't it be cool if you could do the following in dbt_project.yml
:
vars:
hash: MD5
hashdiff_name: HASHDIFF
This would ensure the hashdiff of every satellite would be named HASHDIFF