spark-utils
More possible macros to add
For cross-database compatibility and to make packages easier to use:
- ceiling - ceiling in Spark but ceil in Snowflake
- list_agg/string_agg is concat_ws({{ delimiter }}, collect_list({{ field_to_agg }})) in Spark
last_value(priority ignore nulls) over (partition by ticket_id order by date_day asc rows between unbounded preceding and current row) as priority
becomes
last_value(priority, true) over (partition by ticket_id order by date_day asc rows between unbounded preceding and current row) as priority
The ignore nulls clause is what we're handling here. It would be good for all/most window functions to handle this too. 😄
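For illustration, here is a minimal sketch of how the list_agg/string_agg translation could be shimmed, following dbt's adapter.dispatch pattern. The macro names and signatures here are hypothetical, not the actual dbt-utils API:

{% macro list_agg(field_to_agg, delimiter) -%}
{{ adapter.dispatch('list_agg', packages = _get_utils_namespaces()) (field_to_agg, delimiter) }}
{%- endmacro %}

{# default: warehouses with a native listagg, e.g. Snowflake #}
{% macro default__list_agg(field_to_agg, delimiter) %}
listagg({{ field_to_agg }}, {{ "'" ~ delimiter ~ "'" }})
{% endmacro %}

{# spark: no listagg, so collect the values into an array and join it #}
{% macro spark__list_agg(field_to_agg, delimiter) %}
concat_ws({{ "'" ~ delimiter ~ "'" }}, collect_list({{ field_to_agg }}))
{% endmacro %}

A last_value shim could follow the same pattern, with the Spark implementation passing true as the second argument in place of the ignore nulls clause.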
The Zendesk package (Fivetran) has
{% macro json_extract(string, string_path) -%}
{{ adapter.dispatch('json_extract', packages = _get_utils_namespaces()) (string, string_path) }}
{%- endmacro %}
{% macro default__json_extract(string, string_path) %}
json_extract_path_text({{string}}, {{ "'" ~ string_path ~ "'" }} )
{% endmacro %}
{% macro bigquery__json_extract(string, string_path) %}
json_extract({{string}}, {{ "'$." ~ string_path ~ "'" }} )
{% endmacro %}
I think the Spark SQL version of this would be https://spark.apache.org/docs/2.3.0/api/sql/#get_json_object
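If so, a Spark implementation following that same pattern might be as simple as the following (just a sketch, untested against the package):

{% macro spark__json_extract(string, string_path) %}
get_json_object({{string}}, {{ "'$." ~ string_path ~ "'" }} )
{% endmacro %}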
I had a chance to catch up with @fivetran-kristin last week. Fivetran has done a cool thing and moved most of their SQL syntax macros into a dbt_fivetran_utils package.

Kristin is thinking about whether to add spark__ implementations of those macros into that package directly, or into this package, nested under macros/dbt_fivetran_utils/.

@fivetran-kristin do you have more thoughts you want to share here? Would love to coordinate efforts.
@emilieschario and @jtcohen6 - I think the best solution here is to nest all spark implementations of our macros into this package.
I've outlined more details on Fivetran's approach to Spark compatibility here. @jtcohen6, I've shared it with you. @emilieschario, I don't have your email, so I wasn't able to share it. Can you either request access or provide your email address?
Thanks!
Just requested access so that I don't have to post my email on the interwebs 😄
@fivetran-kristin I like it! Thanks for developing a solid plan here.
I added a comment in support of fivetran_utils_dispatch_list. It might be overkill for each package to allow itself to be shimmed in slightly different ways, but I prefer being more flexible/explicit now in the event it proves necessary. If, in 6-12 months' time, it turns out to be entirely superfluous, we can just standardize all packages to take a single variable, adapter_dispatch_list (with backward support for the old variable names), no harm done.
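For anyone following along, my rough mental model of how the shimming works (a sketch, not the exact fivetran_utils source): the package's _get_utils_namespaces() helper would read a dispatch-list var so that an end user's project can prepend extra namespaces, such as spark_utils, and adapter.dispatch will then find the spark__ implementations first.

{% macro _get_utils_namespaces() %}
{# hypothetical sketch: user-supplied namespaces first, then the package's own #}
{% set override_namespaces = var('fivetran_utils_dispatch_list', []) %}
{% do return(override_namespaces + ['fivetran_utils']) %}
{% endmacro %}

A consuming project would then set fivetran_utils_dispatch_list: ['spark_utils'] in its dbt_project.yml vars to pick up the Spark shims.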
- [ ] ceiling - ceiling in Spark but ceil in Snowflake
- [ ] list_agg/string_agg is concat_ws({{ delimiter }}, collect_list({{ field_to_agg }}))
- [ ] last_value
- [ ] json_extract