dbt-core icon indicating copy to clipboard operation
dbt-core copied to clipboard

[CT-1223] [Feature] consolidate `listagg` macros from packages and dbt core

Open dave-connors-3 opened this issue 2 years ago • 1 comments

Is this your first time submitting a feature request?

  • [X] I have read the expectations for open source contributors
  • [X] I have searched the existing issues, and I could not find an existing issue for this feature
  • [X] I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

right now in dbt_project_evaluator, we are overriding the dbt-core version of listagg to allow for sorting the values in the array. We do not allow for a limit clause.

The core version on the other hand does the opposite! It allows for limiting, but not for sorting!

We really only rely on the listagg macro sorting to ensure our model outputs match seeds. Without having looked super closely, there's probably some ways for us to use the core version and still have passing tests, but wanted to at least open a thread here on what the right way to reconcile these macros would be!

Describe alternatives you've considered

the alternative would be changing our project to use the core version, which, frankly is completely reasonable

Who will this benefit?

package maintainers (specifically, @graciegoheen @b-per et al)

Are you interested in contributing this feature?

sure

Anything else?

nah

dave-connors-3 avatar Sep 21 '22 01:09 dave-connors-3

@dave-connors-3 Thanks for opening!

Is this specific to Spark/Databricks?

  • Version in dbt-spark: https://github.com/dbt-labs/dbt-spark/blob/main/dbt/include/spark/macros/utils/listagg.sql
  • Version in dbt-project-evaluator: https://github.com/dbt-labs/dbt-project-evaluator/blob/main/macros/cross_db_shim/listagg.sql

The core version of this macro does support order_by_clause, and that works on most adapters / data platforms, but it's not supported in SparkSQL. You're right that SparkSQL does support array_sort — which can sort alphabetically, or with a custom case when expression (!), but only based on the values already in the array, not based on the value of another column (the real benefit of order_by_clause).

Relevant thread from June, wherein Doug and I were briefly spelunking down this rabbit hole: https://github.com/dbt-labs/dbt-core/pull/5298#discussion_r898176975

jtcohen6 avatar Sep 21 '22 10:09 jtcohen6

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] avatar Dec 21 '22 01:12 github-actions[bot]

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

github-actions[bot] avatar Dec 28 '22 01:12 github-actions[bot]