dbt-core
dbt-core copied to clipboard
[CT-1223] [Feature] consolidate `listagg` macros from packages and dbt core
Is this your first time submitting a feature request?
- [X] I have read the expectations for open source contributors
- [X] I have searched the existing issues, and I could not find an existing issue for this feature
- [X] I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
right now in dbt_project_evaluator, we are overriding the dbt-core
version of listagg
to allow for sorting the values in the array. We do not allow for a limit clause.
The core version on the other hand does the opposite! It allows for limiting, but not for sorting!
We really only rely on the listagg macro sorting to ensure our model outputs match seeds. Without having looked super closely, there's probably some ways for us to use the core version and still have passing tests, but wanted to at least open a thread here on what the right way to reconcile these macros would be!
Describe alternatives you've considered
the alternative would be changing our project to use the core version, which, frankly is completely reasonable
Who will this benefit?
package maintainers (specifically, @graciegoheen @b-per et al)
Are you interested in contributing this feature?
sure
Anything else?
nah
@dave-connors-3 Thanks for opening!
Is this specific to Spark/Databricks?
- Version in
dbt-spark
: https://github.com/dbt-labs/dbt-spark/blob/main/dbt/include/spark/macros/utils/listagg.sql - Version in
dbt-project-evaluator
: https://github.com/dbt-labs/dbt-project-evaluator/blob/main/macros/cross_db_shim/listagg.sql
The core version of this macro does support order_by_clause
, and that works on most adapters / data platforms, but it's not supported in SparkSQL. You're right that SparkSQL does support array_sort
— which can sort alphabetically, or with a custom case when
expression (!), but only based on the values already in the array, not based on the value of another column (the real benefit of order_by_clause
).
Relevant thread from June, wherein Doug and I were briefly spelunking down this rabbit hole: https://github.com/dbt-labs/dbt-core/pull/5298#discussion_r898176975
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.