blazingsql
collect_set feature request
Is your feature request related to a problem? Please describe.
Hello guys, I have a SQL query running on BlazingSQL. I see that collect_set is not supported in 0.20, but cuDF already supports it.
Describe the solution you'd like
Here is my sample code.
q = '''
SELECT
id, collect_set(date)
FROM
table
GROUP BY
id
'''
Describe alternatives you've considered
Here is a cuDF solution.
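# `table` is a cudf.DataFrame with columns `id` and `date`
q = table.groupby(['id'], as_index=False).agg({'date': 'collect'})
q['date'] = q['date'].list.unique()  # drop duplicate dates within each list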
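To pin down what the requested collect_set should return, here is a minimal CPU-only sketch of its semantics in plain Python. It is not BlazingSQL or cuDF code: `collect_set` below is a hypothetical helper, and it assumes nulls are skipped (as Spark's collect_set does) and sorts each list only to make the output deterministic, since Spark gives no ordering guarantee.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def collect_set(rows: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, List[Hashable]]:
    """Group (key, value) pairs by key, keeping each distinct non-null value once.

    Hypothetical helper illustrating Hive/Spark collect_set semantics.
    Values must be mutually comparable because each list is sorted.
    """
    groups: Dict[Hashable, set] = defaultdict(set)
    for key, value in rows:
        bucket = groups[key]  # create the group even if every value is null
        if value is not None:  # assumption: nulls are ignored, as in Spark
            bucket.add(value)
    # Sort only so the result is deterministic.
    return {key: sorted(values) for key, values in groups.items()}


rows = [
    (1, "2021-01-01"),
    (1, "2021-01-02"),
    (1, "2021-01-01"),  # duplicate, dropped by collect_set
    (2, "2021-03-05"),
]
print(collect_set(rows))
# {1: ['2021-01-01', '2021-01-02'], 2: ['2021-03-05']}
```

This is the same result the cuDF workaround produces: a per-`id` list of distinct dates.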
Hello @MikeChenfu, there are several functions available in cuDF that we would love to implement in BlazingSQL, such as collect_set and lateral view explode. The problem is that those functions are not part of standard SQL, which means Apache Calcite, as we are using it right now, does not understand them. We use Apache Calcite to parse SQL queries and produce an optimized logical relational algebra plan. We are currently looking into how we can leverage or modify Apache Calcite to support functions that exist in Hive or Spark but not in standard SQL. I don't expect this to be a quick project, but it is something we want to do, and we are currently exploring alternatives that would let us support these sorts of functions.
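For context, lateral view explode is roughly the inverse of collect_set: it turns each element of an array column into its own row. The sketch below illustrates only those semantics in plain Python; `explode` here is a hypothetical helper, not a BlazingSQL, Calcite, or cuDF API. It assumes Spark's `explode` behavior of emitting no rows for an empty list (Spark's `explode_outer` would keep such keys with a null).

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def explode(grouped: Dict[Hashable, Iterable[Hashable]]) -> List[Tuple[Hashable, Hashable]]:
    """Emit one (key, element) row per list element, like LATERAL VIEW explode.

    Hypothetical helper: keys whose list is empty produce no rows.
    """
    rows: List[Tuple[Hashable, Hashable]] = []
    for key, values in grouped.items():
        for value in values:
            rows.append((key, value))
    return rows


print(explode({1: ["2021-01-01", "2021-01-02"], 2: []}))
# [(1, '2021-01-01'), (1, '2021-01-02')]
```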
Thanks @williamBlazing for the detailed explanation. Glad to hear you have a plan to do that.
Perhaps it might make sense to leave this issue open as a feature request?