datafusion Follow-on work for `median` aggregate function

Is your feature request related to a problem or challenge? Please describe what you are trying to do. Follow-on issue to address feedback in https://github.com/apache/arrow-datafusion/pull/3009

In particular:

Run some performance test and see if it would be better to use builders in the aggregate state rather than arrays. Should we just use ScalarValue::List instead?
Should we have a configurable limit on the number of values that can be collected to prevent OOM?
Can we leverage the concat and take kernels?
Should we add an average kernel to arrow-rs?
Can we reduce memory overhead with some form of compression (such as run-length encoding or maintaining a map of unique values with counts)

Describe the solution you'd like See above

Describe alternatives you've considered None

Additional context None

Aug 05 '22 13:08 andygrove

Another optimization is that you do not need to to sort all values, only the value at the middle index needs to be at the right place. Vec::select_nth_unstable does exactly that in O(n) using a quickselect algorithm.

Aug 09 '22 14:08 jhorstmann

#7376 solves most of these

Aug 23 '23 09:08 Dandandan