[Python][Acero] Provide method to perform aggregations with acero for datasets

Open sidneymau opened this issue 1 year ago • 1 comments

Describe the enhancement requested

Presently, Dataset has methods that perform several operations with Acero: sort_by, join, and join_asof. It would be especially helpful to also provide a method that performs aggregations on datasets using Acero, for convenient out-of-core processing.

The implementation can be modeled on the existing Dataset Acero operations as well as the aggregate method of TableGroupBy.

Component(s)

Python

sidneymau avatar Sep 19 '24 04:09 sidneymau

Note that the implementation proposed in the above PR ends up being fairly inefficient because it can't fully leverage nodes for, e.g., projections and filtering. This functionality could be included, essentially by providing a dataframe-like interface for constructing an Acero plan, as can be done with DataFusion, but that is substantially larger in scope.

sidneymau avatar Sep 19 '24 23:09 sidneymau