Push down group by for partition columns
Push down MIN, MAX, and COUNT with GROUP BY when the GROUP BY is on partition columns.
For example:
```sql
CREATE TABLE test (id LONG, ts TIMESTAMP, data INT) USING iceberg PARTITIONED BY (id, ts);

SELECT MIN(data), MAX(data), COUNT(data), COUNT(*) FROM test GROUP BY id, ts;
```
I have the following changes:
- Added the `GroupBy`, `UnboundGroupBy`, and `BoundGroupBy` Expressions.
- In the Aggregators, take `GroupBy` into consideration. Using `MaxAggregator` as an example (see the first sketch after this list):
  - Add `List<BoundGroupBy>` to the constructor.
  - Add `private Map<StructLike, T> maxP`, which holds the max value for each partition.
  - The key of the above Map is `AggregateEvaluator.ArrayStructLike`, which contains the partition values.
  - Add `update(R value, StructLike partitionKey)` to update the value for a specific partition.
  - Add `Map<StructLike, R> currentPartition()` to get the current value for each partition.
- Build the Scan schema as: group by columns + aggregates (this is what Spark is expecting). Using `SELECT MIN(data), MAX(data), COUNT(data), COUNT(*) FROM test GROUP BY id, ts` as an example, the schema is `LONG, TIMESTAMP, INT, INT, LONG, LONG`, which corresponds to `id, ts, MIN(data), MAX(data), COUNT(data), COUNT(*)` (see the second sketch below).
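To illustrate the per-partition aggregator bookkeeping, here is a minimal, self-contained sketch. The method names `update` and `currentPartition` follow the description above, but the body is a simplified illustration, not the actual implementation: it is not generic over Iceberg types, and a plain `List<Object>` stands in for `AggregateEvaluator.ArrayStructLike` as the partition key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the per-partition MAX bookkeeping described above.
class PartitionedMaxAggregator<T extends Comparable<T>> {
  // One running max per partition, keyed by the partition values.
  private final Map<List<Object>, T> maxP = new HashMap<>();

  // Corresponds to update(R value, StructLike partitionKey):
  // fold a new value into the running max for its partition.
  void update(T value, List<Object> partitionKey) {
    maxP.merge(partitionKey, value, (current, v) -> current.compareTo(v) >= 0 ? current : v);
  }

  // Corresponds to currentPartition(): expose the per-partition results
  // so the scan can emit one row per group.
  Map<List<Object>, T> currentPartition() {
    return maxP;
  }
}
```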
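And a hypothetical sketch of assembling the scan output rows in the group-by-columns-then-aggregates layout Spark expects. The `min`, `max`, `countData`, and `countStar` maps are assumed to be the `currentPartition()` results of per-partition aggregators like the one above; none of these names come from the actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class GroupByRowAssembler {
  // For SELECT MIN(data), MAX(data), COUNT(data), COUNT(*) ... GROUP BY id, ts,
  // each output row is laid out as: id, ts, MIN(data), MAX(data), COUNT(data), COUNT(*),
  // matching the scan schema LONG, TIMESTAMP, INT, INT, LONG, LONG.
  static List<List<Object>> assemble(
      Map<List<Object>, Integer> min,
      Map<List<Object>, Integer> max,
      Map<List<Object>, Long> countData,
      Map<List<Object>, Long> countStar) {
    List<List<Object>> rows = new ArrayList<>();
    for (Map.Entry<List<Object>, Integer> e : max.entrySet()) {
      List<Object> row = new ArrayList<>(e.getKey()); // group by columns: id, ts
      row.add(min.get(e.getKey()));                   // MIN(data)
      row.add(e.getValue());                          // MAX(data)
      row.add(countData.get(e.getKey()));             // COUNT(data)
      row.add(countStar.get(e.getKey()));             // COUNT(*)
      rows.add(row);
    }
    return rows;
  }
}
```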
cc @rdblue @aokolnychyi
Oops, I didn't mean to close this! I want to work on getting it in next