trino
trino copied to clipboard
Suboptimal AggregationState in DoubleSumAggregation and LongSumAggregation
When the aggregated column is of type bigint/double or the like, the partial aggregation ends up creating a wrapped row type for it. This is unnecessary, inefficient and can cause a big increase for the intermediate data. I believe this is due to #464, where the optimization for window functions sacrificed aggregation performance in general. @kokosing @sopel39
explain (type logical)
select ss_item_sk, sum(ss_customer_sk)
from store_sales
group by ss_item_sk;
Output[columnNames = [ss_item_sk, _col1]]
│ Layout: [ss_item_sk:bigint, sum:bigint]
│ Estimates: {rows: 2000 (35.16kB), cpu: 0, memory: 0B, network: 0B}
│ _col1 := sum
└─ Aggregate[type = FINAL, keys = [ss_item_sk]]
│ Layout: [ss_item_sk:bigint, sum:bigint]
│ Estimates: {rows: 2000 (35.16kB), cpu: 8.39M, memory: 35.16kB, network: 0B}
│ sum := sum("sum_0")
└─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue], arguments = ["ss_item_sk"]]
│ Layout: [ss_item_sk:bigint, sum_0:row(bigint, bigint), $hashvalue:bigint]
│ Estimates: {rows: 120527 (8.39MB), cpu: 8.39M, memory: 0B, network: 0B}
└─ RemoteExchange[partitionCount = 4, type = REPARTITION, hashColumn = [$hashvalue_1]]
│ Layout: [ss_item_sk:bigint, sum_0:row(bigint, bigint), $hashvalue_1:bigint]
│ Estimates: {rows: 120527 (8.39MB), cpu: 8.39M, memory: 0B, network: 8.39MB}
└─ Project[]
│ Layout: [ss_item_sk:bigint, sum_0:row(bigint, bigint), $hashvalue_2:bigint]
│ Estimates: {rows: 120527 (8.39MB), cpu: 8.39M, memory: 0B, network: 0B}
│ $hashvalue_2 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("ss_item_sk"), 0))
└─ Aggregate[type = PARTIAL, keys = [ss_item_sk]]
│ Layout: [ss_item_sk:bigint, sum_0:row(bigint, bigint)]
│ Estimates: {rows: 120527 (7.36MB), cpu: ?, memory: ?, network: ?}
│ sum_0 := sum("ss_customer_sk")
└─ TableScan[table = tpcds:store_sales:sf0.01]
Layout: [ss_item_sk:bigint, ss_customer_sk:bigint]
Estimates: {rows: 120527 (2.03MB), cpu: 2.03M, memory: 0B, network: 0B}
ss_customer_sk := tpcds:ss_customer_sk
ss_item_sk := tpcds:ss_item_sk