Karol Sobczak

Results 33 issues of Karol Sobczak

Dear Community, As some of you know, we at Teradata are planning to incorporate RE2J into the Presto database (https://github.com/facebook/presto) since it is based on the lightning-fast RE2 and has potential...

Currently, `TrinoS3StreamingOutputStream#buffer` is not accounted for by writers. Hence it can cause an OOM rather than failing queries gracefully.

bug

Decide automatically (perhaps using the CBO) which way of computing distinct aggregations to use. Potential rules: * use mark distinct when the number of distinct aggregations is low (e.g. 1 or 2)....

performance

`DictionaryBlockEncoding`:
* we don't need to serialize `ids` as `integers`. We can use `short` or `byte` if dictionary has fewer positions

`VariableWidthBlockEncoding`:
* we don't need to serialize `offsets` as...

performance
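
A minimal sketch of the `DictionaryBlockEncoding` idea above: pick the narrowest id width that the dictionary size allows and write ids at that width. The class name, method names, and the one-byte width header are assumptions made for illustration; this is not the actual Presto/Trino block encoding format.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Presto/Trino.
class NarrowIdsSketch {
    // Smallest width in bytes (1, 2 or 4) that can hold any id in [0, dictionarySize).
    static int bytesPerId(int dictionarySize) {
        if (dictionarySize <= (1 << 8)) {
            return 1;
        }
        if (dictionarySize <= (1 << 16)) {
            return 2;
        }
        return 4;
    }

    // Layout (assumed): one header byte with the id width, then the ids at that width.
    static byte[] encodeIds(int[] ids, int dictionarySize) {
        int width = bytesPerId(dictionarySize);
        ByteBuffer out = ByteBuffer.allocate(1 + ids.length * width);
        out.put((byte) width);
        for (int id : ids) {
            if (width == 1) {
                out.put((byte) id);
            } else if (width == 2) {
                out.putShort((short) id);
            } else {
                out.putInt(id);
            }
        }
        return out.array();
    }

    // Reads the header byte, then decodes ids as unsigned values of that width.
    static int[] decodeIds(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        int width = in.get();
        int[] ids = new int[(bytes.length - 1) / width];
        for (int i = 0; i < ids.length; i++) {
            if (width == 1) {
                ids[i] = in.get() & 0xFF;
            } else if (width == 2) {
                ids[i] = in.getShort() & 0xFFFF;
            } else {
                ids[i] = in.getInt();
            }
        }
        return ids;
    }
}
```

With a dictionary of at most 256 positions, each id takes one byte instead of four in this sketch; the cost is one extra header byte and a width check per id.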

Javadoc suggests that there are 3 aggregations involved. However, there are only 2 aggregations and a `GroupId` operator.

cla-signed

Currently, rollup on x, y, z will translate into the plan:
```
FinalAggregation[$id, x, y, z](aggr)
RemoteExchange
PartialAggregation[$id, x, y, z](aggr)
GroupId
```
`GroupId` operator will multiply unaggregated input data 4...

performance
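
For context on the rollup item above, standard SQL `ROLLUP` over n columns expands to n + 1 grouping sets, one per prefix of the column list. The sketch below only enumerates those sets; the class and method names are hypothetical, and it does not model the `GroupId` operator itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Presto/Trino.
class RollupSetsSketch {
    // ROLLUP(c1, ..., cn) expands to the prefixes (c1..cn), (c1..cn-1), ..., (c1), ().
    static List<List<String>> rollupGroupingSets(List<String> columns) {
        List<List<String>> sets = new ArrayList<>();
        for (int length = columns.size(); length >= 0; length--) {
            sets.add(new ArrayList<>(columns.subList(0, length)));
        }
        return sets;
    }
}
```

For x, y, z this yields (x, y, z), (x, y), (x) and (), which is four grouping sets.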

https://github.com/prestodb/presto/pull/10224 adds distinct aggregation support to the aggregation operator. However, it's still not enabled by default: https://github.com/prestodb/presto/pull/10224/files#r176341005. There are a few issues with (built-in) distinct aggregations. Consider the query: ``` SELECT group, count(a)...

performance

Currently, `HashGenerationOptimizer.Rewriter#visitExchange` has a constraint:
```
// Currently, precomputed hash values are only supported for system hash distributions without constants
```
It would be beneficial to precompute hash values for...

performance

There is no point in creating an actual page indexer if the page indexer input is already a single BIGINT column and bucket numbers are constrained (e.g. a limited number of buckets).

performance
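
One way to read the page indexer item above, as a rough sketch: when the only input column is a BIGINT bucket number known to lie in a small range, the value can serve as its own index and no hash lookup is needed. The `Indexer` interface, both implementations, and the `maxDirectBuckets` threshold are assumptions for illustration and do not mirror the real page indexer API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Presto/Trino page indexer.
class BucketIndexerSketch {
    interface Indexer {
        int indexOf(long value);
    }

    // General path: assigns dense indexes in first-seen order via a hash map.
    static final class HashingIndexer implements Indexer {
        private final Map<Long, Integer> indexes = new HashMap<>();

        public int indexOf(long value) {
            Integer existing = indexes.get(value);
            if (existing != null) {
                return existing;
            }
            int next = indexes.size();
            indexes.put(value, next);
            return next;
        }
    }

    // Fast path: the value is already a bucket number in [0, bucketCount).
    static final class DirectIndexer implements Indexer {
        private final int bucketCount;

        DirectIndexer(int bucketCount) {
            this.bucketCount = bucketCount;
        }

        public int indexOf(long value) {
            if (value < 0 || value >= bucketCount) {
                throw new IllegalArgumentException("bucket out of range: " + value);
            }
            return (int) value;
        }
    }

    // Uses the direct path only for a single BIGINT input with a bounded bucket count.
    static Indexer create(boolean singleBigintInput, int bucketCount, int maxDirectBuckets) {
        if (singleBigintInput && bucketCount > 0 && bucketCount <= maxDirectBuckets) {
            return new DirectIndexer(bucketCount);
        }
        return new HashingIndexer();
    }
}
```

Note that the direct path returns bucket numbers as indexes, so indexes are no longer assigned in first-seen order; any caller relying on that ordering would need to be checked.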

In `tpcds/q95` the `web_sales` table is scanned multiple times. Additionally, that table is then distributed across nodes using the same hash column: ``` Fragment 12 [SOURCE] CPU: 2.63m, Scheduled: 4.87m, Input: 720000376...

enhancement