Karol Sobczak
Future work
Dear Community, as some of you know, we at Teradata are planning to incorporate RE2J into the Presto database (https://github.com/facebook/presto), since it is based on the lightning-fast RE2 and has potential...
Currently, `TrinoS3StreamingOutputStream#buffer` is not accounted for by writers. Hence it can cause an OOM instead of failing queries gracefully.
Decide automatically (perhaps using the CBO) which way of computing distinct aggregations to use. Potential rules:
* use mark distinct when the number of distinct aggregations is low (e.g. 1 or 2)...
`DictionaryBlockEncoding`:
* we don't need to serialize `ids` as `integers`. We can use `short` or `byte` if dictionary has fewer positions

`VariableWidthBlockEncoding`:
* we don't need to serialize `offsets` as...
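
A minimal sketch of the narrower `ids` encoding suggested for `DictionaryBlockEncoding`, assuming the dictionary size is known before the ids are written. The class `CompactIds`, its methods, and the one-byte width prefix are hypothetical and are not part of the existing block encoding format.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not the actual DictionaryBlockEncoding.
final class CompactIds {
    // Smallest width in bytes that can hold every id in [0, dictionarySize).
    static int bytesPerId(int dictionarySize) {
        if (dictionarySize <= (1 << 8)) {
            return 1;
        }
        if (dictionarySize <= (1 << 16)) {
            return 2;
        }
        return 4;
    }

    // Writes a one-byte width prefix (an assumption of this sketch) followed by the ids.
    // Ids stored as byte or short must be read back as unsigned values.
    static byte[] writeIds(int[] ids, int dictionarySize) {
        int width = bytesPerId(dictionarySize);
        ByteBuffer buffer = ByteBuffer.allocate(1 + width * ids.length);
        buffer.put((byte) width);
        for (int id : ids) {
            if (width == 1) {
                buffer.put((byte) id);
            }
            else if (width == 2) {
                buffer.putShort((short) id);
            }
            else {
                buffer.putInt(id);
            }
        }
        return buffer.array();
    }
}
```

With a three-entry dictionary each id takes one byte instead of four, at the cost of a width check on the read path.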
Javadoc suggests that there are 3 aggregations involved. However, there are only 2 aggregations and GroupId operator.
Currently, rollup on x, y, z will translate into the plan:
```
FinalAggregation[$id, x, y, z](aggr)
RemoteExchange
PartialAggregation[$id, x, y, z](aggr)
GroupId
```
The `GroupId` operator will multiply unaggregated input data 4...
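
To see where the factor of 4 comes from: a rollup over n columns expands to n + 1 grouping sets (every prefix of the column list, down to the empty set), and `GroupId` emits one copy of each input row per grouping set. The sketch below only enumerates those sets. The class `RollupSets` is hypothetical and is not planner code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of ROLLUP expansion, not Presto planner code.
final class RollupSets {
    // ROLLUP(c1, ..., cn) yields the n + 1 prefixes of the column list,
    // from the full list down to the empty (global) grouping set.
    static List<List<String>> rollup(List<String> columns) {
        List<List<String>> sets = new ArrayList<>();
        for (int size = columns.size(); size >= 0; size--) {
            sets.add(new ArrayList<>(columns.subList(0, size)));
        }
        return sets;
    }
}
```

For x, y, z this gives (x, y, z), (x, y), (x) and (), so each input row reaches the partial aggregation four times.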
Use distinct aggregations by default (instead of MarkDistinctNode) for single distinct aggregation
https://github.com/prestodb/presto/pull/10224 adds distinct aggregation support to the aggregation operator. However, it's still not enabled by default: https://github.com/prestodb/presto/pull/10224/files#r176341005. There are a few issues with (built-in) distinct aggregations. Consider the query: ``` SELECT group, count(a)...
Currently, `HashGenerationOptimizer.Rewriter#visitExchange` has a constraint:
```
// Currently, precomputed hash values are only supported for system hash distributions without constants
```
It would be beneficial to precompute hash values for...
There is no point in creating an actual page indexer if the page indexer input is already of a single BIGINT type and bucket numbers are constrained (e.g. a limited number of buckets).
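
A minimal sketch of the shortcut, assuming the single BIGINT input already holds bucket numbers in the range [0, bucketCount). The class `DirectBucketIndexer` and its `index` method are hypothetical and do not implement the real page indexer interface. They only show that the bucket number can serve as the index directly, with no hash table lookup.

```java
// Hypothetical stand-in for a page indexer over a single BIGINT bucket column.
final class DirectBucketIndexer {
    private final int bucketCount;

    DirectBucketIndexer(int bucketCount) {
        if (bucketCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be positive");
        }
        this.bucketCount = bucketCount;
    }

    // Maps each bucket number straight to its index; rejects values outside [0, bucketCount).
    int[] index(long[] buckets) {
        int[] indexes = new int[buckets.length];
        for (int position = 0; position < buckets.length; position++) {
            long bucket = buckets[position];
            if (bucket < 0 || bucket >= bucketCount) {
                throw new IllegalArgumentException("bucket out of range: " + bucket);
            }
            indexes[position] = (int) bucket;
        }
        return indexes;
    }
}
```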
In `tpcds/q95`, the `web_sales` table is scanned multiple times. Additionally, that table is then distributed across nodes using the same hash column: ``` Fragment 12 [SOURCE] CPU: 2.63m, Scheduled: 4.87m, Input: 720000376...