Karol Sobczak comments

Future work

Yes, it is currently a fork, but it is possible to support the String API again. Doing so requires generating programs for UTF-16 bytes. Additionally, String...

Here are the latest Caliper results using the DFA from the https://github.com/Teradata/re2j/tree/re2j_on_bytes branch: https://microbenchmarks.appspot.com/runs/b7e3d122-784d-4279-bfc7-ddd8a81dd94e#r:scenario.benchmarkSpec.methodName,scenario.benchmarkSpec.parameters.implementation

Future work

@alandonovan, that pull request was created by mistake (it targets the TD fork); I have already closed it. I understand that you are reluctant to use Slice, but the interface is...

Future work

Slice is convenient for us because Presto already uses it, so data doesn't have to be converted. Of course, Slice could be replaced with another structure, but...

Future work

> you could make the API more general and useful by expressing it in terms of (byte[], start, length) triples

I thought about using triples; however, you would have to...
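For illustration, here is a minimal sketch of the triple-based API shape suggested in the quote. It assumes a hypothetical `ByteMatcher` interface and a naive `LiteralMatcher`; neither is part of re2j or Slice. The point is only that a caller can match a sub-range of a larger `byte[]` (for example, a region of a buffer backing a Slice) without copying it first.

```java
import java.nio.charset.StandardCharsets;

public class TripleMatcherSketch {
    /** Hypothetical matcher API expressed over (byte[], start, length) triples. */
    interface ByteMatcher {
        boolean find(byte[] input, int start, int length);
    }

    /** Naive literal search, used only to illustrate the triple-based signature. */
    static final class LiteralMatcher implements ByteMatcher {
        private final byte[] literal;

        LiteralMatcher(String literal) {
            // A String caller must encode to bytes before matching.
            this.literal = literal.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public boolean find(byte[] input, int start, int length) {
            if (start < 0 || length < 0 || length > input.length - start) {
                throw new IndexOutOfBoundsException("invalid (start, length) for input");
            }
            int end = start + length;
            outer:
            for (int i = start; i + literal.length <= end; i++) {
                for (int j = 0; j < literal.length; j++) {
                    if (input[i + j] != literal[j]) {
                        continue outer;
                    }
                }
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] buffer = "xxabcxx".getBytes(StandardCharsets.UTF_8);
        ByteMatcher matcher = new LiteralMatcher("abc");
        System.out.println(matcher.find(buffer, 2, 3)); // true: bytes [2, 5) are "abc"
        System.out.println(matcher.find(buffer, 3, 4)); // false: bytes [3, 7) are "bcxx"
    }
}
```

Callers holding a Java `String` still pay for the UTF-8 encoding step, as in the constructor's `getBytes` call; the triple signature only avoids copies for data that is already in a byte buffer.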

Row wise group by on fixed width types

Your benchmark PDF is missing peak memory metrics and, for such a big change, results for both partitioned and unpartitioned data.

Row wise group by on fixed width types

@lukasz-stec, how is it that peak memory didn't go up (or even dropped)? Are we accounting for memory correctly in this PR?

Row wise group by on fixed width types

> IMO This shows that moving row-wise is a good direction to improve hash aggregation performance. I think we can start with having `row-wise-signature` for fast hash lookups (in `MultiChannelGroupByHash`)....
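A minimal sketch of what a row-wise group-by hash for two fixed-width BIGINT key columns could look like. The names (`RowWiseGroupByHash`, `getGroupId`) are hypothetical, and this is not Trino's `MultiChannelGroupByHash`. The keys of each group are stored interleaved in a single `long[]`, so a probe compares both keys at adjacent offsets instead of reading one array per channel. Linear probing and a load factor of 0.5 are arbitrary choices for the sketch.

```java
import java.util.Arrays;

/** Hypothetical row-wise group-by hash over two BIGINT key channels. */
public class RowWiseGroupByHash {
    private static final int KEY_CHANNELS = 2;

    private long[] rows;        // row-wise key storage: [k0, k1] for group 0, then group 1, ...
    private int[] slotToGroup;  // open-addressing table: slot -> group id, -1 when empty
    private int groupCount;

    public RowWiseGroupByHash(int initialCapacity) {
        int capacity = Integer.highestOneBit(Math.max(16, initialCapacity) - 1) << 1; // power of two
        slotToGroup = new int[capacity];
        Arrays.fill(slotToGroup, -1);
        rows = new long[capacity / 2 * KEY_CHANNELS]; // at most capacity / 2 groups before rehash
    }

    /** Returns the group id for (k0, k1), assigning a new id if the key is unseen. */
    public int getGroupId(long k0, long k1) {
        if (groupCount >= slotToGroup.length / 2) {
            rehash();
        }
        int mask = slotToGroup.length - 1;
        int slot = (int) hash(k0, k1) & mask;
        while (true) {
            int group = slotToGroup[slot];
            if (group == -1) {
                int newBase = groupCount * KEY_CHANNELS;
                rows[newBase] = k0;
                rows[newBase + 1] = k1;
                slotToGroup[slot] = groupCount;
                return groupCount++;
            }
            int base = group * KEY_CHANNELS;
            // Both keys of a group are adjacent, so the comparison reads one contiguous row.
            if (rows[base] == k0 && rows[base + 1] == k1) {
                return group;
            }
            slot = (slot + 1) & mask; // linear probing
        }
    }

    public int getGroupCount() {
        return groupCount;
    }

    private void rehash() {
        int newCapacity = slotToGroup.length * 2;
        int mask = newCapacity - 1;
        int[] newSlots = new int[newCapacity];
        Arrays.fill(newSlots, -1);
        for (int group = 0; group < groupCount; group++) {
            int base = group * KEY_CHANNELS;
            int slot = (int) hash(rows[base], rows[base + 1]) & mask;
            while (newSlots[slot] != -1) {
                slot = (slot + 1) & mask;
            }
            newSlots[slot] = group;
        }
        slotToGroup = newSlots;
        rows = Arrays.copyOf(rows, newCapacity / 2 * KEY_CHANNELS);
    }

    private static long hash(long k0, long k1) {
        long h = k0 * 0x9E3779B97F4A7C15L + k1;
        h ^= h >>> 33; // fold high bits into the low bits selected by the mask
        h *= 0xFF51AFD7ED558CCDL;
        return h ^ (h >>> 33);
    }

    public static void main(String[] args) {
        RowWiseGroupByHash hash = new RowWiseGroupByHash(16);
        System.out.println(hash.getGroupId(1, 2)); // 0
        System.out.println(hash.getGroupId(3, 4)); // 1
        System.out.println(hash.getGroupId(1, 2)); // 0
    }
}
```

Fixing the row layout at two `long` keys is what makes a single flat array possible; variable-width types would need a different layout.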

Row wise group by on fixed width types

@lukasz-stec, rather than generating source code, we can use `MethodHandle` composition; see https://github.com/trinodb/trino/pull/14178
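A minimal sketch of `MethodHandle` composition using the standard `java.lang.invoke` API. A row hash over N `long` channels is assembled from a per-value hash and a combiner with `MethodHandles.filterArguments` and `MethodHandles.collectArguments`, with no generated source. The `hashLong` and `combine` functions are hypothetical placeholders, and this is not necessarily how the linked PR structures its handles.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class MethodHandleRowHash {
    // Hypothetical per-value hash for a fixed-width long channel.
    static long hashLong(long value) {
        long h = value * 0x9E3779B97F4A7C15L;
        return h ^ (h >>> 32);
    }

    // Hypothetical combiner that folds the next channel hash into the running row hash.
    static long combine(long previous, long next) {
        return 31 * previous + next;
    }

    /** Builds a handle of type (long x channels)long by composing handles, not by generating code. */
    static MethodHandle rowHash(int channels) throws ReflectiveOperationException {
        if (channels < 1) {
            throw new IllegalArgumentException("channels must be >= 1");
        }
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle hash = lookup.findStatic(MethodHandleRowHash.class, "hashLong",
                MethodType.methodType(long.class, long.class));
        MethodHandle combine = lookup.findStatic(MethodHandleRowHash.class, "combine",
                MethodType.methodType(long.class, long.class, long.class));

        // (long previous, long value)long: hashes 'value' before combining it with 'previous'.
        MethodHandle step = MethodHandles.filterArguments(combine, 1, hash);

        MethodHandle result = hash; // one channel: (long)long
        for (int i = 1; i < channels; i++) {
            // Feed the current row hash into step's first argument; arity grows by one.
            result = MethodHandles.collectArguments(step, 0, result);
        }
        return result;
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle handle = rowHash(3);
        long composed = (long) handle.invokeExact(1L, 2L, 3L);
        long direct = combine(combine(hashLong(1L), hashLong(2L)), hashLong(3L));
        System.out.println(composed == direct); // true
    }
}
```

For three channels the composed handle computes `combine(combine(hashLong(a), hashLong(b)), hashLong(c))`, and the JIT can inline the resulting chain much like handwritten code.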

Use distinct aggregations by default (instead of MarkDistinctNode) for single distinct aggregation

cc @lukasz-stec