Karol Sobczak
Yes, currently it is a fork, but it is possible to support the String API again. To do so, program generation for UTF-16 bytes is required. Additionally, String...
Here are the newest Caliper results using the DFA from the https://github.com/Teradata/re2j/tree/re2j_on_bytes branch: https://microbenchmarks.appspot.com/runs/b7e3d122-784d-4279-bfc7-ddd8a81dd94e#r:scenario.benchmarkSpec.methodName,scenario.benchmarkSpec.parameters.implementation
@alandonovan that pull request was created by mistake (it goes to the TD fork); I have already closed it. I understand that you are reluctant to use Slice, but interface is...
Slice is convenient for us since it is used in Presto, so data doesn't have to be converted. Of course, it is possible to replace Slice with another structure, but...
> you could make the API more general and useful by expressing it in terms of (byte[], start, length) triples

I thought about using triples, however you would have to...
For such a big change, your benchmark PDF is missing results for partitioned/unpartitioned data and peak memory metrics.
@lukasz-stec how is it that peak memory didn't go up (or even dropped)? Are we accounting for memory correctly in this PR?
> IMO This shows that moving row-wise is a good direction to improve hash aggregation performance.

I think we can start with having `row-wise-signature` for fast hash lookups (in `MultiChannelGroupByHash`)....
@lukasz-stec Rather than generating source code, we can use `MethodHandle` composition, see https://github.com/trinodb/trino/pull/14178
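To illustrate what `MethodHandle` composition means in general, here is a minimal sketch using only the JDK's `java.lang.invoke` API. It is not taken from the linked pull request: the class `MethodHandleComposition` and the methods `hashLong`, `combine`, and `hashRow` are hypothetical names invented for this example, and the hash functions are placeholders. The idea is that a per-row function can be assembled at runtime from small per-column handles instead of emitting source code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical example class; none of these names come from Trino.
final class MethodHandleComposition {
    // Composed once: (long a, long b) -> combine(hashLong(a), hashLong(b))
    private static final MethodHandle ROW_HASH = buildRowHash();

    // Placeholder per-column hash step (an assumption, not a real hash function).
    static long hashLong(long value) {
        return value * 31L;
    }

    // Placeholder step that merges two column hashes into one row hash.
    static long combine(long left, long right) {
        return left * 31L + right;
    }

    private static MethodHandle buildRowHash() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle hashLong = lookup.findStatic(
                    MethodHandleComposition.class,
                    "hashLong",
                    MethodType.methodType(long.class, long.class));
            MethodHandle combine = lookup.findStatic(
                    MethodHandleComposition.class,
                    "combine",
                    MethodType.methodType(long.class, long.class, long.class));
            // filterArguments pre-processes each argument of combine with hashLong,
            // so no source or bytecode has to be generated by hand.
            return MethodHandles.filterArguments(combine, 0, hashLong, hashLong);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Convenience wrapper that hides the Throwable declared by invokeExact.
    static long hashRow(long first, long second) {
        try {
            return (long) ROW_HASH.invokeExact(first, second);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Calling `MethodHandleComposition.hashRow(a, b)` then runs the composed handle. Because the handle is held in a `static final` field, the JIT can typically treat it as a constant and inline the composed steps.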