trino
trino copied to clipboard
Improve performance of discrete linear intersections (e.g. large DFs)
Description
Large dynamic filters might be intersected with other large dynamic filters (e.g. when collecting DFs and in subquery cache context). Hence performance of such operations is important:
Benchmark Mode Cnt Score Error Units
BenchmarkSortedRangeSet.linearDiscreteIntersectDiscreteOnLarge avgt 10 0,149 ± 0,001 ms/op
BenchmarkSortedRangeSet.linearIntersectDiscreteOnLarge avgt 10 106,509 ± 1,009 ms/op
A bit more context: Subquery cache utilizes TupleDomain
(intersections/equality specifically) heavily to describe the cached data (enforced and unenforced predicates including DFs) and to perform cache lookups. Hence it's critical that TupleDomin
performs well. DFs are naturally discrete sets and intersection performance of such large discrete sets can be improved greatly. This improves various parts of the engine:
- reduces DF collection latency (which reduces amount of read data and reduces coordinator CPU usage)
- reduces subquery cache lookup latency
FYI: @martint @dain