trino icon indicating copy to clipboard operation
trino copied to clipboard

Improve performance of discrete linear intersections (e.g. large DFs)

Open sopel39 opened this issue 10 months ago • 0 comments

Description

Large dynamic filters might be intersected with other large dynamic filters (e.g. when collecting DFs and in subquery cache context). Hence performance of such operations is important:

    Benchmark                                                       Mode  Cnt    Score    Error  Units
    BenchmarkSortedRangeSet.linearDiscreteIntersectDiscreteOnLarge  avgt   10    0,149 ±  0,001  ms/op
    BenchmarkSortedRangeSet.linearIntersectDiscreteOnLarge          avgt   10  106,509 ±  1,009  ms/op

A bit more context: Subquery cache utilizes TupleDomain (intersections/equality specifically) heavily to describe the cached data (enforced and unenforced predicates including DFs) and to perform cache lookups. Hence it's critical that TupleDomin performs well. DFs are naturally discrete sets and intersection performance of such large discrete sets can be improved greatly. This improves various parts of the engine:

  1. reduces DF collection latency (which reduces amount of read data and reduces coordinator CPU usage)
  2. reduces subquery cache lookup latency

FYI: @martint @dain

sopel39 avatar Apr 22 '24 13:04 sopel39