Range query optim

Open fulmicoton opened this issue 2 years ago • 3 comments

We have two ways to deal with range queries...

  • a DocSet that scans the columnar and builds a buffer of the next valid docs. In its current version it is already quite sophisticated... The buffering window changes size dynamically to adapt to different match ratios, and the buffering itself relies on SIMD. The multilinear codec probably hinders perf a lot, however.
  • a filter at the collector level.
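To make the comparison concrete, here is a minimal sketch of the two strategies in plain Rust. It does not use tantivy's actual `DocSet` or `Collector` traits: `RangeDocSet`, `collect_with_filter`, the fixed-size window, and the `&[u64]` column are all hypothetical stand-ins, and the real implementation adapts its window size and uses SIMD.

```rust
use std::ops::RangeInclusive;

/// Strategy 1 (hypothetical stand-in for a range DocSet): scan the column
/// window by window and buffer the doc ids whose value falls in the range.
struct RangeDocSet<'a> {
    column: &'a [u64],
    range: RangeInclusive<u64>,
    next_doc: usize,  // first doc id that has not been scanned yet
    window: usize,    // number of docs scanned per refill (fixed in this sketch)
    buffer: Vec<u32>, // matching doc ids found by the last refill
    cursor: usize,    // read position inside `buffer`
}

impl<'a> RangeDocSet<'a> {
    fn new(column: &'a [u64], range: RangeInclusive<u64>, window: usize) -> Self {
        RangeDocSet {
            column,
            range,
            next_doc: 0,
            window: window.max(1),
            buffer: Vec::new(),
            cursor: 0,
        }
    }

    /// Scan the next window of the column and keep the matching doc ids.
    fn refill(&mut self) {
        self.buffer.clear();
        self.cursor = 0;
        let end = (self.next_doc + self.window).min(self.column.len());
        for doc in self.next_doc..end {
            if self.range.contains(&self.column[doc]) {
                self.buffer.push(doc as u32);
            }
        }
        self.next_doc = end;
    }
}

impl<'a> Iterator for RangeDocSet<'a> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // Refill until the buffer has an unread doc or the column is exhausted.
        while self.cursor == self.buffer.len() {
            if self.next_doc >= self.column.len() {
                return None;
            }
            self.refill();
        }
        let doc = self.buffer[self.cursor];
        self.cursor += 1;
        Some(doc)
    }
}

/// Strategy 2 (hypothetical stand-in for a collector-level filter): the rest
/// of the query produces candidate docs, and each one is checked against the
/// column only when it is about to be collected.
fn collect_with_filter(
    candidates: &[u32],
    column: &[u64],
    range: &RangeInclusive<u64>,
) -> Vec<u32> {
    candidates
        .iter()
        .copied()
        .filter(|&doc| range.contains(&column[doc as usize]))
        .collect()
}
```

In this sketch the docset always touches every value of the column, whereas the filter only reads the values of docs that the rest of the query already matched. That difference is one plausible reason the filter could win when the other clauses are selective, which is what the experiment would need to confirm.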

Experiment with the two solutions, and see if the filter solution outperforms the docset solution for most of the queries. If it is the case, we can then work on the QueryAST (once #3148 has landed) to bubble up range queries and extract range queries as filters.

fulmicoton avatar Apr 19 '23 02:04 fulmicoton

#3329 will add support for warming up a range of the inverted index. This means there is a third option: using the classic tantivy RangeQuery over the inverted index. This is likely slower on large ranges, but likely faster on smaller ranges.
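A rough illustration of why the range width matters for this option, assuming a sorted term dictionary that maps each value to its posting list. The `BTreeMap` and both function names below are hypothetical stand-ins, not tantivy's term dictionary or `RangeQuery`.

```rust
use std::collections::BTreeMap;
use std::ops::RangeInclusive;

/// Build a toy inverted index: each distinct value (term) maps to the sorted
/// list of doc ids that contain it.
fn build_inverted_index(column: &[u64]) -> BTreeMap<u64, Vec<u32>> {
    let mut index = BTreeMap::new();
    for (doc, &value) in column.iter().enumerate() {
        index.entry(value).or_insert_with(Vec::new).push(doc as u32);
    }
    index
}

/// Answer a range query by walking every term in the range and merging its
/// postings. Returns the matching doc ids and the number of terms visited.
fn range_over_inverted_index(
    index: &BTreeMap<u64, Vec<u32>>,
    range: RangeInclusive<u64>,
) -> (Vec<u32>, usize) {
    // `BTreeMap::range` panics when start > end, so handle that case first.
    if range.is_empty() {
        return (Vec::new(), 0);
    }
    let mut docs = Vec::new();
    let mut terms_visited = 0;
    for (_term, postings) in index.range(range) {
        terms_visited += 1;
        docs.extend_from_slice(postings);
    }
    // Postings of different terms interleave, so restore doc id order.
    docs.sort_unstable();
    (docs, terms_visited)
}
```

The work here grows with the number of distinct terms inside the range and the size of their posting lists, rather than with the size of the column. That is consistent with the expectation above: cheap for narrow ranges, expensive for wide ones.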

trinity-1686a avatar May 22 '23 10:05 trinity-1686a

Related issue: https://github.com/quickwit-oss/tantivy/issues/2266

PSeitz avatar Nov 22 '23 11:11 PSeitz

Related issue: https://github.com/quickwit-oss/tantivy/issues/2531

PSeitz avatar Oct 28 '24 07:10 PSeitz

One other thing that I'll point out would be to add block-based metadata to fast fields.

One of the ways that file formats like Parquet and Vortex accelerate range scans (as well as point queries, I suppose) is by mixing sorting/z-ordering with per-block metadata.

A similar approach in tantivy's fast fields would be to:

  1. restore index sorting
  2. adjust some/all of the fast-fields codecs to store min/max on a block basis, and to then use it to eliminate entire blocks from range queries.
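As a sketch of what step 2 could look like, here is a toy column that keeps a (min, max) pair per block and skips blocks that cannot intersect the range. `BlockedColumn`, its fields, and the block length are hypothetical; this is not how tantivy's fast-field codecs are laid out today.

```rust
use std::ops::RangeInclusive;

/// Toy column with per-block min/max metadata (hypothetical layout).
struct BlockedColumn {
    values: Vec<u64>,
    block_len: usize,
    block_min_max: Vec<(u64, u64)>, // one (min, max) pair per block
}

impl BlockedColumn {
    fn new(values: Vec<u64>, block_len: usize) -> Self {
        let block_len = block_len.max(1);
        // `chunks` never yields an empty slice, so min/max always exist.
        let block_min_max = values
            .chunks(block_len)
            .map(|block| {
                let min = *block.iter().min().unwrap();
                let max = *block.iter().max().unwrap();
                (min, max)
            })
            .collect();
        BlockedColumn { values, block_len, block_min_max }
    }

    /// Returns the matching doc ids and the number of blocks actually scanned.
    fn range_scan(&self, range: &RangeInclusive<u64>) -> (Vec<u32>, usize) {
        let mut docs = Vec::new();
        let mut scanned_blocks = 0;
        for (block_id, &(min, max)) in self.block_min_max.iter().enumerate() {
            // Skip the whole block when its value span cannot overlap the range.
            if max < *range.start() || min > *range.end() {
                continue;
            }
            scanned_blocks += 1;
            let first_doc = block_id * self.block_len;
            let end = (first_doc + self.block_len).min(self.values.len());
            for doc in first_doc..end {
                if range.contains(&self.values[doc]) {
                    docs.push(doc as u32);
                }
            }
        }
        (docs, scanned_blocks)
    }
}
```

This also shows why step 1 matters. When the values are sorted, each block covers a narrow span and most blocks are eliminated; when they are in arbitrary order, every block tends to span most of the value domain and little or nothing gets skipped.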

stuhood avatar Jul 16 '25 19:07 stuhood