quickwit
Range query optim
We have two ways to deal with range queries:
- a DocSet that scans the columnar storage and buffers the next matching docs. Its current version is already quite sophisticated: the buffering window changes size to adapt dynamically to different match ratios, and the buffering itself relies on SIMD. However, the multilinear codec probably hurts performance a lot.
- a filter at the collector level.
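To make the comparison concrete, here is a minimal scalar sketch of both strategies, assuming a dense column with exactly one `u64` per doc id. `RangeDocSet`, `filter_candidates`, and the window bounds `MIN_WINDOW`/`MAX_WINDOW` are hypothetical names, not tantivy's API; the real DocSet uses SIMD and its own tuning, while this version only shows the adaptive-window idea (grow the window when matches are sparse, shrink it when they are dense).

```rust
use std::ops::RangeInclusive;

// Hypothetical window bounds; the real DocSet's tuning is not reproduced here.
const MIN_WINDOW: usize = 64;
const MAX_WINDOW: usize = 4096;

/// Scan-based range DocSet over a dense column (one u64 per doc id).
struct RangeDocSet<'a> {
    column: &'a [u64],
    range: RangeInclusive<u64>,
    next_to_scan: usize, // first doc id not scanned yet
    window: usize,       // docs scanned per window
    buffer: Vec<u32>,    // matching doc ids from the last refill
    cursor: usize,       // next position to return from `buffer`
}

impl<'a> RangeDocSet<'a> {
    fn new(column: &'a [u64], range: RangeInclusive<u64>) -> Self {
        RangeDocSet { column, range, next_to_scan: 0, window: MIN_WINDOW, buffer: Vec::new(), cursor: 0 }
    }

    /// Scans windows until at least one match is buffered or the column ends.
    fn refill(&mut self) -> bool {
        let column = self.column;
        self.buffer.clear();
        self.cursor = 0;
        while self.buffer.is_empty() && self.next_to_scan < column.len() {
            let start = self.next_to_scan;
            let end = (start + self.window).min(column.len());
            for (offset, value) in column[start..end].iter().enumerate() {
                if self.range.contains(value) {
                    self.buffer.push((start + offset) as u32);
                }
            }
            self.next_to_scan = end;
            // Adapt to the observed match ratio: grow the window when fewer
            // than 1/16 of docs matched, shrink it when more than 1/2 matched.
            let scanned = end - start;
            if self.buffer.len() * 16 < scanned {
                self.window = (self.window * 2).min(MAX_WINDOW);
            } else if self.buffer.len() * 2 > scanned {
                self.window = (self.window / 2).max(MIN_WINDOW);
            }
        }
        !self.buffer.is_empty()
    }

    fn next_doc(&mut self) -> Option<u32> {
        if self.cursor == self.buffer.len() && !self.refill() {
            return None;
        }
        let doc = self.buffer[self.cursor];
        self.cursor += 1;
        Some(doc)
    }
}

/// Collector-level alternative: check the range only on docs that the
/// rest of the query already matched.
fn filter_candidates(column: &[u64], range: &RangeInclusive<u64>, candidates: &[u32]) -> Vec<u32> {
    candidates
        .iter()
        .copied()
        .filter(|&doc| range.contains(&column[doc as usize]))
        .collect()
}

fn main() {
    let column: Vec<u64> = (0..10_000u64).map(|doc| doc % 100).collect();
    let mut docset = RangeDocSet::new(&column, 10..=12);
    let mut matches = 0;
    while docset.next_doc().is_some() {
        matches += 1;
    }
    // Pretend another clause of the query matched every even doc.
    let evens: Vec<u32> = (0..10_000u32).step_by(2).collect();
    let filtered = filter_candidates(&column, &(10..=12), &evens);
    println!("docset: {matches} matches, filter on evens: {} matches", filtered.len());
}
```

The trade-off the experiment should measure: the DocSet touches every value in the column once, while the collector-level filter only touches the values of docs that survived the other clauses, so it should win whenever those clauses are selective.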
Experiment with the two solutions and see whether the filter solution outperforms the DocSet solution for most queries. If so, we can then work on the QueryAST (once #3148 has landed) to bubble up range queries and extract them as filters.
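A rough sketch of what "bubbling up" ranges could look like, on a toy AST that is hypothetical and not quickwit's actual `QueryAst`. Range clauses reachable only through `must`/`filter` edges are moved into a list of filters. A `Bool` with `should` children is left untouched, because removing its last required clause would turn the `should` clauses into required ones. Moving a scoring `must` range into a filter could also change scores, which this toy ignores.

```rust
/// Toy query AST (hypothetical; not quickwit's QueryAst).
#[derive(Debug, Clone, PartialEq)]
enum Ast {
    Term { field: String, value: String },
    Range { field: String, lower: u64, upper: u64 },
    Bool { must: Vec<Ast>, should: Vec<Ast>, filter: Vec<Ast> },
}

#[derive(Debug, PartialEq)]
struct RangeFilter {
    field: String,
    lower: u64,
    upper: u64,
}

/// Moves every range clause reachable through `must`/`filter` edges into
/// `out`. Returns the remaining query, or None if only ranges were left
/// (i.e. "match all, then apply the filters").
fn extract_range_filters(ast: Ast, out: &mut Vec<RangeFilter>) -> Option<Ast> {
    match ast {
        Ast::Range { field, lower, upper } => {
            out.push(RangeFilter { field, lower, upper });
            None
        }
        // Only descend into pure conjunctions (no `should` clauses).
        Ast::Bool { must, should, filter } if should.is_empty() => {
            let must: Vec<Ast> = must.into_iter().filter_map(|c| extract_range_filters(c, out)).collect();
            let filter: Vec<Ast> = filter.into_iter().filter_map(|c| extract_range_filters(c, out)).collect();
            if must.is_empty() && filter.is_empty() {
                None
            } else {
                Some(Ast::Bool { must, should, filter })
            }
        }
        other => Some(other),
    }
}

// Small constructors to keep the example readable.
fn term(field: &str, value: &str) -> Ast {
    Ast::Term { field: field.into(), value: value.into() }
}
fn range(field: &str, lower: u64, upper: u64) -> Ast {
    Ast::Range { field: field.into(), lower, upper }
}
fn and(must: Vec<Ast>) -> Ast {
    Ast::Bool { must, should: vec![], filter: vec![] }
}

fn main() {
    // service:api AND ts:[10 TO 20] AND (latency:[0 TO 5])
    let query = and(vec![term("service", "api"), range("ts", 10, 20), and(vec![range("latency", 0, 5)])]);
    let mut filters = Vec::new();
    let residual = extract_range_filters(query, &mut filters);
    println!("residual: {residual:?}");
    println!("filters: {filters:?}");
}
```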
#3329 will add support for warming up a range of the inverted index. This means there is a third option: using the classic tantivy RangeQuery over the inverted index. This is likely slower on large ranges, but likely faster on small ones.
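Why range size matters for the inverted-index option can be seen in a toy model: a range query walks the terms in `[lower, upper]` in the sorted term dictionary and unions their posting lists, so its cost grows with the number of terms and postings in the range, whereas a column scan costs the same regardless of the range. The sketch below uses a `BTreeMap` as a stand-in for tantivy's term dictionary and a naive sort-based union; `build_index` and `range_query` are hypothetical helpers, not tantivy's `RangeQuery` implementation.

```rust
use std::collections::BTreeMap;

/// Toy inverted index for one numeric field: term -> sorted doc ids.
fn build_index(column: &[u64]) -> BTreeMap<u64, Vec<u32>> {
    let mut index: BTreeMap<u64, Vec<u32>> = BTreeMap::new();
    for (doc, &value) in column.iter().enumerate() {
        // Docs are visited in order, so each posting list stays sorted.
        index.entry(value).or_default().push(doc as u32);
    }
    index
}

/// Range query over the inverted index: walk the terms in [lower, upper]
/// and union their posting lists. Also returns how many terms were visited.
fn range_query(index: &BTreeMap<u64, Vec<u32>>, lower: u64, upper: u64) -> (Vec<u32>, usize) {
    if lower > upper {
        return (Vec::new(), 0); // BTreeMap::range panics on an inverted range
    }
    let mut docs = Vec::new();
    let mut terms_visited = 0;
    for postings in index.range(lower..=upper).map(|(_, p)| p) {
        terms_visited += 1;
        docs.extend_from_slice(postings);
    }
    // Naive union: sort + dedup (dedup only matters for multi-valued fields).
    docs.sort_unstable();
    docs.dedup();
    (docs, terms_visited)
}

fn main() {
    let column: Vec<u64> = (0..10_000u64).map(|doc| doc % 100).collect();
    let index = build_index(&column);
    let (narrow, narrow_terms) = range_query(&index, 10, 12);
    let (wide, wide_terms) = range_query(&index, 0, 99);
    println!("narrow: {} docs via {narrow_terms} terms", narrow.len());
    println!("wide: {} docs via {wide_terms} terms", wide.len());
}
```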
Related issue: https://github.com/quickwit-oss/tantivy/issues/2266
Related issue: https://github.com/quickwit-oss/tantivy/issues/2531
One other thing I'd point out: we could add block-based metadata to fast fields.
One of the ways that file formats like Parquet and Vortex accelerate range scans (as well as point queries, I suppose) is by mixing sorting/z-ordering with per-block metadata.
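For the z-ordering part, a common way to build the sort key is a Morton code that interleaves the bits of two columns. Sorting docs by this key tends to keep both columns clustered at once, which keeps per-block min/max reasonably tight on either column. This is a minimal sketch assuming two hypothetical `u32` fast-field columns; it is not how Parquet or Vortex writers necessarily compute their ordering.

```rust
/// Spreads the 32 bits of `v` so that bit i lands on bit 2*i of the result.
fn spread_bits(v: u32) -> u64 {
    let mut x = v as u64;
    x = (x | (x << 16)) & 0x0000_FFFF_0000_FFFF;
    x = (x | (x << 8)) & 0x00FF_00FF_00FF_00FF;
    x = (x | (x << 4)) & 0x0F0F_0F0F_0F0F_0F0F;
    x = (x | (x << 2)) & 0x3333_3333_3333_3333;
    x = (x | (x << 1)) & 0x5555_5555_5555_5555;
    x
}

/// Z-order (Morton) key: bits of `a` on even positions, bits of `b` on odd ones.
fn z_order_key(a: u32, b: u32) -> u64 {
    spread_bits(a) | (spread_bits(b) << 1)
}

fn main() {
    // Hypothetical docs, each with two u32 fast-field values.
    let mut docs: Vec<(u32, u32)> = vec![(3, 0), (0, 3), (1, 1), (2, 2), (0, 0), (3, 3)];
    docs.sort_by_key(|&(a, b)| z_order_key(a, b));
    println!("{docs:?}");
}
```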
A similar approach in tantivy's fast fields would be to:
- restore index sorting
- adjust some or all of the fast-field codecs to store min/max per block, and then use them to skip entire blocks during range queries.
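The pruning step can be sketched independently of any codec: keep a `(min, max)` pair per fixed-size block, skip blocks disjoint from the range, emit every doc of a block that lies fully inside it, and only scan blocks that partially overlap. `BlockedColumn`, `BlockMeta`, and `BLOCK_SIZE = 128` are hypothetical; a real codec would store this metadata in its own format. The example also shows why index sorting matters: on sorted data only the two boundary blocks need scanning, while on scrambled data no block can be skipped or accepted wholesale.

```rust
use std::ops::RangeInclusive;

// Hypothetical block size; a real codec would choose its own.
const BLOCK_SIZE: usize = 128;

/// Per-block metadata stored alongside the values.
struct BlockMeta {
    min: u64,
    max: u64,
}

struct BlockedColumn {
    values: Vec<u64>,
    blocks: Vec<BlockMeta>,
}

impl BlockedColumn {
    fn new(values: Vec<u64>) -> Self {
        let blocks = values
            .chunks(BLOCK_SIZE)
            .map(|chunk| BlockMeta {
                // `chunks` never yields an empty slice, so unwrap is safe.
                min: *chunk.iter().min().unwrap(),
                max: *chunk.iter().max().unwrap(),
            })
            .collect();
        BlockedColumn { values, blocks }
    }

    /// Returns matching doc ids plus the number of blocks whose values had to be read.
    fn range_query(&self, range: RangeInclusive<u64>) -> (Vec<u32>, usize) {
        let (lo, hi) = (*range.start(), *range.end());
        let mut docs = Vec::new();
        let mut blocks_scanned = 0;
        for (block_idx, meta) in self.blocks.iter().enumerate() {
            let start = block_idx * BLOCK_SIZE;
            let end = (start + BLOCK_SIZE).min(self.values.len());
            if meta.max < lo || meta.min > hi {
                continue; // disjoint: skip without reading values
            }
            if lo <= meta.min && meta.max <= hi {
                docs.extend(start as u32..end as u32); // fully inside: every doc matches
                continue;
            }
            blocks_scanned += 1; // partial overlap: fall back to a scan
            for doc in start..end {
                if range.contains(&self.values[doc]) {
                    docs.push(doc as u32);
                }
            }
        }
        (docs, blocks_scanned)
    }
}

fn main() {
    let sorted = BlockedColumn::new((0..10_000u64).collect());
    // Same values in a scrambled order (7919 is coprime with 10_000).
    let shuffled = BlockedColumn::new((0..10_000u64).map(|d| (d * 7919) % 10_000).collect());
    let (a, scanned_sorted) = sorted.range_query(1000..=1999);
    let (b, scanned_shuffled) = shuffled.range_query(1000..=1999);
    println!("sorted: {} docs, {scanned_sorted} blocks scanned", a.len());
    println!("shuffled: {} docs, {scanned_shuffled} blocks scanned", b.len());
}
```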